Data Analysis

Monday, August 2, 2021

From the Ladder of Causation to the ladder of Modeling and Simulation – an essay

In The Book of Why (Pearl and Mackenzie, 2018) on causal inference, Pearl proposes the Ladder of Causation, which goes up in three steps from seeing to doing to imagining. The discussion in the book seems to have been inspired by machine learning, and while reading the book I was wondering how the proposed ladder relates to pharmacometrics. I concluded that the relation is very tight, but that for pharmacometrics one could rename the three steps as Data, Model and Simulations. Here are my thoughts.

The Ladder

The Data

At the first step of the Ladder of Causation we have associations: we observe which things occur together or one after the other. We are not able to say what happens if we do something. In the context of modeling and simulation, it is more intuitive to refer to this step as the data. We have the data, and we can look at associations in the data. At this level, it does not matter whether we get the data from observing what happens around us or from performing an experiment with an intervention that produces some outcomes.

The Model

At the second step of the Ladder of Causation we have the intervention. We are interested in what happens if we do something, or in how we can make something happen. Clearly, one way to answer the question is to do something and observe what happens: we perform experiments and collect and analyze the resulting data. However, performing the required experiments may not always be possible and is not necessarily required. If we have some ideas about causal relationships, we can refine these ideas based on what we observe happening around us. We say that we use a causal model to analyze the data to obtain our answer. Even if we do not need a model to analyze the data, we need it to come up with a causal question and to design the experiment in the first place.

In the context of modeling and simulation, this step of the ladder corresponds to the model. We propose a model which we think answers our question, we design an experiment based on it, possibly consisting of observations only, and then fit the model to the data to obtain estimates of the causal effect. For an ideal randomized experiment, there is limited discussion on the model to use: we need to regress the outcome on the treatment. Moving away from randomized experiments, the question of which model to use becomes more difficult, and the results that we obtain will depend on the model. In any case, without a model, we can neither ask the question nor design the appropriate experiment.
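The dependence on the model can be illustrated with a small simulation. The following is only a sketch under made-up assumptions: a binary treatment with a true effect of 1.0, and a hypothetical binary confounder ("severity") that lowers the outcome by 2.0 and, in the observational setting, makes treatment more likely. None of these names or numbers come from the book; they are chosen for illustration.

```python
import random


def slope(x, y):
    """Least-squares slope of y regressed on x (outcome on treatment)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(randomized, n=20000, true_effect=1.0, seed=1):
    """Simulate treatment, outcome and a hypothetical confounder."""
    rng = random.Random(seed)
    treatment, outcome, severity = [], [], []
    for _ in range(n):
        z = 1.0 if rng.random() < 0.5 else 0.0  # assumed confounder
        if randomized:
            p_treat = 0.5  # coin flip, independent of severity
        else:
            p_treat = 0.8 if z == 1.0 else 0.2  # severe cases treated more often
        t = 1.0 if rng.random() < p_treat else 0.0
        y = true_effect * t - 2.0 * z + rng.gauss(0.0, 1.0)
        treatment.append(t)
        outcome.append(y)
        severity.append(z)
    return treatment, outcome, severity


def adjusted_effect(t, y, z):
    """Average of the within-stratum slopes (both strata are equally likely here)."""
    effects = []
    for level in (0.0, 1.0):
        ts = [a for a, c in zip(t, z) if c == level]
        ys = [b for b, c in zip(y, z) if c == level]
        effects.append(slope(ts, ys))
    return sum(effects) / len(effects)


if __name__ == "__main__":
    t, y, z = simulate(randomized=True)
    print("randomized, simple regression:   ", round(slope(t, y), 2))
    t, y, z = simulate(randomized=False)
    print("observational, simple regression:", round(slope(t, y), 2))
    print("observational, adjusted:         ", round(adjusted_effect(t, y, z), 2))
```

With randomization, the simple regression of the outcome on the treatment should land close to the assumed effect. Without it, the same regression is biased (with these made-up numbers it even tends to get the sign wrong), and the answer is only recovered once the analysis model includes the confounder.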

The Simulations

At the third level we consider aspects which we have not observed and possibly cannot observe, the counterfactuals. In the context of modeling and simulation, this simply corresponds to the simulations, which can be used to quantify aspects that have been observed but also aspects that have not been observed.
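As a minimal sketch of this step, assume a one-compartment model with an intravenous bolus dose, C(t) = Dose / V * exp(-k * t), and assume that the volume V and elimination rate k have already been estimated for a subject. The parameter values, doses and time points below are hypothetical and only serve to show how a fitted model is used to simulate a dose that was never given.

```python
import math


def concentration(dose_mg, t_h, volume_l=50.0, k_per_h=0.1):
    """One-compartment IV bolus model: C(t) = dose / V * exp(-k * t).

    volume_l and k_per_h are hypothetical "fitted" parameters.
    """
    return dose_mg / volume_l * math.exp(-k_per_h * t_h)


if __name__ == "__main__":
    observed_dose = 100.0        # dose assumed to have been given (mg)
    counterfactual_dose = 200.0  # dose that was never given (mg)
    for t in (0.0, 2.0, 4.0, 8.0, 12.0, 24.0):
        seen = concentration(observed_dose, t)
        imagined = concentration(counterfactual_dose, t)
        print(f"t={t:5.1f} h  observed dose: {seen:.3f} mg/L  "
              f"counterfactual dose: {imagined:.3f} mg/L")
```

The first column could be compared with measurements, whereas the second column exists only through the model: it answers what the concentration would have been for the same subject under a different dose.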

References

Pearl, J., Mackenzie, D., 2018. The Book of Why: The new science of cause and effect. Basic Books, New York.

Causal salad or a zoo of causal models – an essay

This is a brief essay noting down ideas on how we build up ideas about causal relationships and how we use these ideas to carry out experiments and perform statistical and model-based analyses of the results. It largely reflects my understanding of The Book of Why (Pearl and Mackenzie, 2018).

During our life we all build up knowledge, ideas and beliefs on how the world around us functions. We obtain them by observing what happens around us, by observing what happens when we do something, and by listening to others. With a view to how this knowledge and these ideas and beliefs are going to be used in what follows, I refer to them as models of reality. We do not have only a single model; we all have a whole zoo of them, also referred to as a causal salad in (McElreath, 2020). And they are not only our models, but we share many of them across society. As beautifully explained in (Harari, 2015), these shared models are a central aspect shaping societies.

The zoo of models evolves over time. It evolves as each of us grows up and acquires skills and knowledge and as we gain experience in life. It evolves also as societies and humankind accumulate knowledge and as habits and fashions change.

When doing experiments or analyzing data and, in particular, if we have to make decisions, we inevitably match what we find with our zoo of models, i.e., we do some causal inference. Based on the findings, we may be able to improve our models, we may have to question some of them, or perhaps come up with new ones, or protect our zoo by adjusting our perception of reality and our memories. This process may be done with varying rigor at various levels including, in particular, at the level of the planning and execution of experiments, the analysis of the data, and the matching of the data with our models. Some examples follow.

The child: By dropping and throwing things, a child will learn that things fall down. Here no formal planning, execution or analysis is involved, and the causal interpretation is taken care of by our intuition. This and the subsequent examples are causal questions, since they ask what happens if we do something.

High school experiment: Knowing that things fall down, we would like to know how long stones of different sizes take to fall down. We devise an experiment to find this out. We measure the falling times, we graph them, and we draw our conclusions. Here, the causal question comes first, the art is then to set up the experiment adequately, and an informal analysis of the data suffices to answer our question.

Statistical experiment: This is similar to the previous examples, except that variability becomes important. For example, once we know how long it takes for things to fall down, and that this seems to be similar for objects of different weights, we would like to find out whether these times are exactly the same for a stone and a piece of wood. We again measure the time it takes to reach the ground. But this time, we need to measure repeatedly to be able to come to conclusions. And we probably want to use statistical hypothesis testing to express how sure we are about our finding, i.e., whether or not it takes the same time for both objects. Here again, the causal question comes at the beginning, and the art is to set up the experiment adequately. In contrast to the previous example, augmenting the analysis with statistical hypothesis tests helps to express and communicate our findings.
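One possible way to carry out such a test is a permutation test on the difference in mean falling times. The sketch below uses simulated measurements; the fall time of 1.43 s, the measurement noise and the small delay for the wood are invented for illustration and are not results of a real experiment.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the share of random relabelings whose absolute mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical repeated measurements in seconds (simulated, not real data).
    stone = [1.43 + rng.gauss(0.0, 0.03) for _ in range(20)]
    wood = [1.45 + rng.gauss(0.0, 0.03) for _ in range(20)]
    print("difference in means (s):", round(mean(wood) - mean(stone), 3))
    print("permutation p-value:    ", round(permutation_p_value(stone, wood), 3))
```

The test makes no assumption about the shape of the measurement noise; it only asks how often a difference this large would arise if the labels "stone" and "wood" were interchangeable.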

Randomized trial: When studying heterogeneous populations, we need to ensure that the subjects we are studying are representative of the population. In other words, we need to prevent our selection process from affecting and confounding the results. One robust possibility is to use randomization and select subjects randomly. Thus again, to be able to come to causal conclusions, we must set up our experiment appropriately with a good understanding of the causal question, and can then carry out an analysis that takes advantage of the setup to get answers.

Observational study: When it is not possible to execute experiments that ensure causal interpretation of the results, we need to use our zoo of models or rather a selected model from our zoo to analyze the data that we obtain.

The doctor: The doctor needs to identify the best treatment for a given patient. This will be based on results from randomized clinical trials, but needs to take into account that patients differ and that alternative treatment options exist. Here, the doctor needs to use his zoo of models to evaluate the different options and come up with a recommendation. This example shows that even if it was possible to separate the experiment from the causal model using randomization, the results must be integrated back into models to come up with decisions.

The black swan: The example of “The doctor” may suggest that trying to avoid using causal models for the analysis of data is a lost effort, since in many situations we will have to use the models anyway. The counterargument is our intrinsic urge to keep our zoo of models intact, independent of what happens. This is nicely described by (Taleb, 2007) and constitutes an important argument for executing careful experiments and analyses whenever possible.

To come back to the starting point: when doing experiments and analyzing the data obtained, we inevitably match what we find with our zoo of models. Usually, it is good practice to separate the model from the experiment and its analysis, i.e., to use the model zoo only to pose the question and to design the experiment, in such a way that the analysis can be done without a model and the result can be used to feed the zoo. In some cases, questions can only be answered by using some (causal) model as part of the analysis. In any case, when it comes to making decisions, we have to use our zoo of models.

References

Harari, Y.N., 2015. Sapiens: A Brief History of Humankind. Harper Collins Publishers, New York, NY.

McElreath, R., 2020. Statistical Rethinking. https://doi.org/10.1201/9780429029608

Pearl, J., Mackenzie, D., 2018. The Book of Why: The new science of cause and effect. Basic Books, New York.

Taleb, N.N., 2007. The Black Swan: The Impact of the Highly Improbable. Random House.

Sunday, April 15, 2012

Bridge Sampling, Importance Sampling and Incremental Umbrella Sampling

Related but not often put into relation:

- Weighted histogram analysis method

- Estimation of normalizing constants

- Importance sampling

- Umbrella sampling

- Incremental umbrella sampling

- Bridge sampling

- Path sampling
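To make the relation between some of these methods concrete, here is a small sketch that estimates a ratio of normalizing constants Z1/Z0 in two ways: by importance sampling, using draws from p0 only, and by bridge sampling with a geometric bridge, using draws from both p0 and p1. The two unnormalized Gaussian densities are a toy choice made here so that the exact answer (0.5) is known; the function names are made up for this example.

```python
import math
import random

SD1 = 0.5  # standard deviation of the second (toy) density


def log_q0(x):
    """Unnormalized log density of N(0, 1); Z0 = sqrt(2 * pi)."""
    return -0.5 * x * x


def log_q1(x):
    """Unnormalized log density of N(0, SD1^2); Z1 = SD1 * sqrt(2 * pi)."""
    return -0.5 * (x / SD1) ** 2


def importance_ratio(n=100_000, seed=0):
    """Importance sampling: Z1/Z0 = E_p0[q1(x) / q0(x)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # draw from p0
        total += math.exp(log_q1(x) - log_q0(x))
    return total / n


def bridge_ratio(n=100_000, seed=0):
    """Geometric bridge: Z1/Z0 = E_p0[sqrt(q1/q0)] / E_p1[sqrt(q0/q1)]."""
    rng = random.Random(seed)
    num = 0.0
    den = 0.0
    for _ in range(n):
        x0 = rng.gauss(0.0, 1.0)  # draw from p0
        x1 = rng.gauss(0.0, SD1)  # draw from p1
        num += math.exp(0.5 * (log_q1(x0) - log_q0(x0)))
        den += math.exp(0.5 * (log_q0(x1) - log_q1(x1)))
    return num / den


if __name__ == "__main__":
    print("exact ratio Z1/Z0:  ", SD1)
    print("importance sampling:", round(importance_ratio(), 4))
    print("bridge sampling:    ", round(bridge_ratio(), 4))
```

In this toy case both estimators work because the two densities overlap well. The bridge estimator uses samples from both distributions, which is the feature that the references below generalize to more than two distributions and to poorly overlapping ones.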

A Start of a Collection of Links

Towards Bridge Sampling

Empirical distributions in selection bias models
Y Vardi, 1985
The Annals of Statistics 13 (1) 178-203

Estimating normalizing constants and reweighting mixtures in Markov chain Monte Carlo
CJ Geyer, 1994
Technical Report 568, 1994

Simulating ratios of normalizing constants via a simple identity: a theoretical exploration
XL Meng, WH Wong, 1996
Statistica Sinica 6, 831-860

Simulating normalizing constants: From importance sampling to bridge sampling to path sampling
A Gelman, XL Meng, 1998
Statistical Science, 163-185

A theory of statistical models for Monte Carlo integration
A Kong, P McCullagh, XL Meng, D Nicolae, Z Tan, 2003
Journal of the Royal Statistical Society: Series B (Statistical Methodology)

On a likelihood approach for Monte Carlo integration
Z Tan, 2004
Journal of the American Statistical Association 99 (468) 1027-1036

Towards Incremental Umbrella Sampling

The weighted histogram analysis method for free‐energy calculations on biomolecules. I. The method
S Kumar, JM Rosenberg, D Bouzida, RH Swendsen, PA Kollman, 1992
Journal of Computational Chemistry 13 (8), 1011-1021

Multidimensional adaptive umbrella sampling: Applications to main chain and side chain peptide conformations
C Bartels, M Karplus, 1997
Journal of computational chemistry 18 (12), 1450-1462

Analyzing biased Monte Carlo and molecular dynamics simulations
C Bartels, 2000
Chemical Physics Letters 331 (5-6), 446-454

Absolute free energies of binding of peptide analogs to the HIV‐1 protease from molecular dynamics simulations
C Bartels, A Widmer, C Ehrhardt, 2005
Journal of computational chemistry 26 (12), 1294-1305

Some Attempts to Digest

Statistically optimal analysis of samples from multiple equilibrium states
MR Shirts, JD Chodera, 2008
The Journal of chemical physics 129, 124105

Dynamical reweighting: Improved estimates of dynamical properties from simulations at multiple temperatures
JD Chodera, WC Swope, F Noé, JH Prinz, MR Shirts, VS Pande, 2011
The Journal of chemical physics 134, 244107
