3  Identification

Required reading

Cunningham (2021, ch. 1)

In empirical research, identification refers to the process of establishing a clear and logical relationship between a cause and an effect. This involves demonstrating that the cause is responsible for the observed effect and that no other factors could explain it. The goal of identification is to provide strong evidence that a particular factor is indeed the cause of a particular outcome, rather than merely coincidental with it. To identify a cause-and-effect relationship, researchers can use experimental data, non-experimental (that is, observational) data, or both. Section 6.2 will explain some difficulties researchers face when they aim to find empirical evidence on causal effects.

3.1 Causal inference

Figure 3.1: Causal Inference: The Mixtape1

1 Source: Cunningham (2021)

Cunningham, S. (2021). Causal inference: The mixtape. Accessed January 30, 2023; Yale University Press. https://mixtape.scunning.com/

As Cunningham (2021) explains in his book (see Figure 3.1), establishing causality is very challenging. Causal inference can assist to some extent. It is the process of establishing causal relationships between variables, aiming to determine whether a change in one variable (the cause or independent variable) leads to a change in another variable (the effect or dependent variable). This process goes beyond mere association or correlation and seeks to establish that one event or factor directly influences another. Various methods of causal inference exist, and this section along with the upcoming chapters will discuss these methods. All methods share a common goal: identifying and measuring a relationship without any bias.

3.2 The fundamental problem of causal inference

Unfortunately, claiming a causal relationship to be empirically true is often not straightforward. The main reason for this lies in the so-called fundamental problem of causal inference, which is the issue of observing only one of the potential outcomes for each unit in a study. This means we lack the counterfactual outcome, which is the hypothetical outcome that would have occurred if a subject or unit had experienced a different condition or treatment than what actually happened. Thus, the fundamental problem of causal inference is actually a missing data problem.

For example, consider my son, who enjoyed throwing plates from the table. He must decide between throwing a plate or not, but he cannot do both simultaneously – an ability only possible in fictional movies like “Everything Everywhere All at Once”. Of course, my son can conduct an experiment by throwing a plate now and later deciding not to throw a plate. After observing both actions, he may claim to have found evidence that throwing a plate causes noise. However, he can never be 100% certain that the noise he heard after throwing the plate was solely caused by his action. It could be a coincidence that something else caused the noise at precisely the same time, like one of his siblings throwing a fork. He merely assumes it was due to his action. To be more certain, he might repeat the experiment hundreds of times. Even then, he can never be 100% sure. It is still not proof in a logical sense because an external factor could theoretically cause the noise. However, this is where statistics come into play: knowing the environment and the setup of his actions, it becomes extremely unlikely that the noise was not caused by his action. Knowing the setup means we know that there was no external factor that could have led to a causal fallacy. As Scott Cunningham emphasizes, “prior knowledge is required in order to justify any claim about a causal finding”:

Cunningham (2021, ch. 1.3): “It is my firm belief, which I will emphasize over and over in this book, that without prior knowledge, estimated causal effects are rarely, if ever, believable. Prior knowledge is required in order to justify any claim of a causal finding. And economic theory also highlights why causal inference is necessarily a thorny task.”

To illustrate that the fundamental problem of causal inference is actually a missing data problem, let’s consider the fictitious example data presented in Table 3.1. For different individuals, denoted as \(i\), we know whether they received treatment \((T=1)\) or did not receive treatment \((T=0)\), as well as whether the outcome was positive \((Y=1)\) or negative \((Y=0)\). Since we do not observe the counterfactual outcomes, we are unable to determine the individual treatment effect (ITE), which is expressed as \(Y_i(1)-Y_i(0)\).

Table 3.1: Example data to illustrate that the fundamental problem of causal inference is a missing data problem
| \(i\) | \(T\) | \(Y\) | \(Y_i(1)\) | \(Y_i(0)\) | \(Y_i(1)-Y_i(0)\) |
|---|---|---|---|---|---|
| 1 | 0 | 0 | ? | 0 | ? |
| 2 | 1 | 1 | 1 | ? | ? |
| 3 | 1 | 0 | 0 | ? | ? |
| 4 | 0 | 0 | ? | 0 | ? |
| 5 | 0 | 1 | ? | 1 | ? |
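The missing-data character of Table 3.1 can be made concrete in code. The following sketch (using pandas, with the hypothetical data from the table) shows that the counterfactual columns, and therefore every individual treatment effect, are missing by construction:

```python
import pandas as pd
import numpy as np

# Observed data from Table 3.1: each unit reveals only one potential outcome
df = pd.DataFrame({
    "i": [1, 2, 3, 4, 5],
    "T": [0, 1, 1, 0, 0],
    "Y": [0, 1, 0, 0, 1],
})

# Y_i(1) is observed only for the treated, Y_i(0) only for the untreated
df["Y1"] = np.where(df["T"] == 1, df["Y"], np.nan)
df["Y0"] = np.where(df["T"] == 0, df["Y"], np.nan)

# The ITE column is NaN for every unit: the fundamental problem
df["ITE"] = df["Y1"] - df["Y0"]
print(df)
```

For every row, either `Y1` or `Y0` is missing, so the `ITE` column contains no observable entry at all.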

Exercise 3.1 Causal inference ch.1

Please read chapter 1 (Introduction) of Cunningham (2021) and answer the following questions. The book, including chapter 1, is freely available at https://mixtape.scunning.com/.

  1. What are some common misconceptions about causality that the author addresses in chapter 1?
  2. What is the role of randomization in causal inference, as described in the book?
  1. Some common misconceptions about causality that the author addresses in chapter 1 include the confusion between correlation and causality, and the belief that causality can be established from observational data without prior knowledge. He says that human beings “engaging in optimal behavior are the main reason correlations almost never reveal causal relationships, because rarely are human beings acting randomly”, which is crucial for identifying causal effects.

  2. The role of randomization in causal inference, as described in the book, is that it helps to control for confounding variables and allows for the estimation of causal effects.

3.3 Rubin causal model

If we are interested in the causal effect of a certain treatment on an outcome, we need to compare the outcome, \(Y\), of an individual, \(i\), who received the treatment \((1)\) to the outcome of the same individual, \(i\), who did not receive the treatment \((0)\):

\[ ITE_i=Y_i(1)-Y_i(0). \]

Unfortunately, as discussed in Section 3.2, this individual treatment effect (ITE) does not exist as person \(i\) can either be treated or not, but not both simultaneously. Since the counterfactual outcome is missing for each individual, we cannot observe the actual causal effect.

The Rubin model, also known as the potential outcomes framework, provides a theoretical framework for identifying causality in the presence of this missing data problem.

In the model, each subject, denoted with \(i\) (for example, a person or a school), has two potential outcomes: one outcome if the subject receives the treatment (treatment condition, denoted with \(T=1\)) and another outcome if the subject does not receive the treatment (control condition, denoted with \(T=0\)). In short, the model specifies that the difference between the average outcome of the group that received the treatment and the average outcome of the group that did not receive the treatment can be used as a substitute for the ITE: \[ \mathbb{E} [\underbrace{Y_i(1)-Y_i(0)}_{ITE}] = \underbrace{\mathbb{E}[Y(1)] - \mathbb{E}[Y(0)]}_{ATE}. \tag{3.1}\]

However, the ATE is only equal to the expected ITE if certain assumptions are fulfilled. The upcoming sections will discuss these assumptions.
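With observed data, the right-hand side of Equation 3.1 can be estimated as a simple difference in group means. A minimal sketch, reusing the observed \(T\) and \(Y\) columns of Table 3.1 (purely mechanical here, since five observations prove nothing):

```python
import numpy as np

# Observed outcomes and treatment indicators from Table 3.1
Y = np.array([0, 1, 0, 0, 1])
T = np.array([0, 1, 1, 0, 0])

# ATE estimate: E[Y | T=1] - E[Y | T=0]
ate_hat = Y[T == 1].mean() - Y[T == 0].mean()
print(ate_hat)  # 0.5 - 1/3 ≈ 0.167
```

Whether this number deserves a causal interpretation depends entirely on the assumptions discussed next.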

3.4 It is difficult to overcome the fundamental problem

Keele (2015, p. 314): “An identification analysis identifies the assumptions needed for statistical estimates to be given a causal interpretation.”

Keele, L. (2015). The statistics of causal inference: A view from political methodology. Political Analysis, 23(3), 313–335.

In the following we will discuss conditions that need to hold in order to empirically draw causal conclusions from the ATE without bias. This is important because Equation 3.1 does not necessarily hold when using observational data without a more elaborate identification strategy.

3.4.1 Example

Suppose we want to measure the effect of a vaccine on survival rates. We observed the residents of a small city with 2,000 inhabitants over the course of 30 days. On day 1, we arrived in town and injected the vaccine into 200 individuals. By day 30, we counted the deceased in both groups: four died in the vaccinated group, while eighteen died among the 1,800 unvaccinated individuals. With a survival rate of 98% in the vaccinated group and 99% in the unvaccinated group, it may appear that the vaccine lowers the survival rate. Imagine this study were real: would you claim that the vaccine kills because, according to Equation 3.1, we could use the ATE to indicate the ITE?
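The arithmetic behind the naive comparison is a one-liner per group:

```python
# Fictitious vaccine example: group sizes and deaths after 30 days
n_vaccinated, n_unvaccinated = 200, 1800
deaths_vaccinated, deaths_unvaccinated = 4, 18

surv_vaccinated = 1 - deaths_vaccinated / n_vaccinated        # 0.98
surv_unvaccinated = 1 - deaths_unvaccinated / n_unvaccinated  # 0.99

# The naive comparison (98% vs. 99%) suggests the vaccine lowers survival
print(surv_vaccinated, surv_unvaccinated)
```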

The answer is yes, but only if the assumptions of ignorability (Section 3.4.2) and unconfoundedness (Section 3.4.3) are satisfied.

In brief, ignorability means that the 200 treated individuals are not systematically different from the other 1,800 individuals regarding characteristics that have an impact on the chances of survival. Considering the fact that we cannot randomly select 200 individuals from the 2,000 inhabitants due to legal constraints (as everyone has the right to choose whether or not to receive the vaccine), we must consider who is willing to get vaccinated. This selection bias may pose issues, as vulnerable populations often have a higher willingness to accept the vaccine compared to younger and healthier individuals who may fear the disease less. For example, if we vaccinated individuals with preexisting conditions that make them more vulnerable, such as the elderly or those with chronic illnesses, we cannot assume that the ATE is equal to the ITE. This is because the overall mortality risk is higher among those who received the vaccine.

Unconfoundedness means that there are no other factors that could explain both the likelihood of receiving the vaccine and the likelihood of death. For example, if vaccinated individuals were not required to stay at home during these 30 days, their likelihood of dying may increase due to greater exposure to risky situations and other people, which in turn raises their chances of contracting a disease.

Tip 3.1
Figure 3.2: Average treatment effect (ATE)

Watch the video of Brady Neal’s lecture What Does Imply Causation? Randomized Control Trials (see Figure 3.2). Alternatively, you can read chapter 2 of his lecture notes (Neal, 2020).

3.4.2 Ignorability

Referring to Table 3.1, Brady Neal (2020) wrote:

“What makes it valid to calculate the ATE by taking the average of the Y(0) column, ignoring the question marks, and subtracting that from the average of the Y(1) column, ignoring the question marks? This ignoring of the question marks (missing data) is known as ignorability. Assuming ignorability is like ignoring how people ended up selecting the treatment they selected and just assuming they were randomly assigned their treatment” (Neal, 2020, p. 9).

Ignorability means that the way individuals are assigned to treatment and control groups is irrelevant for the data analysis. Thus, when we aim to explain a certain outcome, we can ignore how an individual made it into the treated or control group. It has also been called unconfoundedness or no omitted variable bias. We will come back to these two terms in Section 7.4 and in Chapter 7.

Randomized controlled trials (RCTs) are characterized by randomly assigning individuals to different treatment groups and comparing the outcomes of those groups. Thus, RCTs are essentially built on the assumption of ignorability, which can be written formally as \[ (Y(1), Y(0)) \perp T. \]

This notation indicates that the potential outcomes of an individual, \(Y\), are independent of whether they have actually received the treatment. The symbol “\(\perp\)” denotes independence, suggesting that the outcomes \(Y(1)\) and \(Y(0)\) are orthogonal to the treatment \(T\).

The assumption of ignorability allows us to write the ATE as follows: \[\begin{align} \mathbb{E}[Y(1)]-\mathbb{E}[Y(0)] & =\mathbb{E}[Y(1) \mid T=1]-\mathbb{E}[Y(0) \mid T=0] \\ & =\mathbb{E}[Y \mid T=1]-\mathbb{E}[Y \mid T=0]. \end{align}\]

Another perspective on this assumption is the concept of exchangeability. Exchangeability refers to the idea that the treatment groups can be interchanged such that if they were switched, the new treatment group would have the same outcomes as the old treatment group, and the new control group would have the same outcomes as the old control group.
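A small simulation may make this concrete: when treatment is assigned at random, the difference in group means recovers the true ATE, even though each unit reveals only one potential outcome. All numbers below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated potential outcomes with a true ATE of 2 (illustrative)
Y0 = rng.normal(0, 1, n)
Y1 = Y0 + 2

# Random assignment: ignorability holds by design
T = rng.integers(0, 2, n)
Y = np.where(T == 1, Y1, Y0)  # only one potential outcome is observed

# Difference in group means approximates the true ATE of 2
ate_hat = Y[T == 1].mean() - Y[T == 0].mean()
print(ate_hat)
```

Because \(T\) is independent of \((Y(1), Y(0))\), the estimate lands close to 2; with self-selected treatment it generally would not.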

3.4.3 Unconfoundedness

While randomized controlled trials (RCTs) assume the concept of ignorability, most observational data present challenges in drawing causal conclusions due to the presence of confounding factors that affect both (1) the likelihood of individuals being part of the treatment group and (2) the observed outcome. For example, regional factors can affect both the number of storks and the number of babies born in a region. These factors are typically referred to as confounders, which we discussed in Section 6.2 as having the potential to create the illusion of a causal impact where none exists. However, empirical methods are available to control for these confounders and prevent the violation of the ignorability assumption. Formally, the assumption can be written as \[ (Y(1), Y(0)) \perp T \mid X. \] This allows us to write the ATE as follows: \[\begin{align} \mathbb{E}[Y(1)\mid X]-\mathbb{E}[Y(0)\mid X] & =\mathbb{E}[Y(1) \mid T=1, X]-\mathbb{E}[Y(0) \mid T=0, X] \\ & =\mathbb{E}[Y \mid T=1, X]-\mathbb{E}[Y \mid T=0, X]. \end{align}\]

This means that we need to control for all factors (X) that influence both groups. We will revisit this topic in Section 7.4, where we will discuss the various functional impacts that must be considered to avoid causal bias.
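A simulation sketch may illustrate the adjustment. A binary confounder \(X\) raises both the probability of treatment and the outcome, so the naive difference in means is biased; stratifying on \(X\) and averaging the within-stratum differences with weights \(P(X=x)\) recovers the true effect. All parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Binary confounder X (e.g. high-risk vs. low-risk group)
X = rng.integers(0, 2, n)

# X raises the probability of treatment AND shifts the outcome
p_treat = np.where(X == 1, 0.8, 0.2)
T = rng.random(n) < p_treat

Y0 = 1.0 * X + rng.normal(0, 1, n)  # baseline outcome depends on X
Y1 = Y0 + 0.5                       # true ATE = 0.5
Y = np.where(T, Y1, Y0)

# Naive difference in means is biased upward by the confounder
naive = Y[T].mean() - Y[~T].mean()

# Conditioning on X: within-stratum differences, weighted by P(X=x)
ate_adj = sum(
    (Y[T & (X == x)].mean() - Y[~T & (X == x)].mean()) * np.mean(X == x)
    for x in (0, 1)
)
print(naive, ate_adj)  # naive is biased; the adjusted estimate is near 0.5
```

Here the naive estimate is roughly 1.1 because treated units are disproportionately drawn from the high-outcome stratum, while the stratified estimate is close to the true 0.5.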

Exercise 3.2 Treatment effects

Read sections 2.1 and 2.3 of Neal (2020).

  1. What is the individual treatment effect (ITE)?
  2. What is the average treatment effect (ATE)?
  3. How is the ATE calculated?
  4. Can the ATE be used to determine the effect of a treatment on an individual level?
  5. What are some potential sources of bias when estimating the ATE?
  1. The individual treatment effect (ITE) is a measure of the effect of a treatment or intervention on an individual level. It represents the difference in the outcome for an individual who receives the treatment versus the outcome for that same individual if they had not received the treatment.
  2. The average treatment effect (ATE) is a measure of the difference in the expected outcomes between a treatment group and a control group. It represents the overall effect of a treatment on the population as a whole.
  3. The ATE is calculated by taking the difference between the average outcome for the treatment group and the average outcome for the control group.
  4. No, the ATE is a population-level measure and cannot be used to determine the effect of a treatment on an individual level. To determine the effect of a treatment on an individual level, you would need to use techniques such as propensity score matching or instrumental variables.
  5. Some potential sources of bias when estimating the ATE include selection bias, measurement bias, and unobserved confounding variables. To mitigate these biases, researchers may use randomization or other advanced statistical techniques such as propensity score matching or instrumental variables to control for these potential sources of bias.
Neal, B. (2020). Introduction to causal inference from a machine learning perspective: Course lecture notes. Accessed January 30, 2023. https://www.bradyneal.com/Introduction_to_Causal_Inference-Dec17_2020-Neal.pdf