3  Identification

In empirical research, identification refers to the process of establishing a clear and logical relationship between a cause and an effect. This involves demonstrating that the cause is responsible for the observed effect and that no other factors could plausibly explain it. The goal of identification is to provide strong evidence that a particular factor is indeed the cause of a particular outcome, rather than merely coinciding with it. To identify a cause-and-effect relationship, researchers can use experimental data, non-experimental (that is, observational) data, or both. Section 3.2 will discuss in greater detail how data can be collected that help to evaluate causes and to measure the magnitude of their effects. Section 3.3.2 will explain some of the difficulties researchers face when they aim to find empirical evidence on causal effects.

3.1 From anecdote to insight

Anecdotes are great. They are true stories—often intriguing, relatable, and easy to understand. They provide vivid examples that make abstract ideas more concrete and memorable. Whether it’s a personal experience or a captivating story about a successful business leader, anecdotes resonate because they tap into our natural affinity for storytelling. Their simplicity and emotional impact can make them powerful teaching tools.

And importantly, anecdotes are hard to contradict. Take, for example, the argument that smoking can’t be that harmful because your 88-year-old uncle has smoked his entire life and he is still in good health. It’s a tough claim to refute, as it’s a real-life example. However, the problem lies in extrapolating a single, isolated case to draw broader conclusions, which can be misleading.

However, while anecdotes can be persuasive, their strength is also their weakness. They represent isolated instances, and while it’s hard to deny the truth of an individual story, the danger lies in overgeneralizing from it. Anecdotes lack the rigorous analysis and breadth of evidence necessary to draw reliable conclusions. They don’t account for the full complexity of most situations, especially in business, where decisions are influenced by many interconnected factors.

In business, relying too heavily on anecdotes can lead to misguided conclusions. For example, a company might base its strategy on the success story of a famous entrepreneur without considering the countless failed ventures that didn’t make the headlines. This is known as survivorship bias, where the successes are visible, but the failures are hidden.

The challenge, then, is to take anecdotes and go beyond them. Instead of drawing direct conclusions, use them as starting points for deeper investigation. They can provide valuable hypotheses but need to be supported by data, rigorous analysis, and an understanding of the underlying principles at play. Anecdotes can inspire curiosity and point us in interesting directions, but they should be tested against a larger body of evidence to ensure that the insights we draw are reliable and applicable in a broader context.

Exercise 3.1 Survivorship bias

Read “How Successful Leaders Think” by Roger Martin (2007). Here is a summary of Martin (2007) taken from the Harvard Business Review Store:

In search of lessons to apply in our own careers, we often try to emulate what effective leaders do. Roger Martin says this focus is misplaced, because moves that work in one context may make little sense in another. A more productive, though more difficult, approach is to look at how such leaders think. After extensive interviews with more than 50 of them, the author discovered that most are integrative thinkers–that is, they can hold in their heads two opposing ideas at once and then come up with a new idea that contains elements of each but is superior to both. Martin argues that this process of consideration and synthesis (rather than superior strategy or faultless execution) is the hallmark of exceptional businesses and the people who run them. To support his point, he examines how integrative thinkers approach the four stages of decision making to craft superior solutions. First, when determining which features of a problem are salient, they go beyond those that are obviously relevant. Second, they consider multidirectional and nonlinear relationships, not just linear ones. Third, they see the whole problem and how the parts fit together. Fourth, they creatively resolve the tensions between opposing ideas and generate new alternatives. According to the author, integrative thinking is an ability everyone can hone. He points to several examples of business leaders who have done so, such as Bob Young, co-founder and former CEO of Red Hat, the dominant distributor of Linux open-source software. Young recognized from the beginning that he didn’t have to choose between the two prevailing software business models. Inspired by both, he forged an innovative third way, creating a service offering for corporate customers that placed Red Hat on a path to tremendous success.

  1. Discuss the concepts introduced by Martin (2007) critically:
  • Does he provide evidence for his ideas to work?
  • Is there a proof that his suggestions can yield success?
  • Is there some evidence about whether his ideas are superior to alternative courses of action?
  • What can we learn from the article?
  • Does his argumentation fulfill the highest academic standards?
  • What is his identification strategy with respect to the causes of effects and the effects of causes?
  • Martin (2007, p. 81) speculates:

“At some point, integrative thinking will no longer be just a tacit skill (cultivated knowingly or not) in the heads of a select few.”

  2. If teachers in business schools had followed his idea that integrative thinkers are more successful, then, almost 20 years later, this should be the dominant way to think as a business leader. Is that the case? And if so, can you still gain some competitive advantage by thinking that way?
Figure 3.1: Distribution of bullet holes in returned aircraft

Source: Martin Grandjean (vector), McGeddon (picture), Cameron Moll (concept), CC BY-SA 4.0, Link

  3. Figure 3.1 visualizes the distribution of bullet holes in aircraft that returned from combat in World War II. Imagine you are an aircraft engineer. What does this picture teach you?

  4. Inform yourself about the concept of survivorship bias explained in Wikipedia (2024).

  5. In Martin (2007), the author provides an example of a successful company to support his management ideas. Discuss whether this article relates to survivorship bias.

Martin, R. (2007). How successful leaders think. Harvard Business Review, 85(6), 71–81. https://hbr.org/2007/06/how-successful-leaders-think

Drawing insights from anecdotes is challenging, especially in business, for several reasons:

  1. Limited sample size: Anecdotes are usually individual cases that do not reflect the full extent of a situation. In business, decisions often require data from large, diverse populations to ensure reliability. Relying on a single story or experience can lead to conclusions that are not universally valid.

  2. Bias and subjectivity: Anecdotes are often influenced by personal perspectives, emotions, or particular circumstances. Moreover, anecdotes often highlight success stories while ignoring failures. This is an example of the so-called survivorship bias.

  3. Lack of context and the inability to generalize: Anecdotes often lack the broader context necessary to understand the underlying factors of a situation. Business problems tend to be complex and influenced by numerous variables such as market trends, consumer behavior and external economic conditions. Many of these variables change significantly over time. Without this context, an anecdote can oversimplify the problem and lead to incorrect decisions. Anecdotes are usually specific to a particular time, place or set of circumstances. They may not apply to different markets, industries or economic environments, which limits their usefulness for general decision-making. For example, learning only from the tremendous success of figures like Steve Jobs while ignoring the countless people who failed is like learning how to live a long life by talking to a single 90-year-old person. If that person happens to be obese and a heavy smoker, it doesn’t mean those behaviors contributed to their longevity.

  4. Lack of data rigor: Anecdotes lack the rigor and precision of data-driven analysis, in which the empirical model that allows researchers to identify causality and to measure the effects of causes is formally described.

Thus, to make informed business decisions, it is critical to base insights on systematic data analysis rather than anecdotal evidence, as anecdotes are too narrow, subjective and unreliable to guide complex business strategies.

Exercise 3.2 Systematic analysis as an alternative to anecdotal analysis

  • What defines a systematic analysis?
  • When can we say that we have ‘found evidence’?
  • When can we claim to have identified a causal effect?
  • When can we trust the size of an effect that we have measured?

3.2 Data acquisition

There are several ways to get data that (hopefully) allow you to identify a cause-and-effect relationship:

3.2.1 Interviews

An interview is normally a one-on-one verbal conversation, conducted to learn about the participants’ experiences, perceptions, opinions, or motivations. The relationship between interviewer and interviewee, as well as the circumstances of the interview (place, time, face-to-face, email, etc.), must be taken into account. There are three types of interviews: structured, semi-structured, and unstructured. Structured interviews use a predetermined list of questions that must be asked in a specific order; they resemble verbal questionnaires and improve the validity and trustworthiness of the data, but leave respondents little room to elaborate. In unstructured interviews, the interviewer uses no predetermined questions, only a planned list of topics to cover. This makes the interview more adaptable in exchange for less reliable data; long-term field observation studies may employ unstructured interviews. Semi-structured interviews are the middle ground: the interviewer prepares a list of questions and topics that can be raised in different ways with different interviewees. This increases the flexibility and responsiveness of the interview while keeping it on track, which improves the reliability and credibility of the data. Semi-structured interviews are among the most common interview techniques.

Interviews allow you to address a cause-and-effect relationship fairly directly, and it can be a good idea to interview experts and ask some why and how questions to gather initial knowledge about a particular topic before further elaborating your research strategy. For example, I interviewed kindergarten teachers with many years of experience working with children, as well as other parents, to get information on how to solve the problem of my children throwing plates around the dining room. However, findings based on interviews are not very valid or reliable because the personal perceptions of both the interviewer and the interviewee can have an impact on the conclusions drawn. For example, I received very different tips and explanations because of the personal experiences of the people I interviewed. Unfortunately, I could not really ask my son why he was misbehaving. His vocabulary was too limited at the time, and even if he could speak, he would probably refuse to tell me the truth.

3.2.2 Surveys

In contrast to an interview, a survey can be sent out to many different people. Surveys can be used to investigate a cause-and-effect relationship by asking questions about both the cause and the effect and examining the responses. For example, if a researcher wanted to determine whether there is a relationship between a person’s level of education and their income, they could conduct a survey asking participants about their education level and their income. If the data show that participants with higher levels of education tend to have higher incomes, it suggests that education may be a cause of higher income. However, it is important to note that surveys can only establish a correlation between variables; it is difficult to claim that correlations found through a survey imply a causal relationship. To establish a causal relationship, a researcher would need to use other methods, such as an experiment, to control for other potential factors, often unnoticed by the respondent, that might influence the relationship.
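To see why a survey correlation alone cannot establish causation, here is a minimal simulation sketch (all names and numbers are invented for illustration): an unobserved factor, labeled `ability`, drives both education and income, so the survey finds a strong education-income correlation even though education has, by construction, no direct effect on income in this artificial world.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical data-generating process: 'ability' (unobserved by the
# survey) raises both years of education and income; education itself
# has zero direct effect on income here.
ability = rng.normal(0, 1, n)
education = 12 + 2 * ability + rng.normal(0, 1, n)
income = 30_000 + 10_000 * ability + rng.normal(0, 5_000, n)

# The survey sees only education and income and finds a strong correlation...
print(np.corrcoef(education, income)[0, 1])  # roughly 0.8

# ...but once ability is held fixed, the partial effect of education vanishes.
X = np.column_stack([np.ones(n), education, ability])
beta, *_ = np.linalg.lstsq(X, income, rcond=None)
print(beta[1])  # coefficient on education is close to 0
```

The catch, of course, is that the regression can only control for `ability` because the simulation makes it observable; in a real survey it would be missing.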

3.2.3 Case studies

Case studies involve the in-depth examination of a single case or a small number of cases in order to understand a particular phenomenon. Case studies can be conducted using both quantitative and qualitative methods, depending on the research question and the data being analyzed. While it may be reasonable to infer causal effects within the particular case, it is problematic to generalize the causal relationship beyond it. To establish a general causal relationship, a researcher would need to use other methods, such as an experiment, to control for other potential factors that might influence the relationship.

3.2.4 Experiments

One way to clearly identify a cause-and-effect relationship is through experiments, which involve manipulating the cause (the independent variable) and measuring the effect (the dependent variable) under controlled conditions (we will later on define precisely what is meant here). Experiments can be conducted using both quantitative and qualitative methods. Here are some examples:

  • A medical study in which a new drug is tested on a group of patients, while a control group receives a placebo.
  • An educational study in which a group of students is taught a new method of learning, while a control group is taught using the traditional method.
  • An agricultural study in which a group of crops is treated with a new fertilization method, while a control group is not treated.
  • A study to determine the effect of a new training program on employee productivity might involve randomly assigning employees to either a control group that does not receive the training, or an experimental group that does receive the training. By comparing the productivity of the two groups, the researchers can determine if the new training program had a causal effect on employee productivity.
  • A study to determine the effect of a new advertising campaign on sales might involve randomly assigning different groups of customers to be exposed to different versions of the campaign. By comparing the sales of the different groups, the researchers can determine if the advertising campaign had a causal effect on sales.
  • In experimental economics, experimental methods are used to study economic questions. In a lab-like environment, data are collected to investigate the size of certain effects, to test the validity of economic theories, to illuminate market mechanisms, or to examine the decision making of people. Economic experiments usually motivate and reward subjects with money. The overall goal is to mimic real-world incentives and investigate things that cannot be captured or identified in the field.
  • In behavioral economics, laboratory experiments are also used to study decisions of individuals or institutions and to test economic theory. However, it is done with a focus on cognitive, psychological, emotional, cultural, and social factors.
Figure 3.2: Daniel Kahneman and his best selling book1

1 Source: https://commons.wikimedia.org/wiki/File:Daniel_Kahneman_(3283955327)_(cropped).jpg


In 2002, the Nobel Memorial Prize in Economic Sciences was awarded to Vernon L. Smith “for having established laboratory experiments as a tool in empirical economic analysis, especially in the study of alternative market mechanisms” and to Daniel Kahneman “for having integrated insights from psychological research into economic science, especially concerning human judgment and decision-making under uncertainty” (The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel, 2002).

The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel. (2002). Nobel Prize Outreach AB. Retrieved November 15, 2024, from https://www.nobelprize.org/prizes/economic-sciences/2002/summary/

The strength of evidence from a controlled experiment is generally considered to be high. However, the external validity, i.e., the generalizability, should be considered as well. External validity is sometimes low because effects that can be identified and measured in a lab may be of only minor importance in the field.

There are different types of experiments:

Randomized controlled trials (RCTs) are a specific type of experiment that involves randomly assigning participants to different treatment groups and comparing the outcomes of those groups. RCTs are often considered the gold standard of experimental research because they provide a high degree of control over extraneous variables and are less prone to bias.
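As a minimal sketch of why randomization works (a simulation with invented numbers, not a real study): because a coin flip decides who is treated, the treated and control groups are comparable on average, and the simple difference in group means recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Each unit has a baseline outcome; the true treatment effect is 2.
baseline = rng.normal(10, 3, n)
true_effect = 2.0

# Random assignment: a fair coin flip decides who is treated.
treated = rng.integers(0, 2, n).astype(bool)
outcome = baseline + true_effect * treated + rng.normal(0, 1, n)

# The difference in group means recovers the true effect (up to noise).
ate_hat = outcome[treated].mean() - outcome[~treated].mean()
print(f"estimated ATE: {ate_hat:.2f}")  # close to 2.0
```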

For a better explanation and some great insights into what an RCT actually is, please watch the video produced by UNICEFInnocenti and published on the YouTube channel of UNICEF’s dedicated research center, see https://youtu.be/Wy7qpJeozec and Figure 3.3.

Figure 3.3: Randomized Controlled Trials (RCTs)2

2 Source: https://youtu.be/Wy7qpJeozec

Quasi-experiments involve the manipulation of an independent variable, but do not involve random assignment of participants to treatment groups. Quasi-experiments are less controlled than RCTs, but can still provide valuable insights into cause-and-effect relationships.

Natural experiments involve the observation of naturally occurring events or situations that provide an opportunity to study cause-and-effect relationships. Natural experiments are often used when it is not possible or ethical to manipulate variables experimentally.

In a laboratory experiment, researchers manipulate an independent variable and measure the effect on a dependent variable in a controlled laboratory setting. This allows for greater control over extraneous variables, but the results may not generalize to real-world situations.

In a field experiment, researchers manipulate an independent variable and measure the effect on a dependent variable in a natural setting, rather than in a laboratory. This allows researchers to study real-world phenomena, but it can be more difficult to control for extraneous variables.

3.2.5 Observational data

Figure 3.4: Observational data3

3 Source: https://pixabay.com/images/id-5029286/

Observational data are data that were observed before the research question was asked or that are collected independently of the study. Understanding how observational data can be used to establish a causal relationship is a bit tricky because there is only one world and only one reality at a time. In other words, we usually miss a counterfactual which we can use for a comparison. Take, for example, the past COVID-19 pandemic, where you chose to be vaccinated or not. Regardless of what you chose, we will never find out what would have happened to you if you had chosen differently. Maybe you would have died, maybe you would have gotten more or less sick, or maybe you wouldn’t have gotten sick at all. We don’t know, and it’s impossible to find out because it’s impossible to observe the counterfactual outcomes. This makes it difficult to establish causality from observational data. However, ingenious minds have found reasonable procedures and methods that allow us to infer causal relationships from observational data even though we cannot directly observe the counterfactual outcomes. We will come back to these methods later on.

In the upcoming sections, however, we will discuss experimental research designs including randomized controlled trials (RCTs) which are considered to be the “gold standard for measuring the effect of an action” (Taddy, 2019, p. 128). RCTs can be used, for example, to study the effectiveness of drugs by observing people randomly assigned to three groups, one taking the pill (or treatment), a second receiving a placebo, and a third taking nothing. If the first group responds in any way differently than the other groups, the drug has an effect. Before explaining an RCT in more detail, we need to be clear about the fundamental problem of causal inference. This will be discussed in the following.

Taddy, M. (2019). Business data science: Combining machine learning and economics to optimize, automate, and accelerate business decisions (1st ed.). McGraw Hill Education.

Exercise 3.3 Methods used in economic research (Solution 3.1)

Read Paldam (2021) which is freely available here and answer the following questions:

  1. List the eight types of research methods described in the paper and provide the description found in the paper.
  2. Read the following statements and discuss whether they are true or not, and if the latter, correct them:
    1. The annual production of research papers in economics in the year 2017 has reached about 100 papers in top journals, and about 1,400 papers in the group of good journals. The production has grown with 3.3% per year, and thus it has doubled the last twenty years.
    2. The upward trend in publication must be due to the large increase in the importance of publications for the careers of researchers, which has greatly increased the production of papers. There has also been a large increase in the number of researchers, but as citations are increasingly skewed toward the top journals it has not increased demand for papers correspondingly.
    3. Four trends are significant: The fall in theoretical papers and the rise in classical papers. There is also a rise in the share of statistical method and event studies. It is surprising that there is no trend in the number of experimental studies.
    4. Book reviews have dropped to less than 1/3. Perhaps, it also indicates that economists read fewer books than they used to. Journals have increasingly come to use smaller fonts and larger pages, allowing more words per page. The journals from North-Holland Elsevier have managed to cram almost two old pages into one new one. This makes it easier to publish papers, while they become harder to read.
    5. About 50% of papers in the sample considered belong to the economic theory class, about 6% are experimental studies, and about 43% are empirical studies based on data inference.
    6. The papers in economic theory have increased from 33.6% to 59.5% – this is the largest change for any of the eight subgroups. It is highly significant in the trend test.
  3. Explain what is meant by “theory fatigue” and discuss the reasons that lead to that fatigue.
  4. According to Paldam (2021): What factors contribute to the immediate relevance of research papers for policymakers?
Paldam, M. (2021). Methods used in economic research: An empirical study of trends and levels. Economics, 15(1), 28–42.

3.3 Causal inference

Figure 3.5: Causal Inference: The Mixtape4

4 Source: Cunningham (2021)

Cunningham, S. (2021). Causal inference: The mixtape. Yale University Press. Retrieved January 30, 2023, from https://mixtape.scunning.com/

As Cunningham (2021) explains in his book (see Figure 3.5), establishing causality is very challenging. Causal inference can assist to some extent. It is the process of establishing causal relationships between variables, aiming to determine whether a change in one variable (the cause or independent variable) leads to a change in another variable (the effect or dependent variable). This process goes beyond mere association or correlation and seeks to establish that one event or factor directly influences another. Various methods of causal inference exist, and this section along with the upcoming chapters will discuss these methods. All methods share a common goal: identifying and measuring a relationship without any bias.

3.3.1 The fundamental problem of causal inference

Unfortunately, claiming a causal relationship to be empirically true is often not straightforward. The main reason for this lies in the so-called fundamental problem of causal inference, which is the issue of observing only one of the potential outcomes for each unit in a study. This means we lack the counterfactual outcome, which is the hypothetical outcome that would have occurred if a subject or unit had experienced a different condition or treatment than what actually happened. Thus, the fundamental problem of causal inference is actually a missing data problem.

For example, consider my son, who enjoyed throwing plates from the table. He must decide between throwing a plate or not, but he cannot do both simultaneously – an ability only possible in fictional movies like “Everything Everywhere All at Once”. Of course, my son can conduct an experiment by throwing a plate now and later deciding not to throw a plate. After observing both actions, he may claim to have found evidence that throwing a plate causes noise. However, he can never be 100% certain that the noise he heard after throwing the plate was solely caused by his action. It could be a coincidence that something else caused the noise at precisely the same time, like one of his siblings throwing a fork. He merely assumes it was due to his action. To be more certain, he might repeat the experiment hundreds of times. Even then, he can never be 100% sure. It is still not proof in a logical sense because an external factor could theoretically cause the noise. However, this is where statistics come into play: knowing the environment and the setup of his actions, it becomes extremely unlikely that the noise was not caused by his action. Knowing the setup means we know that there was no external factor that may have caused a causal fallacy. As Scott Cunningham emphasizes, “prior knowledge is required in order to justify any claim of a causal finding”:

Cunningham (2021, ch. 1.3): “It is my firm belief, which I will emphasize over and over in this book, that without prior knowledge, estimated causal effects are rarely, if ever, believable. Prior knowledge is required in order to justify any claim of a causal finding. And economic theory also highlights why causal inference is necessarily a thorny task.”

To illustrate that the fundamental problem of causal inference is actually a missing data problem, let’s consider the fictitious example data presented in Table 3.1. For different individuals, denoted as \(i\), we know whether they received treatment \((T=1)\) or did not receive treatment \((T=0)\), as well as whether the outcome was positive \((Y=1)\) or negative \((Y=0)\). Since we do not observe the counterfactual outcomes, we are unable to determine the individual treatment effect (ITE), which is expressed as \(Y_i(1)-Y_i(0)\).

Table 3.1: Example data illustrating that the fundamental problem of causal inference is a missing data problem
\(i\) \(T\) \(Y\) \(Y_i(1)\) \(Y_i(0)\) \(Y_i(1)-Y_i(0)\)
1 0 0 ? 0 ?
2 1 1 1 ? ?
3 1 0 0 ? ?
4 0 0 ? 0 ?
5 0 1 ? 1 ?
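The missing-data structure of Table 3.1 becomes tangible once it is coded up. Here is a minimal sketch (the column names are ours) that uses `NaN` for the unobservable potential outcomes:

```python
import numpy as np
import pandas as pd

# Table 3.1: each unit reveals only the potential outcome under the
# treatment it actually received; the other one is missing (NaN).
df = pd.DataFrame({"T": [0, 1, 1, 0, 0], "Y": [0, 1, 0, 0, 1]})
df["Y1"] = np.where(df["T"] == 1, df["Y"], np.nan)  # Y_i(1)
df["Y0"] = np.where(df["T"] == 0, df["Y"], np.nan)  # Y_i(0)
df["ITE"] = df["Y1"] - df["Y0"]                     # NaN in every row
print(df)
```

Because one of the two potential outcomes is always missing, the `ITE` column is `NaN` for every unit, which is the fundamental problem in a nutshell.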

Exercise 3.4 Causal inference ch.1 (Solution 3.2)

Please read chapter 1 (Introduction) of Cunningham (2021) and answer the following questions. The book is freely available here, and chapter 1 can be found here.

  1. What are some common misconceptions about causality that the author addresses in chapter 1?
  2. What is the role of randomization in causal inference, as described in the book?

3.3.2 Correlation does not imply causation

Correlation refers to a statistical relationship between two variables, where one variable tends to increase or decrease as the other variable also increases or decreases. However, just because two variables are correlated does not necessarily mean that one variable causes the other. This is known as the principle that correlation does not imply causation.

For example, across many areas the number of storks is correlated with the birth rate of babies (see Matthews, 2000). However, this does not mean that the presence of storks causes an increase in the birth rate. It is possible that both the number of storks and the number of babies born are influenced by other factors, such as the overall population density or economic conditions in the area.

Matthews, R. (2000). Storks deliver babies (p = 0.008). Teaching Statistics, 22(2), 36–38.

Therefore, it is important to carefully consider all possible explanations (confounders) for a correlation and to use data to disentangle the true cause-and-effect relationship between variables.

Figure 3.6: Correlation does not imply causation5

5 Source: https://youtu.be/DFPm_a-_uJM

Tip 3.1

Watch the video of Brady Neal’s lecture Correlation Does Not Imply Causation and Why. Alternatively, you can read chapter 1.3 of his lecture notes (Neal, 2020) which you find here.

3.3.3 Simpson’s Paradox

Figure 3.7: Discrimination6

6 Source: The photography is public domain and stems from the Library of Congress Prints and Photographs Division Washington, see: http://hdl.loc.gov/loc.pnp/pp.print.

Discrimination is bad. Whenever we see it, we should try to find ways to overcome it. De jure segregation, which mandated the separation of races by law, is clearly discriminatory. Other forms of discrimination, however, are often more difficult to spot, and as long as we don’t have good evidence for discrimination, we should not judge prematurely. That means we should be sure that we see an act of making unjustified distinctions between individuals based on some categories to which they belong or are perceived to belong. For example, if men and women are treated differently without an acceptable reason, we consider it discriminatory.

However, as the following example discussed in Bickel et al. (1975) will show, it is often challenging to identify discrimination. In 1973, UC Berkeley was accused of discrimination because it admitted only 35% of female applicants but 44% of male applicants overall. The difference was statistically significant, and based on that, many people protested, demanding justice and equality. However, it turned out that the selection of students was not biased against women but, if anything, against men. According to Bickel et al. (1975), the different overall admission rates can be largely explained by a “tendency of women to apply to graduate departments that are more difficult for applicants of either sex to enter” (Bickel et al., 1975, p. 403). Figure 3.8, taken from Bickel et al. (1975, p. 403), visualizes this fact. Looking at the decisions within the departments separately, there is even a “statistically significant bias in favor of women” (Bickel et al., 1975, p. 403).

Figure 3.8: Proportion of applicants that are women plotted against proportion of applicants admitted7

7 Source: Bickel et al. (1975, p. 403)

Bickel, P. J., Hammel, E. A., & O’Connell, J. W. (1975). Sex bias in graduate admissions: Data from Berkeley: Measuring bias is harder than is usually assumed, and the evidence is sometimes contrary to expectation. Science, 187(4175), 398–404. https://doi.org/10.1126/science.187.4175.398

Here is a summary of Bickel et al. (1975, p. 403):

“Examination of aggregate data on graduate admissions to the University of California, Berkeley, for fall 1973 shows a clear but misleading pattern of bias against female applicants. Examination of the disaggregated data reveals few decision-making units that show statistically significant departures from expected frequencies of female admissions, and about as many units appear to favor women as to favor men. If the data are properly pooled, taking into account the autonomy of departmental decision making, thus correcting for the tendency of women to apply to graduate departments that are more difficult for applicants of either sex to enter, there is a small but statistically significant bias in favor of women. The graduate departments that are easier to enter tend to be those that require more mathematics in the undergraduate preparatory curriculum. The bias in the aggregated data stems not from any pattern of discrimination on the part of admissions committees, which seem quite fair on the whole, but apparently from prior screening at earlier levels of the educational system. Women are shunted by their socialization and education toward fields of graduate study that are generally more crowded, less productive of completed degrees, and less well funded, and that frequently offer poorer professional employment prospects.”
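The reversal is easy to reproduce with two stylized departments (hypothetical numbers chosen for illustration, not Berkeley’s actual figures): within each department women are admitted at a higher rate, yet the pooled rates favor men because women predominantly apply to the harder department.

```python
import pandas as pd

# Stylized example: department A is easy to enter, department B is hard;
# men apply mostly to A, women mostly to B.
apps = pd.DataFrame({
    "dept":     ["A", "A", "B", "B"],
    "sex":      ["men", "women", "men", "women"],
    "applied":  [800, 100, 200, 900],
    "admitted": [480, 65, 20, 100],
})
apps["rate"] = apps["admitted"] / apps["applied"]
print(apps)  # women's admission rate is higher in BOTH departments

pooled = apps.groupby("sex")[["applied", "admitted"]].sum()
print(pooled["admitted"] / pooled["applied"])  # yet men lead overall
```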

Exercise 3.5 Graduate admissions (Solution 3.3)

Read the first three pages of Bickel et al. (1975), i.e., pages 398-400, and answer the following questions. The article can be found here.

  1. Describe the two assumptions that must be true in order to prove that UC Berkeley discriminates against women or men overall.
  2. Table 1 shows that 277 fewer women and 277 more men were admitted than we would have expected under the two assumptions. Show how this number was calculated.
  3. Explain the analogy with fish that illustrates the danger of pooling data.

Exercise 3.6 Simpson’s Paradox (Solution 3.4)

  1. What is Simpson’s Paradox?
    1. A phenomenon in which the direction of a relationship between two variables changes when a third variable is introduced
    2. A phenomenon in which the strength of a relationship between two variables changes when a third variable is introduced
    3. The phenomenon where correlation appears to be present in different groups of data, but disappears or reverses when the groups are combined
  2. What is a potential cause of Simpson’s Paradox?
    1. Differences in the variance of the two variables
    2. Differences in the correlation of the two variables
    3. Confounding variables
    4. Differences in the sample size of the two variables

3.3.4 Rubin causal model

If we are interested in the causal effect of a certain treatment on an outcome, we need to compare the outcome, \(Y\), of an individual, \(i\), who received the treatment, \(1\), to the outcome, \(Y\), of the same individual, \(i\), who did not receive the treatment, \(0\):

\[ ITE_i=Y_i(1)-Y_i(0). \]

Unfortunately, as discussed in Section 3.3.1, this individual treatment effect (ITE) does not exist as person \(i\) can either be treated or not, but not both simultaneously. Since the counterfactual outcome is missing for each individual, we cannot observe the actual causal effect.

The Rubin causal model, also known as the potential outcomes framework, provides a theoretical framework for identifying causality in the context of this missing data problem.

In the model, each subject, denoted with \(i\) (for example, a person, a school), has two potential outcomes: one outcome if the subject receives the treatment (treatment condition denoted with \(T=1\)) and another outcome if the subject does not receive the treatment (control condition denoted with \(T=0\)). In short, the model specifies that you can take the difference between the average outcome of a group that received the treatment and the average outcome of a group that did not receive the treatment and use it as a substitute for the ITE: \[ \mathbb{E} [\underbrace{Y_i(1)-Y_i(0)}_{ITE}] = \underbrace{\mathbb{E}[Y(1)] - \mathbb{E}[Y(0)]}_{ATE}. \tag{3.1}\]

However, the ATE is only equal to the expected ITE if certain assumptions are fulfilled. The upcoming sections will discuss these assumptions.
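In a simulation we can do what reality forbids and generate both potential outcomes for every unit. The minimal sketch below (with an invented distribution of effects) confirms the identity in Equation 3.1: the average of the individual effects equals the difference of the average potential outcomes. The practical difficulty, addressed next, is that we never observe both columns and must estimate the right-hand side from the treated and untreated groups.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Unlike reality, a simulation lets us see BOTH potential outcomes.
y0 = rng.normal(5, 2, n)          # outcome without treatment
ite = rng.normal(1.5, 0.5, n)     # heterogeneous unit-level effects
y1 = y0 + ite                     # outcome with treatment

print(ite.mean())             # E[Y(1) - Y(0)]: the mean of the ITEs
print(y1.mean() - y0.mean())  # E[Y(1)] - E[Y(0)]: the ATE, identical
```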

3.3.5 It’s difficult to overcome the fundamental problem

Keele (2015, p. 314): “An identification analysis identifies the assumptions needed for statistical estimates to be given a causal interpretation.”

Keele, L. (2015). The statistics of causal inference: A view from political methodology. Political Analysis, 23(3), 313–335.

In the following we will discuss conditions that need to hold in order to empirically draw causal conclusions from the ATE without bias. This is important because Equation 3.1 does not necessarily hold when using observational data without a more elaborate identification strategy.

3.3.5.1 Example

Suppose we want to measure the effect of a vaccine on survival rates. We observed the residents of a small city with 2,000 inhabitants over the course of 30 days. On day 1, we arrived in town and administered the vaccine to 200 individuals. By day 30, we counted the deceased in both groups: four died in the vaccinated group, while eighteen died in the group of 1,800 unvaccinated individuals. With a survival rate of 98% in the vaccinated group and 99% in the unvaccinated group, it may appear that the vaccine lowers the survival rate. Imagine the study were real: would you claim that the vaccine kills because, according to Equation 3.1, we could use the ATE to indicate the ITE?

The answer is yes, but only if the assumptions of ignorability (Section 3.3.5.2) and unconfoundedness (Section 3.3.5.3) are satisfied.

In brief, ignorability means that the 200 treated individuals are not systematically different from the other 1,800 individuals regarding characteristics that have an impact on the chances of survival. Considering the fact that we cannot randomly select 200 individuals from the 2,000 inhabitants due to legal constraints (as everyone has the right to choose whether or not to receive the vaccine), we must consider who is willing to get vaccinated. This selection bias may pose issues, as vulnerable populations often have a higher willingness to accept the vaccine compared to younger and healthier individuals who may fear the disease less. For example, if we vaccinated individuals with preexisting conditions that make them more vulnerable, such as the elderly or those with chronic illnesses, we cannot assume that the ATE is equal to the ITE. This is because the overall mortality risk is higher among those who received the vaccine.

Unconfoundedness means that there are no other factors that could explain both the likelihood of receiving the vaccine and the likelihood of death. For example, if vaccinated individuals were not required to stay at home during these 30 days, their likelihood of dying may increase due to greater exposure to risky situations and other people, which in turn raises their chances of contracting a disease.
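A minimal simulation sketch of such a town (all parameters invented for illustration) makes the selection problem concrete: here the vaccine has, by construction, no effect at all, yet because the vulnerable are more likely to get vaccinated, the naive comparison of death rates makes it look deadly.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000

# Hypothetical setup: 30% of inhabitants are 'vulnerable' with a much
# higher baseline mortality risk; the vaccine itself has NO effect here.
vulnerable = rng.random(n) < 0.3
base_risk = np.where(vulnerable, 0.04, 0.002)

# Self-selection: vulnerable people are far more willing to be vaccinated.
vaccinated = rng.random(n) < np.where(vulnerable, 0.30, 0.02)

died = rng.random(n) < base_risk  # mortality is independent of the vaccine

# Naive comparison: the vaccinated appear to die much more often.
print(died[vaccinated].mean(), died[~vaccinated].mean())
```

Comparing death rates within the vulnerable and the non-vulnerable separately would show no difference; this conditioning on a confounder is exactly the adjustment discussed in Section 3.3.5.3.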

Tip 3.2
Figure 3.9: Average treatment effect (ATE)

Watch the video of Brady Neal’s lecture What Does Imply Causation? Randomized Control Trials (see Figure 3.9). Alternatively, you can read ch. 2 of his lecture notes (Neal, 2020), see here.

3.3.5.2 Ignorability

Referring to Table 3.1, Brady Neal (2020) wrote:

“What makes it valid to calculate the ATE by taking the average of the Y(0) column, ignoring the question marks, and subtracting that from the average of the Y(1) column, ignoring the question marks?” This ignoring of the question marks (missing data) is known as ignorability: “Assuming ignorability is like ignoring how people ended up selecting the treatment they selected and just assuming they were randomly assigned their treatment” (Neal, 2020, p. 9).

Ignorability means that the way individuals are assigned to treatment and control groups is irrelevant for the data analysis. Thus, when we aim to explain a certain outcome, we can ignore how an individual made it into the treated or control group. It has also been called unconfoundedness or no omitted variable bias. We will come back to these two terms in Section 3.4 and in Chapter 5.

Randomized controlled trials (RCTs) are characterized by randomly assigning individuals to different treatment groups and comparing the outcomes of those groups. Thus, RCTs are essentially built on the assumption of ignorability, which can be written formally as \[ (Y(1), Y(0)) \perp T. \]

This notation indicates that the potential outcomes of an individual, \(Y\), are independent of whether they have actually received the treatment. The symbol “\(\perp\)” denotes independence, suggesting that the outcomes \(Y(1)\) and \(Y(0)\) are orthogonal to the treatment \(T\).

The assumption of ignorability allows us to write the ATE as follows: \[\begin{align} \mathbb{E}[Y(1)]-\mathbb{E}[Y(0)] & =\mathbb{E}[Y(1) \mid T=1]-\mathbb{E}[Y(0) \mid T=0] \\ & =\mathbb{E}[Y \mid T=1]-\mathbb{E}[Y \mid T=0]. \end{align}\]

Another perspective on this assumption is the concept of exchangeability. Exchangeability refers to the idea that the treatment groups can be interchanged such that if they were switched, the new treatment group would have the same outcomes as the old treatment group, and the new control group would have the same outcomes as the old control group.

3.3.5.3 Unconfoundedness

While randomized controlled trials (RCTs) rest on the assumption of ignorability, most observational data present challenges in drawing causal conclusions due to the presence of confounding factors that affect both (1) the likelihood of individuals being part of the treatment group and (2) the observed outcome. For example, regional factors can affect both the number of storks and the number of babies born in a region. These factors are typically referred to as confounders, which we discussed in Section 3.3.2 as having the potential to create the illusion of a causal impact where none exists. However, empirical methods are available to control for these confounders and prevent the violation of the ignorability assumption. Formally, the assumption can be written as \[ (Y(1), Y(0)) \perp T \mid X. \] This allows us to write the ATE as follows: \[\begin{align} \mathbb{E}[Y(1)\mid X]-\mathbb{E}[Y(0)\mid X] & =\mathbb{E}[Y(1) \mid T=1, X]-\mathbb{E}[Y(0) \mid T=0, X] \\ & =\mathbb{E}[Y \mid T=1, X]-\mathbb{E}[Y \mid T=0, X]. \end{align}\]

This means that we need to control for all factors (X) that influence both groups. We will revisit this topic in Section 3.4, where we will discuss the various functional impacts that must be considered to avoid causal bias.
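Continuing the vaccine sketch from Section 3.3.5.1 (same invented parameters), conditioning on the confounder means comparing like with like: a minimal stratified estimate takes the treated-control difference within each level of X and averages these differences, weighted by the population share of each level.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 200_000  # a large sample keeps the within-stratum estimates stable

# Same hypothetical setup as before: vulnerability (X) drives both
# vaccination (T) and mortality (Y); the vaccine has no true effect.
vulnerable = rng.random(n) < 0.3
died = rng.random(n) < np.where(vulnerable, 0.04, 0.002)
vaccinated = rng.random(n) < np.where(vulnerable, 0.30, 0.02)

df = pd.DataFrame({"X": vulnerable, "T": vaccinated, "Y": died})

# Naive contrast, biased by selection into treatment:
naive = df.loc[df["T"], "Y"].mean() - df.loc[~df["T"], "Y"].mean()

# Stratified contrast: E[Y | T=1, X] - E[Y | T=0, X], averaged over X.
diff_by_x = df.groupby("X").apply(
    lambda g: g.loc[g["T"], "Y"].mean() - g.loc[~g["T"], "Y"].mean()
)
adjusted = (diff_by_x * df["X"].value_counts(normalize=True)).sum()

print(f"naive: {naive:.4f}, adjusted: {adjusted:.4f}")  # adjusted is near 0
```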

Exercise 3.7 Treatment effects (Solution 3.5)

Read sections 2.1 and 2.3 of Neal (2020).

  1. What is the individual treatment effect (ITE)?
  2. What is the average treatment effect (ATE)?
  3. How is the ATE calculated?
  4. Can the ATE be used to determine the effect of a treatment on an individual level?
  5. What are some potential sources of bias when estimating the ATE?
Neal, B. (2020). Introduction to causal inference from a machine learning perspective: Course lecture notes. Accessed January 30, 2023. https://www.bradyneal.com/Introduction_to_Causal_Inference-Dec17_2020-Neal.pdf

3.4 Statistical control requires causal justification

Tip 3.3

Read Wysocki et al. (2022) which is freely available here. Here you find a good summary of the paper.

Wysocki, A. C., Lawson, K. M., & Rhemtulla, M. (2022). Statistical control requires causal justification. Advances in Methods and Practices in Psychological Science, 5(2). https://doi.org/10.1177/25152459221095823

Scientific research revolves around challenging our own views and findings. A good researcher does not merely present their results; instead, they engage in discussions about potential limitations and pitfalls to draw valid conclusions. Engaging in polemics goes against the essence of good research. We should not conceal potential weaknesses in our scientific strategy or empirical approach; rather, we should emphasize their existence. Even if this disappoints individuals seeking easy answers, it is crucial to acknowledge these limitations. The Catalogue of Bias is an excellent resource that provides insight into various potential pitfalls and challenges encountered during research, which may sometimes be difficult to completely rule out.

Solutions to the exercises

Solution 3.1. Methods used in economic research (Exercise 3.3)

  1. List the eight types of research methods described in the paper and provide the description found in the paper
  1. Economic theory: Papers where the main content is the development of a theoretical model. The ideal theory paper presents a (simple) new model that recasts the way we look at something important.

  2. Statistical technique, incl. forecasting: Papers reporting new estimators and tests are published in a handful of specialized journals in econometrics and mathematical statistics. Some papers compare estimators on actual data sets. If the demonstration of a methodological improvement is the main feature of the paper, it belongs to this subgroup, but if the economic interpretation is the main point of the paper, it belongs to the classical empirical studies or newer techniques group.

  3. Surveys, incl. meta-studies: When the literature in a certain field becomes substantial, it normally presents a motley picture with an amazing variation, especially when different schools exist in the field. Surveys are of two types, where the second type is still rare:

    1. Assessed surveys where the author reads the papers and assesses what the most reliable results are. Such assessments require judgment that is often quite difficult to distinguish from priors, even for the author of the survey.
    2. Meta-studies which are quantitative surveys of estimates of parameters claimed to be the same. These types of studies have two levels: The basic level collects and codes the estimates and studies their distribution. This is a rather objective exercise where results seem to replicate rather well. The second level analyzes the variation between the results. This is less objective.
  4. Experiments in laboratories: Most of these experiments take place in a laboratory, where the subjects communicate with a computer, giving a controlled, but artificial, environment. A number of subjects are told a (more or less abstract) story and paid to react in either of a number of possible ways. A great deal of ingenuity has gone into the construction of such experiments and in the methods used to analyze the results. Lab experiments do allow studies of behavior that are hard to analyze in any other way, and they frequently show sides of human behavior that are difficult to rationalize by economic theory. However, everything is artificial, even the payment, although participants usually receive real money for participation and their performance. In some cases, the stories told are so elaborate and abstract that framing must be a substantial risk. In addition, experiments cost money, which limits the number of subjects. It is also worth pointing to the difference between expressive and real behavior. It is typically much cheaper for the subject to ‘express’ nice behavior in a lab than to be nice in the real world.

  5. Event studies (field experiments and natural experiments): Event studies are studies of real-world experiments. They are of two types:

    1. Field experiments analyze cases where some people get a certain treatment and others do not. The ‘gold standard’ for such experiments is double blind random sampling, where everything (but the result!) is announced in advance. Experiments with humans require permission from the relevant authorities, and the experiment takes time too. In the process, things may happen that compromise the strict rules of the standard. Controlled experiments are expensive, as they require a team of researchers.
    2. Natural experiments take advantage of a discontinuity in the environment, i.e., the period before and after an (unpredicted) change of a law, an earthquake, etc. Methods have been developed to find the effect of the discontinuity. Often, such studies look like classical empirical studies with many controls that may or may not belong. Thus, the problems discussed under the classic empirical studies also apply here.
  6. Descriptive, deductions from data: In a descriptive study, researchers use an existing sample and hence have no control over the data-generating process, in contrast to experiments. Descriptive studies are deductive. The researcher describes the data aiming at finding structures that tell a story, which can be interpreted. The findings may call for a formal test. If one clean test follows from the description, the paper can still be classified as a descriptive study. If more elaborate regression analysis is used, however, it can also be classified as a classical empirical study. Descriptive studies often contain a great deal of theory. Some descriptive studies present a new data set developed by the author to analyze a debated issue. In these cases, it is often possible to make a clean test, so to the extent that biases sneak in, they are hidden in the details of the assessments made when the data are compiled.

  7. Classical empirical studies: These typically have three steps: they start with a theory, which is developed into an operational model; then the data set is presented; and finally regressions are run. The significance levels of the t-ratios on the estimated coefficients assume that the regression is the first meeting of the estimation model and the data. In practice, we all know that this is rarely the case. The classical method is often just a presentation technique. The great virtue of the method is that it can be applied to real problems outside academia. The relevance comes with a price: the method is quite flexible as many choices have to be made, and they often give different results. Preferences and interests may affect these choices.

  8. Newer techniques: Partly as a reaction to the problems of classical empirical methods, the last 3–4 decades have seen a whole set of newer empirical techniques. They include different types of vector autoregression (VAR)8, Bayesian techniques, causality and co-integration tests, Kalman filters, hazard functions, etc. The main reason for the lack of success of the new empirics is that it is quite bulky to report a careful set of co-integration tests or VARs, for example, and they often show results that are far from useful in the sense that they are unclear and difficult to interpret.

  2. Read the following statements and discuss whether they are true or not, and if the latter, correct them:

Statements 1) and 6) are false; all others are correct.

  1. The numbers are wrong: The annual production of research papers in economics in the year 2017 has now reached about 1,000 papers in top journals, and about 14,000 papers in the group of good journals. The production has grown with 3.3% per year, and thus it has doubled the last twenty years.

  2. Statement is correct: The upward trend in publication must be due to the large increase in the importance of publications for the careers of researchers, which has greatly increased the production of papers. There has also been a large increase in the number of researchers, but as citations are increasingly skewed toward the top journals it has not increased demand for papers correspondingly.

  3. Statement is correct: Four trends are significant: The fall in theoretical papers and the rise in classical papers. There is also a rise in the share of statistical method and event studies. It is surprising that there is no trend in the number of experimental studies.

  4. Statement is correct: Book reviews have dropped to less than 1/3. Perhaps, it also indicates that economists read fewer books than they used to. Journals have increasingly come to use smaller fonts and larger pages, allowing more words per page. The journals from North-Holland Elsevier have managed to cram almost two old pages into one new one. This makes it easier to publish papers, while they become harder to read.

  5. Statement is correct: About 50% of papers in the sample considered belong to the economic theory class, about 6% are experimental studies, and about 43% are empirical studies based on data inference.

  6. Economic theory is not on the rise: The papers in economic theory have dropped from 59.5% to 33.6% – this is the largest change for any of the eight subgroups. It is highly significant in the trend test.

  3. “Theory fatigue” is a term used to describe the decreasing attractiveness of theoretical research among journals, researchers, and political decision-makers. This trend goes hand in hand with the increasing importance of empirical research. Policy makers are finding it increasingly difficult to engage with variations of existing theoretical models, and researchers often struggle to systematically summarize the findings of theoretical work, making it difficult to draw definitive conclusions on specific topics. In addition, theoretical work can be unconvincing to a wider audience that must rely on the reasonableness of complex and sometimes unrealistic assumptions. The credibility of theoretical research often depends on how realistic the initial assumptions are and how plausible the conclusions are. If neither aspect is grounded in reality, there is a danger that the research becomes an abstract exercise whose insights into the real world are difficult to communicate to the layperson.

  4. A research paper that policymakers find appealing typically offers estimates of a crucial effect that decision-makers outside of academia are keen to understand. Papers that target policymakers should put an emphasis on distilling the core findings into a short executive summary tailored for decision-makers, facilitating their understanding and application of the research insights.

8  A VAR is a statistical model used to capture the relationship between multiple quantities as they change over time.

Solution 3.2. Causal inference ch.1 (Exercise 3.4)

  1. Some common misconceptions about causality that the author addresses in chapter 1 include the confusion between correlation and causality, and the belief that correlations in observational data can establish causality without prior knowledge. He says that human beings “engaging in optimal behavior are the main reason correlations almost never reveal causal relationships, because rarely are human beings acting randomly”, and such randomness is crucial for identifying causal effects.

  2. The role of randomization in causal inference, as described in the book, is that it helps to control for confounding variables and allows for the estimation of causal effects.

Solution 3.3. Graduate admissions (Exercise 3.5)

  1. Assumption 1 is that in any given discipline male and female applicants do not differ in respect of their intelligence, skill, qualifications, promise, or other attribute deemed legitimately pertinent to their acceptance as students. It is precisely this assumption that makes the study of “sex bias” meaningful, for if we did not hold it any differences in acceptance of applicants by sex could be attributed to differences in their qualifications, promise as scholars, and so on. (…) Assumption 2 is that the sex ratios of applicants to the various fields of graduate study are not importantly associated with any other factors in admission. (Bickel et al., 1975, p. 398)
  2. Expectations were computed from the overall acceptance rate of about 0.41, multiplied by the total observed numbers of applicants. For example, for the men: \((3738+4704) \cdot 0.41 \approx 3460\) expected admissions and \((3738+4704) \cdot (1-0.41) \approx 4981\) expected rejections. The difference between the observed number of men admitted (3738) and the expected number (about 3460) gives the roughly 277 admissions to be explained (see the short check at the end of this solution).
  3. The analogy is explained on page 400:

“Picture a fishnet with two different mesh sizes. A school of fish, all of identical size (assumption 1), swim toward the net and seek to pass. The female fish all try to get through the small mesh, while the male fish all try to get through the large mesh. On the other side of the net all the fish are male. Assumption 2 said that the sex of the fish had no relation to the size of the mesh they tried to get through. It is false.”

The UC Berkeley case is just one of many examples illustrating that uniformity of group assignment of individuals is a necessary condition to ensure that pooling data does not lead to misleading conclusions when using statistics. The phenomenon of obtaining different results depending on whether one considers the data pooled or unpooled is often referred to as Simpson’s Paradox.
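A quick check of the arithmetic in item 2, using only the men’s counts given above and the rounded overall acceptance rate of 0.41:

```python
# Men at UC Berkeley, fall 1973 (counts as given in item 2 above):
admitted, rejected = 3738, 4704
applicants = admitted + rejected   # 8442 male applicants in total
expected = applicants * 0.41       # ~3461 admissions expected under the assumptions
print(round(admitted - expected))  # -> 277 'excess' male admissions
```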

Solution 3.4. Simpson’s Paradox (Exercise 3.6)

  1. a), 2. c) and d)

Solution 3.5. Treatment effects (Exercise 3.7)

  1. The individual treatment effect (ITE) is a measure of the effect of a treatment or intervention on an individual level. It represents the difference in the outcome for an individual who receives the treatment versus the outcome for that same individual if they had not received the treatment.
  2. The average treatment effect (ATE) is a measure of the difference in the expected outcomes between a treatment group and a control group. It represents the overall effect of a treatment on the population as a whole.
  3. The ATE is calculated by taking the difference between the average outcome for the treatment group and the average outcome for the control group.
  4. No, the ATE is a population-level measure and cannot be used to determine the effect of a treatment on an individual level. To determine the effect of a treatment on an individual level, you would need to use techniques such as propensity score matching or instrumental variables.
  5. Some potential sources of bias when estimating the ATE include selection bias, measurement bias, and unobserved confounding variables. To mitigate these biases, researchers may use randomization or other advanced statistical techniques such as propensity score matching or instrumental variables to control for these potential sources of bias.