3  Identification

In empirical research, identification refers to the process of establishing a clear and logical relationship between a cause and an effect. This involves demonstrating that the cause is responsible for the observed effect and that no other factors could explain it. The goal of identification is to provide strong evidence that a particular factor is indeed the cause of a particular outcome, rather than merely coincidental with it. To identify a cause-and-effect relationship, researchers can use experimental data, non-experimental (that is, observational) data, or both. Section 3.1 will discuss in greater detail how data can be collected that help to evaluate causes and measure the magnitude of their effects. Section 3.2.2 will explain some difficulties researchers face when they aim to find empirical evidence on causal effects.

3.1 Data acquisition

There are several ways to get data that (hopefully) allow you to identify a cause-and-effect relationship:

3.1.1 Interviews

An interview is normally a one-on-one verbal conversation. Interviews are conducted to learn about the participants’ experiences, perceptions, opinions, or motivations. The relationship between the interviewer and the interviewee, as well as the circumstances of the interview (place, time, face to face, by e-mail, etc.), should be taken into account. There are three types of interviews: structured, semi-structured, and unstructured.

Structured interviews use a predetermined list of questions that must be asked in a specific order. They resemble verbal questionnaires and improve the validity and trustworthiness of the data, but leave respondents less room to elaborate. In unstructured interviews, the interviewer has a planned list of topics to cover but no predetermined questions. This makes the interview more adaptable in exchange for less reliable data; long-term field observation studies may employ unstructured interviews. Semi-structured interviews are the middle ground: the interviewer prepares a list of questions and topics that can be brought up in different ways with different interviewees. This increases the flexibility and responsiveness of the interview while keeping it on track, which strengthens the reliability and credibility of the data. Semi-structured interviews are one of the most common interview techniques.

Interviews allow you to address a cause-and-effect relationship fairly directly, and it can be a good idea to interview experts and ask “why” and “how” questions to gather initial knowledge about a particular topic before further elaborating your research strategy. For example, I interviewed kindergarten teachers with many years of experience working with children, as well as other parents, to get information on how to solve the problem of my children throwing plates around the dining room. However, findings based on interviews are not very valid or reliable because the personal perceptions of both the interviewer and the interviewee can have an impact on the conclusions drawn. For example, I received very different tips and explanations, owing to the personal experiences of the people I interviewed. Unfortunately, I could not really ask my son why he was misbehaving: his vocabulary was too limited at the time, and even if he had been able to answer, he would probably have refused to tell me the truth.

3.1.2 Surveys

In contrast to an interview, a survey can be sent out to many different people. Surveys can be used to investigate a cause-and-effect relationship by asking questions about both the cause and the effect and examining the responses. For example, if a researcher wanted to determine whether there is a relationship between a person’s level of education and their income, they could conduct a survey asking participants about their education level and their income. If the data show that participants with higher levels of education tend to have higher incomes, this suggests that education may be a cause of higher income. However, it is important to note that surveys can only establish a correlation between variables, and it is difficult to claim that correlations found through a survey imply a causal relationship. To establish a causal relationship, a researcher would need to use other methods, such as an experiment, to control for other potential factors that might influence the relationship and that the respondent does not see.
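
As a small illustration, here is a minimal sketch with made-up survey responses (not real data): it computes the correlation between reported education and income, a number that describes an association but cannot, on its own, establish causation.

```python
import numpy as np

# Made-up survey responses: years of education and annual income (in kEUR)
education_years = np.array([10, 12, 12, 14, 16, 16, 18, 18, 20, 22])
income_keur     = np.array([28, 30, 33, 35, 41, 39, 48, 52, 55, 60])

# Pearson correlation: describes the association only, not its cause
r = np.corrcoef(education_years, income_keur)[0, 1]
print(f"Correlation between education and income: {r:.2f}")
```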

3.1.3 Case studies

Case studies involve the in-depth examination of a single case or a small number of cases in order to understand a particular phenomenon. Case studies can be conducted using both quantitative and qualitative methods, depending on the research question and the data being analyzed. While it may be reasonable to infer causal effects within the particular case, it is problematic to generalize the causal relationship. To establish a general causal relationship, a researcher would need to use other methods, such as an experiment, that control for other potential factors that might influence the relationship.

3.1.4 Experiments

One way to clearly identify a cause-and-effect relationship is through experiments, which involve manipulating the cause (the independent variable) and measuring the effect (the dependent variable) under controlled conditions (we will define later on precisely what is meant by this). Experiments can be conducted using both quantitative and qualitative methods. Here are some examples:

  • A medical study in which a new drug is tested on a group of patients, while a control group receives a placebo.
  • An educational study in which a group of students is taught a new method of learning, while a control group is taught using the traditional method.
  • An agricultural study in which a group of crops is treated with a new fertilization method, while a control group is not treated.
  • A study to determine the effect of a new training program on employee productivity might involve randomly assigning employees to either a control group that does not receive the training, or an experimental group that does receive the training. By comparing the productivity of the two groups, the researchers can determine if the new training program had a causal effect on employee productivity.
  • A study to determine the effect of a new advertising campaign on sales might involve randomly assigning different groups of customers to be exposed to different versions of the campaign. By comparing the sales of the different groups, the researchers can determine if the advertising campaign had a causal effect on sales.
  • In experimental economics, experimental methods are used to study economic questions. In a lab-like environment, data are collected to investigate the size of certain effects, to test the validity of economic theories, to illuminate market mechanisms, or to examine people’s decision making. Economic experiments usually motivate and reward subjects with money. The overall goal is to mimic real-world incentives and investigate things that cannot be captured or identified in the field.
  • In behavioral economics, laboratory experiments are also used to study decisions of individuals or institutions and to test economic theory. However, it is done with a focus on cognitive, psychological, emotional, cultural, and social factors.
Figure 3.1: Daniel Kahneman and his best-selling book Thinking, Fast and Slow1

In 2002, the Nobel Memorial Prize in Economic Sciences was awarded to Vernon L. Smith, to quote The Royal Swedish Academy of Sciences (2002), “for having established laboratory experiments as a tool in empirical economic analysis, especially in the study of alternative market mechanisms”, and to Daniel Kahneman “for having integrated insights from psychological research into economic science, especially concerning human judgment and decision-making under uncertainty”.

Evidence from a controlled experiment is generally considered to be strong. However, the external validity, i.e., the generalizability of the findings, should be considered as well. External validity is sometimes low because effects that can be identified and measured in a lab are sometimes of only minor importance in the field.

There are different types of experiments:

Randomized controlled trials (RCTs) are a specific type of experiment that involves randomly assigning participants to different treatment groups and comparing the outcomes of those groups. RCTs are often considered the gold standard of experimental research because they provide a high degree of control over extraneous variables and are less prone to bias.
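
To make random assignment concrete, the following sketch (hypothetical participant data, not from any real trial) randomly splits a pool of participants into a treatment and a control group and checks that an observed characteristic is balanced across the groups, which is what randomization delivers on average.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical participant pool with one observed characteristic (age)
n = 200
age = rng.normal(loc=40, scale=8, size=n)

# Random assignment: shuffle indices and split the pool in half
idx = rng.permutation(n)
treatment, control = idx[: n // 2], idx[n // 2:]

# Balance check: with random assignment the groups should look similar
print(f"Mean age, treatment: {age[treatment].mean():.1f}")
print(f"Mean age, control:   {age[control].mean():.1f}")
```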

For a better explanation and some great insights into what an RCT actually is, please watch the video produced by UNICEFInnocenti and published on the YouTube channel of UNICEF’s dedicated research center, see https://youtu.be/Wy7qpJeozec and Figure 3.2.

Figure 3.2: Randomized Controlled Trials (RCTs)2

Quasi-experiments involve the manipulation of an independent variable, but do not involve random assignment of participants to treatment groups. Quasi-experiments are less controlled than RCTs, but can still provide valuable insights into cause-and-effect relationships.

Natural experiments involve the observation of naturally occurring events or situations that provide an opportunity to study cause-and-effect relationships. Natural experiments are often used when it is not possible or ethical to manipulate variables experimentally.

In a laboratory experiment, researchers manipulate an independent variable and measure the effect on a dependent variable in a controlled laboratory setting. This allows for greater control over extraneous variables, but the results may not generalize to real-world situations.

In a field experiment, researchers manipulate an independent variable and measure the effect on a dependent variable in a natural setting, rather than in a laboratory. This allows researchers to study real-world phenomena, but it can be more difficult to control for extraneous variables.

3.1.5 Observational data

Figure 3.3: Observational data3

Observational data are data that were observed before the research question was asked or that are collected independently of the study. Understanding how observational data can be used to establish a causal relationship is a bit tricky because there is only one world and only one reality at a time. In other words, we usually miss a counterfactual that we could use for comparison. Take, for example, the past COVID-19 pandemic, where you chose to be vaccinated or not. Regardless of what you chose, we will never find out what would have happened to you if you had chosen differently. Maybe you would have died, maybe you would have gotten more or less sick, or maybe you wouldn’t have gotten sick at all. We don’t know, and it’s impossible to find out, because the counterfactual outcome cannot be observed. This makes it difficult to establish causality from observational data. However, ingenious minds have found reasonable procedures and methods to infer causal relationships from observational data even though we cannot directly observe the counterfactual outcome. We will come back to these methods later on.

In the upcoming sections, however, we will discuss experimental research designs, including randomized controlled trials (RCTs), which are considered to be the “gold standard for measuring the effect of an action” (Taddy, 2019, p. 128). RCTs can be used, for example, to study the effectiveness of drugs by observing people randomly assigned to three groups: one taking the pill (or treatment), a second receiving a placebo, and a third taking nothing. If the first group responds in any way differently from the other groups, the drug has an effect. Before explaining an RCT in more detail, we need to be clear about the fundamental problem of causal inference. This will be discussed in the following.

Exercise 3.1 Methods used in economic research (Solution 3.1)

Read Paldam (2021) which is freely available here and answer the following questions:

  1. List the eight types of research methods described in the paper and provide the description found in the paper.
  2. Read the following statements and discuss whether they are true or not, and if the latter, correct them:
    1. The annual production of research papers in economics in the year 2017 has reached about 100 papers in top journals, and about 1,400 papers in the group of good journals. The production has grown by 3.3% per year, and thus it has doubled over the last twenty years.
    2. The upward trend in publication must be due to the large increase in the importance of publications for the careers of researchers, which has greatly increased the production of papers. There has also been a large increase in the number of researchers, but as citations are increasingly skewed toward the top journals, it has not increased the demand for papers correspondingly.
    3. Four trends are significant: The fall in theoretical papers and the rise in classical papers. There is also a rise in the share of statistical method and event studies. It is surprising that there is no trend in the number of experimental studies.
    4. Book reviews have dropped to less than 1/3. Perhaps, it also indicates that economists read fewer books than they used to. Journals have increasingly come to use smaller fonts and larger pages, allowing more words per page. The journals from North-Holland Elsevier have managed to cram almost two old pages into one new one. This makes it easier to publish papers, while they become harder to read.
    5. About 50% of papers in the sample considered belong to the economic theory class, about 6% are experimental studies, and about 43% are empirical studies based on data inference.
    6. The papers in economic theory have increased from 33.6% to 59.5% – this is the largest change for any of the eight subgroups. It is highly significant in the trend test.
  3. Explain what is meant by “theory fatigue” and discuss the reasons that lead to that fatigue.
  4. According to Paldam (2021): What factors contribute to the immediate relevance of research papers for policymakers?

3.2 Causal inference

3.2.1 The fundamental problem of causal inference

Figure 3.4: Causal Inference: The Mixtape4

Cunningham (2021, ch. 1.3): “It is my firm belief, which I will emphasize over and over in this book, that without prior knowledge, estimated causal effects are rarely, if ever, believable. Prior knowledge is required in order to justify any claim of a causal finding. And economic theory also highlights why causal inference is necessarily a thorny task.”

As Cunningham (2021) explains in his book (see Figure 3.4), it is very hard to claim causality. In the following section, I will briefly paraphrase two aspects of why it is so difficult to claim to have found a causal effect. One reason is that it is rather difficult to find or generate the right data and to use them properly so that the result is not biased. First, I will discuss Simpson’s Paradox as an example of how easy it is to interpret data falsely. It will give an idea of how difficult it is to analyze observational data meaningfully and why we need a theory when looking at data. Beyond that, we should try to challenge the assumptions on which the theory is built. After that, I will briefly discuss the fundamental problem of causal inference as a problem of missing counterfactual data.

Exercise 3.2 Causal inference ch.1 (Solution 3.2)

Please read chapter 1 (Introduction) of Cunningham (2021) and answer the following questions. The book is freely available here and here you find chapter 1.

  1. What are some common misconceptions about causality that the author addresses in chapter 1?
  2. What is the role of randomization in causal inference, as described in the book?

3.2.2 Correlation does not imply causation

Correlation refers to a statistical relationship between two variables, where one variable tends to increase or decrease as the other variable also increases or decreases. However, just because two variables are correlated does not necessarily mean that one variable causes the other. This is known as the correlation does not imply causation principle.

For example, it may be observed that the number of storks in a particular area is correlated with the birth rate of babies in that area. However, this does not mean that the presence of storks causes an increase in the birth rate. It is possible that both the number of storks and the number of babies born are influenced by other factors, such as the overall population density or economic conditions in the area.

Therefore, it is important to carefully consider all possible explanations (confounders) for a correlation and to use empirical evidence to determine the true cause-and-effect relationship between variables.
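
The stork example can be mimicked in a short simulation with purely artificial numbers: a regional factor drives both the number of storks and the number of births, so the two variables are correlated even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n_regions = 300

# Confounder: how rural a region is (artificial 0-1 scale)
rurality = rng.uniform(0, 1, size=n_regions)

# Both variables depend on rurality, but not on each other
storks = 5 + 20 * rurality + rng.normal(0, 2, size=n_regions)
births = 100 + 80 * rurality + rng.normal(0, 10, size=n_regions)

r = np.corrcoef(storks, births)[0, 1]
print(f"Correlation storks vs. births: {r:.2f}")  # clearly positive, yet no causal link
```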

Figure 3.5: Correlation does not imply causation5
Tip 3.1

Watch the video of Brady Neal’s lecture Correlation Does Not Imply Causation and Why. Alternatively, you can read chapter 1.3 of his lecture notes (Neal, 2020) which you find here.

3.2.3 Simpson’s Paradox

Figure 3.6: Discrimination6

Discrimination is bad. Whenever we see it, we should try to find ways to overcome it. De jure segregation, that is, the separation of races mandated by law, is clearly discriminatory. Other forms of discrimination, however, are often more difficult to spot, and as long as we do not have good evidence for discrimination, we should not judge prematurely. That means we should be sure that we are seeing an act of making unjustified distinctions between individuals based on categories to which they belong or are perceived to belong. For example, if men and women are treated differently without an acceptable reason, we consider it discriminatory. UC Berkeley, for instance, was accused of discrimination in 1973 because, overall, it admitted only 35% of female applicants but 44% of male applicants. The difference was statistically significant. However, according to Bickel et al. (1975), the selection of students was not discriminatory against women but, if anything, against men. They conclude that there was just a “tendency of women to apply to graduate departments that are more difficult for applicants of either sex to enter” (Bickel et al., 1975, p. 403). Figure 3.7, taken from Bickel et al. (1975, p. 403), visualizes this fact.

Figure 3.7: Proportion of applicants that are women plotted against proportion of applicants admitted7

Here you can read the summary of their remarkable study:

“Examination of aggregate data on graduate admissions to the University of California, Berkeley, for fall 1973 shows a clear but misleading pattern of bias against female applicants. Examination of the disaggregated data reveals few decision-making units that show statistically significant departures from expected frequencies of female admissions, and about as many units appear to favor women as to favor men. If the data are properly pooled, taking into account the autonomy of departmental decision making, thus correcting for the tendency of women to apply to graduate departments that are more difficult for applicants of either sex to enter, there is a small but statistically significant bias in favor of women. The graduate departments that are easier to enter tend to be those that require more mathematics in the undergraduate preparatory curriculum. The bias in the aggregated data stems not from any pattern of discrimination on the part of admissions committees, which seem quite fair on the whole, but apparently from prior screening at earlier levels of the educational system. Women are shunted by their socialization and education toward fields of graduate study that are generally more crowded, less productive of completed degrees, and less well funded, and that frequently offer poorer professional employment prospects.”
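
A tiny numerical sketch with invented admission counts (chosen only to mirror the structure of the Berkeley case, not the actual figures) shows how the reversal arises: within each department women are admitted at a higher rate, yet in the pooled data their overall admission rate is lower because they mostly apply to the harder department.

```python
# Invented admission counts for two departments, structured like the Berkeley case
data = {
    "easy dept": {"men": (800, 500), "women": (100, 70)},   # (applied, admitted)
    "hard dept": {"men": (200, 30),  "women": (800, 160)},
}

totals = {"men": [0, 0], "women": [0, 0]}
for dept, groups in data.items():
    for sex, (applied, admitted) in groups.items():
        totals[sex][0] += applied
        totals[sex][1] += admitted
        print(f"{dept:9s} {sex:5s}: {admitted / applied:.0%} admitted")

for sex, (applied, admitted) in totals.items():
    print(f"pooled    {sex:5s}: {admitted / applied:.0%} admitted")
# Within each department women do better, but in the pooled data they appear to do worse.
```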

Exercise 3.3 Graduate admissions (Solution 3.3)

Read the first three pages of Bickel et al. (1975), i.e., pages 398-400, and answer the following questions. The article can be found here.

  1. Describe the two assumptions that must be true in order to prove that UC Berkeley discriminates against women or men overall.
  2. Table 1 shows that 277 fewer women and 277 more men were admitted than we would have expected under the two assumptions. Show how this number was calculated.
  3. Explain the analogy with fish that illustrates the danger of pooling data.

Exercise 3.4 Simpson’s Paradox (Solution 3.4)

  1. What is Simpson’s Paradox?
    1. A phenomenon in which the direction of a relationship between two variables changes when a third variable is introduced
    2. A phenomenon in which the strength of a relationship between two variables changes when a third variable is introduced
    3. The phenomenon where correlation appears to be present in different groups of data, but disappears or reverses when the groups are combined
  2. What is a potential cause of Simpson’s Paradox?
    1. Differences in the variance of the two variables
    2. Differences in the correlation of the two variables
    3. Confounding variables
    4. Differences in the sample size of the two variables

3.2.4 Rubin causal model

Keele (2015, p. 314): “An identification analysis identifies the assumptions needed for statistical estimates to be given a causal interpretation.”

If we are interested in the causal effect of a certain treatment on an outcome, we need to compare the outcome of the individuals who received the treatment to the outcome of the individuals who did not receive the treatment. However, if the counterfactual outcome is missing for some individuals, we cannot make this comparison and therefore cannot estimate the causal effect. Unfortunately, the counterfactual is usually non-existent. For example, if we want to measure the effect of a vaccine, we can never observe a person who is vaccinated and not vaccinated at the same time. Formally, we observe either \(Y_i(1)\) or \(Y_i(0)\), where \(Y_i\) denotes the outcome of individual \(i\) in the case of being vaccinated (1) or not vaccinated (0).

Thus, the so-called individual treatment effect (ITE) cannot be observed for any person \(i\): \[ ITE_i=Y_i(1)-Y_i(0) \]

The Rubin Causal Model, also known as the potential outcomes framework, is a statistical framework for analyzing causality in the context of missing data. Table 3.1, taken from Neal (2020), shows some example data to illustrate that the fundamental problem of causal inference is actually a missing data problem. The model goes back to the statistician Donald B. Rubin (born 1943) and is now a widely used framework for causal inference. Its basic premise is that for each individual in a study there are two potential outcomes: the outcome that would occur if the individual were exposed to a certain treatment or intervention (the “treatment” condition), and the outcome that would occur if the individual were not exposed to that treatment (the “control” condition). The key idea is that these potential outcomes can be used to infer causality by comparing the outcomes between the treatment and control groups, even if we do not have a full set of data.

Table 3.1: Example data illustrating that the fundamental problem of causal inference is a missing data problem

  i   T   Y   Y(1)   Y(0)   Y(1)-Y(0)
  1   0   0    ?      0         ?
  2   1   1    1      ?         ?
  3   1   0    0      ?         ?
  4   0   0    ?      0         ?
  5   0   1    ?      1         ?
  6   1   1    1      ?         ?
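
The data in Table 3.1 can be written down directly; in the following sketch (using pandas, with NaN standing in for the question marks) the individual treatment effect is missing for every unit because one of the two potential outcomes is never observed.

```python
import numpy as np
import pandas as pd

# Table 3.1 with NaN in place of the question marks
df = pd.DataFrame({
    "i":  [1, 2, 3, 4, 5, 6],
    "T":  [0, 1, 1, 0, 0, 1],
    "Y":  [0, 1, 0, 0, 1, 1],
    "Y1": [np.nan, 1, 0, np.nan, np.nan, 1],
    "Y0": [0, np.nan, np.nan, 0, 1, np.nan],
})
df["ITE"] = df["Y1"] - df["Y0"]   # NaN in every row: one outcome is always missing
print(df)
```

Ignoring the NaNs and simply averaging each column is exactly the step that requires the ignorability assumption discussed in Section 3.2.5.1.
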
Tip 3.2
Figure 3.8: Average treatment effect (ATE)

Watch the video of Brady Neal’s lecture What Does Imply Causation? Randomized Control Trials (see Figure 3.8). Alternatively, you can read Neal (2020, ch. 2) of his lecture notes, see here.

Under certain assumptions, the Rubin Causal Model allows for the estimation of the Average Treatment Effect (ATE), which is the difference in the expected outcomes between the treatment and control groups, given by the formula: \[ ATE\triangleq \mathbb{E}[Y(1)-Y(0)] \]

Several methods exist for estimating the ATE within the Rubin Causal Model, and this course will explore some of them. When applied correctly, this model can yield valuable insights into causal relationships and enhance decision-making processes. However, it’s important to recognize that the Rubin Causal Model is subject to certain limitations and assumptions. These assumptions must be satisfied to ensure the validity of the model’s inferences. Section 3.2.5 addresses some of these critical assumptions.

To get the average treatment effect (ATE) we can take the average of the individual treatment effects (ITE):

\[ ATE\triangleq \mathbb{E}[Y(1)-Y(0)] = \mathbb{E} [\underbrace{Y_i(1)-Y_i(0)}_{ITE}] \tag{3.1}\]
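
Equation 3.1 is easy to verify in a simulation where, unlike in reality, both potential outcomes are known for every individual; the sketch below assumes a constant treatment effect of 2.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 10_000

# Simulated world where both potential outcomes are known for everyone
y0 = rng.normal(loc=10, scale=3, size=n)   # outcome without treatment
y1 = y0 + 2                                # outcome with treatment (true effect = 2)

ite = y1 - y0                  # individual treatment effects
ate = ite.mean()               # ATE = average of the ITEs (Equation 3.1)
print(f"ATE: {ate:.2f}")       # exactly 2 by construction
```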

3.2.5 It is difficult to overcome the fundamental problem

In the following, we will discuss conditions that need to hold in order to empirically draw causal conclusions from the ATE without bias. This is important because, with observational data, a simple comparison of treated and untreated units very often fails to recover the ATE of Equation 3.1.

3.2.5.1 Ignorability

Referring to Table 3.1, Brady Neal (2020) wrote:

“What makes it valid to calculate the ATE by taking the average of the Y(0) column, ignoring the question marks, and subtracting that from the average of the Y(1) column, ignoring the question marks? This ignoring of the question marks (missing data) is known as ignorability. Assuming ignorability is like ignoring how people ended up selecting the treatment they selected and just assuming they were randomly assigned their treatment” (Neal, 2020, p. 9).

Ignorability means that the way individuals are assigned to treatment and control groups is irrelevant for the data analysis. Thus, when we aim to explain a certain outcome, we can ignore how an individual made it into the treated or control group. It has also been called unconfoundedness or no omitted variable bias. We will come back to these two terms in Section 3.3 and in ?sec-regression.

Randomized controlled trials (RCTs) are characterized by randomly assigning individuals to different treatment groups and comparing the outcomes of those groups. Thus, they are essentially built on the assumption of ignorability, which can be written formally as \[ (Y(1), Y(0)) \perp T. \] In words, this means that the potential outcomes of an individual, \(Y(1)\) and \(Y(0)\), do not depend on whether the individual has actually been treated or not. The symbol \(\perp\) denotes independence and simply says that the potential outcomes \(Y(1)\) and \(Y(0)\) are independent of the treatment assignment.

The assumption of ignorability allows us to write the ATE as follows: \[\begin{align} \mathbb{E}[Y(1)]-\mathbb{E}[Y(0)] & =\mathbb{E}[Y(1) \mid T=1]-\mathbb{E}[Y(0) \mid T=0] \\ & =\mathbb{E}[Y \mid T=1]-\mathbb{E}[Y \mid T=0]. \end{align}\]
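
The role of the ignorability assumption can be illustrated with a short simulation using artificial data and a true effect of 2: under random assignment the simple difference in group means recovers the ATE, but if individuals with a high \(Y(0)\) select into treatment, the same estimator is badly biased.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n = 100_000

y0 = rng.normal(10, 3, size=n)      # potential outcome without treatment
y1 = y0 + 2                         # potential outcome with treatment (true ATE = 2)

def diff_in_means(t):
    """Observed difference in means for a given treatment assignment t."""
    y = np.where(t == 1, y1, y0)    # we only ever observe one outcome per unit
    return y[t == 1].mean() - y[t == 0].mean()

t_random = rng.integers(0, 2, size=n)       # ignorability holds by design
t_selected = (y0 > 10).astype(int)          # people with high Y(0) self-select

print(f"Randomized assignment:   {diff_in_means(t_random):.2f}")    # close to 2
print(f"Self-selected treatment: {diff_in_means(t_selected):.2f}")  # biased upward
```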

Another perspective on this assumption is the concept of exchangeability. Exchangeability refers to the idea that the treatment groups can be interchanged such that if they were switched, the new treatment group would have the same outcomes as the old treatment group, and the new control group would have the same outcomes as the old control group.

3.2.5.2 Unconfoundedness

While randomized controlled trials (RCTs) guarantee ignorability by design, most observational data present challenges in drawing causal conclusions due to the presence of confounding factors that affect both (1) the likelihood of individuals being part of the treatment group and (2) the observed outcome. For instance, regional factors can affect both the number of storks and the number of babies born in a region. Such factors are typically referred to as confounders, which we discussed in Section 3.2.2 as having the potential to create the illusion of a causal impact where none exists. However, empirical methods are available to control for these confounders and prevent the violation of the ignorability assumption.
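
One standard way to deal with a single observed confounder is stratification: compare treated and untreated units within each level of the confounder and average the within-stratum differences. The sketch below uses an artificial binary confounder and a true effect of 2.

```python
import numpy as np

rng = np.random.default_rng(seed=5)
n = 100_000

x = rng.integers(0, 2, size=n)                       # observed binary confounder
t = rng.binomial(1, 0.2 + 0.6 * x)                   # x raises the chance of treatment
y = 5 + 2 * t + 4 * x + rng.normal(0, 1, size=n)     # x also raises the outcome; true effect = 2

naive = y[t == 1].mean() - y[t == 0].mean()          # confounded comparison

# Stratify on x: difference in means within each stratum, weighted by stratum size
adjusted = sum(
    (y[(t == 1) & (x == v)].mean() - y[(t == 0) & (x == v)].mean()) * np.mean(x == v)
    for v in (0, 1)
)
print(f"Naive difference:    {naive:.2f}")     # clearly above 2
print(f"Adjusted difference: {adjusted:.2f}")  # close to 2
```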

Exercise 3.5 Treatment effects (Solution 3.5)

Read sections 2.1 and 2.3 of Neal (2020).

  1. What is the individual treatment effect (ITE)?
  2. What is the average treatment effect (ATE)?
  3. How is the ATE calculated?
  4. Can the ATE be used to determine the effect of a treatment on an individual level?
  5. What are some potential sources of bias when estimating the ATE?

3.3 Statistical control requires causal justification

Tip 3.3

Read Wysocki et al. (2022) which is freely available here. Here you find a good summary of the paper.

Scientific research revolves around challenging our own views and findings. A good researcher does not merely present their results; instead, they engage in discussions about potential limitations and pitfalls to draw valid conclusions. Engaging in polemics goes against the essence of good research. We should not conceal potential weaknesses in our scientific strategy or empirical approach; rather, we should emphasize their existence. Even if this disappoints individuals seeking easy answers, it is crucial to acknowledge these limitations. The Catalogue of Bias is an excellent resource that provides insight into various potential pitfalls and challenges encountered during research, which may sometimes be difficult to completely rule out.

Solutions to the exercises

Solution 3.1. Methods used in economic research (Exercise 3.1)

  1. List the eight types of research methods described in the paper and provide the description found in the paper
  1. Economic theory: Papers where the main content is the development of a theoretical model. The ideal theory paper presents a (simple) new model that recasts the way we look at something important.

  2. Statistical technique, incl. forecasting: Papers reporting new estimators and tests are published in a handful of specialized journals in econometrics and mathematical statistics. Some papers compare estimators on actual data sets. If the demonstration of a methodological improvement is the main feature of the paper, it belongs to this subgroup, but if the economic interpretation is the main point of the paper, it belongs to the classical empirical studies or newer techniques group.

  3. Surveys, incl. meta-studies: When the literature in a certain field becomes substantial, it normally presents a motley picture with an amazing variation, especially when different schools exist in the field. Surveys are of two types, where the second type is still rare:

    1. Assessed surveys where the author reads the papers and assesses what the most reliable results are. Such assessments require judgment that is often quite difficult to distinguish from priors, even for the author of the survey.
    2. Meta-studies which are quantitative surveys of estimates of parameters claimed to be the same. These types of studies have two levels: The basic level collects and codes the estimates and studies their distribution. This is a rather objective exercise where results seem to replicate rather well. The second level analyzes the variation between the results. This is less objective.
  4. Experiments in laboratories: Most of these experiments take place in a laboratory, where the subjects communicate with a computer, giving a controlled, but artificial, environment. A number of subjects are told a (more or less abstract) story and paid to react in one of a number of possible ways. A great deal of ingenuity has gone into the construction of such experiments and into the methods used to analyze the results. Lab experiments do allow studies of behavior that are hard to analyze in any other way, and they frequently show sides of human behavior that are difficult to rationalize by economic theory. However, everything is artificial, even the payment, although participants usually receive real money for participation and performance. In some cases, the stories told are so elaborate and abstract that framing must be a substantial risk. In addition, experiments cost money, which limits the number of subjects. It is also worth pointing to the difference between expressive and real behavior: it is typically much cheaper for a subject to 'express' nice behavior in a lab than to be nice in the real world.

  5. Event studies (field experiments and natural experiments): Event studies are studies of real-world experiments. They are of two types:

    1. Field experiments analyze cases where some people get a certain treatment and others do not. The 'gold standard' for such experiments is double-blind random sampling, where everything (but the result!) is announced in advance. Experiments with humans require permission from the relevant authorities, and the experiment takes time too. In the process, things may happen that compromise the strict rules of the standard. Controlled experiments are expensive, as they require a team of researchers.
    2. Natural experiments take advantage of a discontinuity in the environment, i.e., the period before and after an (unpredicted) change of a law, an earthquake, etc. Methods have been developed to find the effect of the discontinuity. Often, such studies look like classical empirical studies with many controls that may or may not belong. Thus, the problems discussed under the classical empirical studies also apply here.
  6. Descriptive, deductions from data: In a descriptive study, researchers use an existing sample and hence have no control over the data-generating process, unlike in experiments. Descriptive studies are deductive. The researcher describes the data, aiming at finding structures that tell a story which can be interpreted. The findings may call for a formal test. If one clean test follows from the description, the paper can still be classified as a descriptive study. If more elaborate regression analysis is used, however, it can also be classified as a classical empirical study. Descriptive studies often contain a great deal of theory. Some descriptive studies present a new data set developed by the author to analyze a debated issue. In these cases, it is often possible to make a clean test, so to the extent that biases sneak in, they are hidden in the details of the assessments made when the data are compiled.

  7. Classical empirical studies: These typically have three steps: the paper starts with a theory, which is developed into an operational model; then it presents the data set; and finally it runs regressions. The significance levels of the t-ratios on the estimated coefficients assume that the regression is the first meeting of the estimation model and the data. In practice, we all know that this is rarely the case. The classical method is often just a presentation technique. The great virtue of the method is that it can be applied to real problems outside academia. The relevance comes with a price: the method is quite flexible, as many choices have to be made, and they often give different results. Preferences and interests may affect these choices.

  8. Newer techniques: Partly as a reaction to the problems of classical empirical methods, the last 3–4 decades have seen a whole set of newer empirical techniques. They include different types of vector autoregression (VAR)8, Bayesian techniques, causality and co-integration tests, Kalman filters, hazard functions, etc. The main reason for the lack of success of the new empirics is that it is quite bulky to report a careful set of co-integration tests or VARs, for example, and they often show results that are far from useful in the sense that they are unclear and difficult to interpret.

  2. Read the following statements and discuss whether they are true or not, and if the latter, correct them:

Statements i) and vi) are false, all others are correct.

  1. The numbers are wrong: The annual production of research papers in economics in the year 2017 has now reached about 1,000 papers in top journals, and about 14,000 papers in the group of good journals. The production has grown by 3.3% per year, and thus it has doubled over the last twenty years.

  2. Statement is correct: The upward trend in publication must be due to the large increase in the importance of publications for the careers of researchers, which has greatly increased the production of papers. There has also been a large increase in the number of researchers, but as citations are increasingly skewed toward the top journals, it has not increased the demand for papers correspondingly.

  3. Statement is correct: Four trends are significant: The fall in theoretical papers and the rise in classical papers. There is also a rise in the share of statistical method and event studies. It is surprising that there is no trend in the number of experimental studies.

  4. Statement is correct: Book reviews have dropped to less than 1/3. Perhaps, it also indicates that economists read fewer books than they used to. Journals have increasingly come to use smaller fonts and larger pages, allowing more words per page. The journals from North-Holland Elsevier have managed to cram almost two old pages into one new one. This makes it easier to publish papers, while they become harder to read.

  5. Statement is correct: About 50% of papers in the sample considered belong to the economic theory class, about 6% are experimental studies, and about 43% are empirical studies based on data inference.

  6. Economic theory is not on the rise: The papers in economic theory have dropped from 59.5% to 33.6% – this is the largest change for any of the eight subgroups. It is highly significant in the trend test.

  3. “Theory fatigue” is a term used to describe the decreasing attractiveness of theoretical research among journals, researchers, and political decision-makers. This trend goes hand in hand with the increasing importance of empirical research. Policymakers find it increasingly difficult to engage with variations of existing theoretical models, and researchers often struggle to systematically summarize the findings of theoretical work, making it difficult to draw definitive conclusions on specific topics. In addition, theoretical work can be unconvincing to a wider audience that must rely on the reasonableness of complex and sometimes unrealistic assumptions. The credibility of theoretical research often depends on how realistic the initial assumptions are and how plausible the conclusions are. If neither aspect is grounded in reality, there is a danger that the research becomes an abstract exercise whose insights into the real world are limited and difficult to communicate to the layperson.

  4. A research paper that policymakers find appealing typically offers estimates of a crucial effect that decision-makers outside of academia are keen to understand. Papers that target policymakers should put an emphasis on distilling the core findings into a short executive summary tailored for decision-makers, facilitating their understanding and application of the research insights.

Solution 3.2. Causal inference ch.1 (Exercise 3.2)

  1. Some common misconceptions about causality that the author addresses in chapter 1 include the confusion of correlation with causality, and the belief that causality can be established without prior knowledge. He says that human beings “engaging in optimal behavior are the main reason correlations almost never reveal causal relationships, because rarely are human beings acting randomly”, and such random variation is crucial for identifying causal effects.

  2. The role of randomization in causal inference, as described in the book, is that it helps to control for confounding variables and allows for the estimation of causal effects.

Solution 3.3. Graduate admissions (Exercise 3.3)

  1. Assumption 1 is that in any given discipline male and female applicants do not differ in respect of their intelligence, skill, qualifications, promise, or other attribute deemed legitimately pertinent to their acceptance as students. It is precisely this assumption that makes the study of “sex bias” meaningful, for if we did not hold it any differences in acceptance of applicants by sex could be attributed to differences in their qualifications, promise as scholars, and so on. (…) Assumption 2 is that the sex ratios of applicants to the various fields of graduate study are not importantly associated with any other factors in admission. (Bickel et al., 1975, p. 398)
  2. The expectations were computed by applying the overall acceptance rate of about 0.41 to the total number of male applicants (admitted plus rejected): \((3738+4704) \cdot 0.41 \approx 3460\) expected admissions and \((3738+4704) \cdot (1-0.41) \approx 4981\) expected rejections. Comparing these expectations with the observed numbers (3738 admitted, 4704 rejected) gives about 277 more men admitted, and correspondingly 277 fewer women admitted, than expected (a short numerical check is given at the end of this solution).
  3. The analogy is explained on page 400:

“Picture a fishnet with two different mesh sizes. A school of fish, all of identical size (assumption 1), swim toward the net and seek to pass. The female fish all try to get through the small mesh, while the male fish all try to get through the large mesh. On the other side of the net all the fish are male. Assumption 2 said that the sex of the fish had no relation to the size of the mesh they tried to get through. It is false.”

The UC Berkeley case is just one of many examples illustrating that uniformity of group assignment of individuals is a necessary condition to ensure that pooling of data does not lead to misleading conclusions when using statistics. The phenomenon of obtaining different results depending on whether one considers the data pooled or unpooled is often referred to as Simpson’s Paradox.
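
For readers who want to retrace the arithmetic in Solution 3.3, here is a tiny check using the figures quoted there and a pooled admission rate of roughly 0.41 (an approximation).

```python
# Rough check of the expected-versus-observed calculation, using the figures
# quoted in this solution (male applicants: 3738 admitted, 4704 rejected) and
# a pooled admission rate of roughly 0.41.
admitted_men, rejected_men = 3738, 4704
total_men = admitted_men + rejected_men
pooled_rate = 0.41

expected_admitted = total_men * pooled_rate
print(f"Expected male admissions: {expected_admitted:.0f}")                 # about 3460
print(f"Observed minus expected:  {admitted_men - expected_admitted:.0f}")  # about 277
```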

Solution 3.4. Simpson’s Paradox (Exercise 3.4)

  1. a), 2. c) and d)

Solution 3.5. Solution to exercise Exercise 3.5

  1. The individual treatment effect (ITE) is a measure of the effect of a treatment or intervention on an individual level. It represents the difference in the outcome for an individual who receives the treatment versus the outcome for that same individual if they had not received the treatment.
  2. The average treatment effect (ATE) is a measure of the difference in the expected outcomes between a treatment group and a control group. It represents the overall effect of a treatment on the population as a whole.
  3. The ATE is calculated by taking the difference between the average outcome for the treatment group and the average outcome for the control group.
  4. No, the ATE is a population-level measure and cannot be used to determine the effect of a treatment on an individual level. To determine the effect of a treatment on an individual level, you would need to use techniques such as propensity score matching or instrumental variables.
  5. Some potential sources of bias when estimating the ATE include selection bias, measurement bias, and unobserved confounding variables. To mitigate these biases, researchers may use randomization or other advanced statistical techniques such as propensity score matching or instrumental variables to control for these potential sources of bias.

  1. Source: https://commons.wikimedia.org/wiki/File:Daniel_Kahneman_(3283955327)_(cropped).jpg

  2. Source: https://youtu.be/Wy7qpJeozec

  3. Source: https://pixabay.com/images/id-5029286/

  4. Source: Cunningham (2021)

  5. Source: https://youtu.be/DFPm_a-_uJM

  6. Source: The photograph is in the public domain and stems from the Library of Congress Prints and Photographs Division, Washington; see http://hdl.loc.gov/loc.pnp/pp.print.

  7. Source: Bickel et al. (1975, p. 403)

  8. A VAR is a statistical model used to capture the relationship between multiple quantities as they change over time.