3  Epistemic errors

Many things can go wrong when analysing data. Misleading interpretations can arise because the data do not measure what they are intended to measure, because humans misinterpret them, or because of an overreliance on probabilistic reasoning. The following examples illustrate these pitfalls and are drawn from the insightful book by Jones (2020), which I highly recommend for further reading.

Epistemic errors, that is, mistakes related to knowledge, understanding, or the acquisition of information, arise from cognitive biases, misunderstandings, and incorrect assumptions about the nature of data and the reality they represent. Recognizing and addressing these errors is crucial for accurate data analysis and effective decision-making.

When researchers analyse data, they should be aware that data can represent reality but do not necessarily do so. For example, a survey on customer satisfaction that only includes responses from a self-selected group of highly engaged customers may not accurately reflect the overall customer base (see the box on the Hite Report). To avoid this pitfall, ensure that your data collection methods are representative and unbiased, and validate your data against external benchmarks or additional data sources.

Hite Report

When The Hite Report (see Hite (1976) and Figure 3.1) was published in 1976, it instantly became a bestseller. Hite used an individualistic research method: thousands of responses to anonymous questionnaires were used as a framework to develop a discourse on human responses to gender and sexuality. The comic in Figure 3.2 sums up the main results.

Figure 3.1: The Hite (1976) Report
Figure 3.2: Comic on the Hite Report

Source: Image taken from www.theparisreview.org.

The picture of women's sexuality painted in Hite (1976) was probably biased, as the sample can hardly be considered a random and unbiased one:

  • Less than 5% of all questionnaires which were sent out were filled out and returned (response bias).
  • The questionnaires were sent out only to women’s organizations (an opportunity sample).

Thus, the results were based on a sample of women who were highly motivated to answer the survey’s questions, for whatever reason.

The Data-Reality Gap

The difference between the data we collect and the reality it is supposed to represent.
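
To make the data-reality gap concrete, the following minimal sketch simulates a customer-satisfaction survey in which the probability of responding rises with satisfaction. The population, the satisfaction scale, and the logistic response model are hypothetical illustrations, not taken from Jones (2020).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population of customers with true satisfaction scores on a 1-10 scale.
population = rng.normal(loc=6.0, scale=2.0, size=100_000).clip(1, 10)

# Self-selection: the chance of answering the survey rises with satisfaction,
# so highly engaged (satisfied) customers are over-represented among respondents.
response_prob = 1.0 / (1.0 + np.exp(-(population - 8.0)))
responded = rng.random(population.size) < response_prob

print(f"True mean satisfaction in the population: {population.mean():.2f}")
print(f"Mean satisfaction among respondents only: {population[responded].mean():.2f}")
print(f"Response rate: {responded.mean():.1%}")
```

The respondents’ mean is noticeably higher than the population mean even though every individual answer is recorded correctly: the gap arises from who ends up in the data, not from measurement error.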

Another common epistemic error involves the influence of human biases during data collection and interpretation. Known as the all too human data error, this occurs when personal biases or inaccuracies affect the data. An example would be a researcher’s personal bias influencing the design of a study or the interpretation of its results. To mitigate this, implement rigorous protocols for data collection and analysis, and consider using double-blind studies and peer reviews to minimize bias.

All Too Human Data

Errors introduced by human biases or inaccuracies during data collection and interpretation.
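
One simple protocol against analyst bias is to blind the group labels before the data reach the person doing the evaluation. The sketch below is a hypothetical illustration (the records, group names, and coding scheme are invented), not a prescription from Jones (2020).

```python
import secrets

# Hypothetical study records; the evaluator should not know which rows are treated.
records = [
    {"id": 1, "group": "treatment", "outcome": 7.2},
    {"id": 2, "group": "control",   "outcome": 6.9},
    {"id": 3, "group": "treatment", "outcome": 7.5},
    {"id": 4, "group": "control",   "outcome": 7.1},
]

# Replace the real group names with random codes; keep the key sealed until the
# evaluation is finished, so expectations cannot colour the interpretation.
codes = {group: secrets.token_hex(3) for group in {r["group"] for r in records}}
key = {code: group for group, code in codes.items()}

blinded = [{**r, "group": codes[r["group"]]} for r in records]

print(blinded)  # the analyst works with these rows only
print(key)      # revealed only after the analysis is locked in
```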

Inconsistent ratings can also lead to epistemic errors. This happens when there is variability in data collection methods, resulting in inconsistent or unreliable data. For example, different evaluators might rate the same product using different criteria or standards. To avoid this issue, standardize data collection processes and provide training for all individuals involved in data collection to ensure consistency.

Inconsistent Ratings

Variability in data collection methods that leads to inconsistent or unreliable data.
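
A quick way to detect inconsistent ratings is to let several evaluators rate the same items and quantify their agreement. The sketch below uses invented ripeness ratings for ten bananas (in the spirit of Exercise 3.1) and computes raw agreement together with Cohen’s kappa, which corrects for agreement expected by chance.

```python
import numpy as np

# Hypothetical ripeness ratings (1 = green, 5 = overripe) that two evaluators
# assigned to the same ten bananas using their own, unstandardized criteria.
rater_a = np.array([1, 2, 2, 3, 3, 4, 4, 5, 5, 5])
rater_b = np.array([1, 3, 2, 4, 3, 5, 4, 5, 4, 3])

# Raw agreement: share of items on which both raters give the same score.
observed = np.mean(rater_a == rater_b)

# Cohen's kappa corrects raw agreement for agreement expected by chance.
categories = np.union1d(rater_a, rater_b)
p_a = np.array([np.mean(rater_a == c) for c in categories])
p_b = np.array([np.mean(rater_b == c) for c in categories])
expected = np.sum(p_a * p_b)
kappa = (observed - expected) / (1 - expected)

print(f"Raw agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")
```

A kappa well below 1 signals that the raters are not applying the same criteria, which is exactly the situation that standardized rating guidelines and training are meant to prevent.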

The black swan pitfall refers to the failure to account for rare, high-impact events that fall outside regular expectations. Financial models that did not predict the 2008 financial crisis due to the unexpected nature of the events that led to it are an example of this error. To prevent such pitfalls, consider a wide range of possible outcomes in your models and incorporate stress testing to understand the impact of rare events.

The Black Swan Pitfall

The failure to account for rare, high-impact events that fall outside the realm of regular expectations.
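
The following sketch shows, under purely illustrative assumptions, why stress testing against heavier-tailed models matters: a thin-tailed normal model and a heavy-tailed Student-t model with the same scale parameter assign very different probabilities to an extreme daily loss.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000

# Hypothetical daily returns: a thin-tailed (normal) model versus a
# heavy-tailed (Student-t, 3 degrees of freedom) model with the same scale parameter.
normal_returns = rng.normal(loc=0.0, scale=0.01, size=n)
t_returns = 0.01 * rng.standard_t(df=3, size=n)

# Stress test: how often does each model produce a "rare" loss worse than -5%?
threshold = -0.05
print(f"P(loss < -5%) under the normal model:      {np.mean(normal_returns < threshold):.6f}")
print(f"P(loss < -5%) under the heavy-tailed model: {np.mean(t_returns < threshold):.6f}")
```

Over a million simulated days the normal model treats a loss beyond 5% as essentially impossible, while the heavy-tailed model produces such days regularly; making that difference visible is precisely what stress testing is designed to do.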

Falsifiability and the God pitfall involve the tendency to accept hypotheses that cannot be tested or disproven. This error might occur when assuming that a correlation between two variables implies causation without the ability to test alternative explanations. To avoid this, ensure that your hypotheses are testable and that you actively seek out potential falsifications. Use control groups and randomized experiments to validate causal relationships.

Falsifiability and the God Pitfall

The tendency to accept hypotheses that cannot be tested or disproven.
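
To see why randomization helps to test, and potentially falsify, a causal claim, the sketch below simulates invented data in which a confounder (customer engagement) drives both the adoption of a feature and spending, while the feature itself has no effect. The naive observational comparison suggests a large “effect”; random assignment of the same hypothetical feature yields an estimate close to the true value of zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical confounder (customer engagement) drives both the "treatment"
# (adopting a new feature) and the outcome (spending).
engagement = rng.normal(size=n)

# Observational data: engaged customers adopt the feature more often,
# but the feature itself has NO effect on spending.
adopted = (engagement + rng.normal(size=n)) > 0
spending_obs = 2.0 * engagement + rng.normal(size=n)
naive_diff = spending_obs[adopted].mean() - spending_obs[~adopted].mean()

# Randomized experiment: adoption is assigned by coin flip, breaking the link
# to engagement, so the estimated effect is close to the true effect of zero.
assigned = rng.random(n) < 0.5
spending_rct = 2.0 * engagement + rng.normal(size=n)
rct_diff = spending_rct[assigned].mean() - spending_rct[~assigned].mean()

print(f"Observational 'effect': {naive_diff:.2f}")  # spuriously large
print(f"Randomized 'effect':    {rct_diff:.2f}")    # approximately zero
```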

To avoid epistemic errors, critically assess your assumptions, methodologies, and interpretations. Engage in critical thinking by regularly questioning your assumptions and seeking alternative explanations for your findings. Employ methodological rigor by using standardized and validated methods for data collection and analysis. Engage with peers to review and critique your work, providing a fresh perspective and identifying potential biases. Finally, stay updated with the latest research and best practices in your field to avoid outdated or incorrect methodologies.

Understanding and addressing epistemic errors can significantly improve the reliability and accuracy of your data analyses, leading to better decision-making and more trustworthy insights.

Exercise 3.1  

Figure 3.3: Bananas in various stages of ripeness

Source: Jones (2020, p. 33)

  1. Rate the ripeness level of the bananas pictured by Figure 3.3. Compare your assessment to that of a colleague and discuss any differences in your ratings. What might account for the variance in perception of the bananas’ ripeness between you and your colleague?

  2. How did you rate the second and the last banana on the ripeness scale?

  3. Upon reevaluation, it appears that the second and the last bananas are identical in ripeness. How would you justify your initial decision now? This scenario underscores an important lesson for interpreting polls and surveys: it illustrates how subjective assessments can lead to variance in results. It highlights the necessity of ensuring clarity and consistency in the criteria used for evaluations to minimize subjective discrepancies.

Jones, B. (2020). Avoiding data pitfalls: How to steer clear of common blunders when working with data and presenting analysis and visualizations. John Wiley & Sons.
Hite, S. (1976). The Hite report: A nationwide study of female sexuality. New York: Dell.