5  Markdown, Quarto, and R Markdown

Verbal and non-verbal communication are crucial in business. This chapter focuses on writing and publishing texts, excluding aspects like body language and writing skills. I will introduce some tools commonly used by data scientists for writing and publishing their work, such as Markdown, R Markdown, and Quarto. Unlike applications like Microsoft Word or Apple Pages, these tools generate the formatted text from plain-text source code. This concept may be unfamiliar to those who grew up after Windows 95; I will provide justification for its use in the following sections.

One notable advantage of a code-based approach to writing text is its seamless integration with version control systems like Git and platforms like GitHub. These tools are essential for most data science collaborations. Mastering them can greatly enhance your efficiency and make your presentations more impactful, even if you are not directly involved in data science.

What is Quarto?

Quarto, a modern documentation system, is an excellent choice for writing, especially for projects that require rigorous data analysis, visualization, and reproducibility. This tutorial will guide you through producing various forms of text with Quarto. You can write reports, articles, theses, books, websites, and much more with Quarto.

Quarto and R Markdown

Quarto is a relatively new tool and can be considered a successor to R Markdown. Most R Markdown documents are compatible with Quarto. However, Quarto offers some improved functionality over R Markdown, which makes it more user-friendly. A detailed overview of the differences and similarities between the two can be found in this article. For an introduction to R Markdown, see Section 5.4.

5.1 Why Markdown and Quarto?

5.1.1 No-code vs. code-based writing applications

Students often use Microsoft Word, Apple Pages, or LibreOffice to write scientific texts. These word processing programs operate on the “What You See Is What You Get” (WYSIWYG) principle, displaying the document layout as you type. While this principle and its corresponding applications are widespread and may seem indispensable to many, this is far from true. Alternatives such as LaTeX, Markdown, R Markdown, and Quarto offer significant advantages. Many professional scientists and publishers prefer these alternatives for good reason. A large number of doctoral theses and scientific papers are authored using LaTeX, and nearly all publishers and editors work with code-based solutions that do not follow the WYSIWYG principle.

With code-based alternatives, layout specifications are either placed at the beginning of the text or embedded within the main text itself. The final document is only visible after converting (also called compiling or rendering) it into a format such as PDF. This may initially seem unusual and less intuitive than a WYSIWYG interface, but the most intuitive solution is not necessarily the best or simplest. My experience supervising numerous student papers has shown that the intuitive features of MS Word and Pages often become time-consuming over the medium to long term and fail to adequately support users in avoiding errors when writing scientific work. Students who choose code-based applications tend to experience less frustration and greater success—at least, this has been true for the papers I have supervised.

Code-based applications allow writers to focus on the actual writing process, as formatting and adherence to citation rules are largely automated by the software. The necessary initial investment in learning a tool like Quarto quickly pays off, resulting in noticeable improvements in the quality of scientific texts.

In the following subsections, I will first outline typical usage of WYSIWYG applications, then discuss the advantages of code-based text creation using Quarto as an example, and finally explain how to successfully write texts using Quarto.

5.1.2 Typical (mis)usage of WYSIWYG applications

The use of traditional word processing software like Microsoft Word or Apple Pages for writing academic papers is pervasive among students. While these programs are user-friendly for everyday writing projects, they create a significant additional workload to meet the demands of academic work.

One of the first problems is integrating literature. Correct formatting according to various citation guidelines is often counter-intuitive, and errors occur easily. This is particularly true if the citation and bibliography functions provided by the software are not used or are used incorrectly. Instead of utilizing external citation managers and investing time to learn how to use them, many students manually create citations and bibliographies, which typically leads to numerous small and sometimes larger errors that could be avoided.

Another weak point in student work is adherence to specific formatting requirements. Academic institutions and journals often require strict adherence to formatting guidelines, including the design of title pages, headers and footers, page margins, and heading hierarchies. Although Word and Pages offer templates and styles, they must be individually adapted for each document and often modified due to minor text changes. Making a format adjustment can become a major effort.

The inclusion of empirical results such as statistical data and graphics presents an additional hurdle. With Word and Pages, the process is frequently manual: research data must be exported from statistical software, saved as images, and then embedded in the document. If the data changes, this time-consuming process must be repeated, significantly increasing the workload and risk of errors.

5.1.3 Advantages of Quarto for writing text

Writing academic texts using traditional tools such as MS Word or Pages can be time-consuming and error-prone for students. In the following section, I introduce Quarto (or R Markdown), a modern alternative that offers several advantages:

  • Versatile output formats: Quarto makes it effortless to generate different output formats. The same text can be rendered as a website (HTML), manuscript (PDF, DOCX), book (EPUB, PDF), or slides (PDF). This flexibility allows you to focus more on the content than the format.
  • Simplified formatting changes: Specific templates can be used in Quarto, simplifying the process of making formatting changes.
  • Seamless literature integration: Quarto handles citation rules compliance and integrates seamlessly with citation management systems, enabling researchers to manage literature references and bibliographies more efficiently and consistently than in Word.
  • Easy cross-referencing: Creating cross-references to sections, tables, and figures is straightforward.
  • Direct data analysis and output generation: Data analysis and output generation occur directly within Quarto, ensuring that displayed graphics and tables are always up-to-date. This eliminates the need for manual post-processing and guarantees the reproducibility of results.
  • Embedded data visualizations: Researchers can embed data visualizations directly in the text without manual intermediate steps.
  • Efficient collaboration with version control: Version control systems like Git make collaboration on academic documents more manageable. Changes can be tracked and integrated without relying on complex and conflict-prone comparison tools.
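
Several of these points can be seen in a single small file. The following R sketch writes a minimal Quarto document containing a YAML header, a code chunk that produces a figure, and a cross-reference to that figure. The file name, the label fig-cars, and the use of R's built-in cars data set are assumptions chosen purely for illustration.

```r
# Sketch: write a minimal Quarto document from R (base R only).
# File name, figure label, and data set are illustrative assumptions.
fence <- strrep("`", 3)  # three backticks delimit a code chunk

qmd_lines <- c(
  "---",
  "title: \"A minimal example\"",
  "format: html",
  "---",
  "",
  "## Results",
  "",
  "@fig-cars shows speed and stopping distance.",
  "",
  paste0(fence, "{r}"),
  "#| label: fig-cars",
  "#| fig-cap: \"Speed and stopping distance\"",
  "plot(cars)",
  fence
)

# Save the document in a temporary folder.
qmd_file <- file.path(tempdir(), "minimal-example.qmd")
writeLines(qmd_lines, qmd_file)
```

When such a file is rendered, the chunk is executed again, so the figure reflects the current data, and the reference @fig-cars is replaced by the figure number.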
Reading recommendation

For those interested, I recommend the online course Introduction to Reproducible Publications with RStudio, which explains explicitly how to work in an empirically reproducible manner. A somewhat more compact introduction is offered by Bauer & Landesvatter (2023), and the authoritative work on the subject is by Gandrud (2020).

5.2 Markdown

Markdown is a lightweight markup language with plain-text formatting syntax. It is very popular because it is easy to learn. It’s an essential skill for using Quarto effectively. Start by learning enough Markdown to structure your thesis, including headings, lists, links, and code blocks.

You can learn Markdown (not R Markdown!) in 10 minutes. Just go to www.markdowntutorial.com and work through the interactive lessons. I also recommend the introduction offered in the section Markdown Basics on quarto.org.

5.3 Quarto

Recommended literature

Read Telford (2023): Enough Markdown to Write a Thesis. This resource covers the basics and some advanced Markdown features that are useful for academic writing.

More extensive resources on how to do things with Quarto can be found at quarto.org.

5.3.1 Introduction

To set up Quarto on your machine do the following:

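  • Install R and RStudio.
  • Install Quarto. Recent RStudio releases ship with the Quarto command-line tool; otherwise, download the installer from quarto.org. The R package quarto, which lets you call Quarto from R, is installed as follows:
install.packages("quarto")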
  • Install the tinytex package to generate PDF files:
install.packages("tinytex")
tinytex::install_tinytex()
  • It is also advisable to install additional packages that might be needed later:
if (!require(pacman)) install.packages("pacman")
pacman::p_load(knitr, rmarkdown, papaja)
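
After these steps, it can be useful to check what R actually finds on your machine. The following is a minimal sketch, not an official diagnostic: the helper name check_setup() is made up for this example, and it only reports TRUE or FALSE without installing anything. Note that a Quarto version bundled with RStudio is not necessarily on the system search path, so a FALSE for quarto_cli does not prove that Quarto is missing.

```r
# Sketch: report which parts of the toolchain are visible to R.
# check_setup() is a hypothetical helper; it changes nothing on the system.
check_setup <- function() {
  has_pkg <- function(pkg) requireNamespace(pkg, quietly = TRUE)
  c(
    # Is a "quarto" executable on the system search path?
    quarto_cli    = unname(nzchar(Sys.which("quarto"))),
    # Are the R packages installed?
    quarto_pkg    = has_pkg("quarto"),
    tinytex_pkg   = has_pkg("tinytex"),
    knitr_pkg     = has_pkg("knitr"),
    rmarkdown_pkg = has_pkg("rmarkdown")
  )
}

setup_status <- check_setup()
print(setup_status)
```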

Exercise 5.1 First Quarto document

  • Open RStudio.
  • Select “File” -> “New File” -> “Quarto Document” and then “Create.”
  • Save the new file in an empty folder and set this folder as your working directory.
  • Click “Render.”
  • Visit the Markdown Basics website, add some Markdown to your document, and click “Render” again.
  • Click the arrow next to the “Render” button. Here, you can select and generate other file formats. Give it a try.
  • Consult the PDF Basics website and supplement your header with the information found there.
  • Try citing the paper by Huber & Rust (2016), which you can find here, in your document.
    • Click on “Visual,”
    • Go to the place in the text where you want to cite the paper and select “Insert” -> “Citation.”
    • Search for the paper in the context menu using the corresponding DOI (https://doi.org/10.1177/1536867X1601600209) and insert it.
  • To quote using APA Version 7 style, write the following in the YAML header:
csl: "https://www.zotero.org/styles/apa"
  • Select a different citation style from www.zotero.org/styles. Then render the document again and observe the differences.
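
Rendering does not have to go through the “Render” button. As a sketch, the function quarto_render() from the R package quarto can do the same from the console. The wrapper render_if_possible() and the file name my-document.qmd are hypothetical; the wrapper only calls Quarto when the file, the package, and the Quarto command-line tool are all found.

```r
# Sketch: render a Quarto document from the R console.
# render_if_possible() and "my-document.qmd" are illustrative assumptions.
render_if_possible <- function(qmd_file, format = "html") {
  ready <- file.exists(qmd_file) &&
    requireNamespace("quarto", quietly = TRUE) &&
    !is.null(quarto::quarto_path())
  if (ready) {
    # Equivalent to choosing an output format next to the "Render" button.
    quarto::quarto_render(qmd_file, output_format = format)
  } else {
    message("Skipping render: file, quarto package, or Quarto CLI not found.")
  }
  invisible(ready)
}

rendered <- render_if_possible("my-document.qmd", format = "html")
```

Changing the format argument to "pdf" or "docx" requests another output format from the same source file.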

5.3.2 Create an APA compliant manuscript using Quarto

To create an APA compliant manuscript, it is recommended to use the Quarto Extension apaquarto. The process is described in detail here. Using the template ensures that all APA rules are automatically considered. As APA allows a lot of leeway and every reviewer has specific preferences, apaquarto allows for manipulation of a variety of settings. For example, the language can be changed and the general style of the document can be modified in the Preamble (YAML header).

5.4 R Markdown

Figure 5.1: Example of an R Markdown file

R Markdown provides an authoring framework for data science. You can use a single R Markdown file to document your work, run code, and generate high-quality reports, books, websites, articles, theses, blogs, and much more (see Figure 5.1).

In contrast to Quarto (see Section 5.3), which is the more recent format, R Markdown has been around for some time, and hence there are countless resources for learning it. For example:

Figure 5.2: R Markdown Cheatsheet from Posit
Figure 5.3: Xie et al. (2020): R Markdown Cookbook
  • The book R Markdown: The Definitive Guide by Xie et al. (2018) offers a comprehensive introduction. The online version of the book is regularly updated and free of charge.
Figure 5.4: Xie et al. (2018): R Markdown: The Definitive Guide

Please watch the video What is R Markdown? and then study the R Markdown tutorial from RStudio.

Working directory in R Markdown

By default, the working directory is the directory that contains the Rmd document. If you want to use another directory, you can change the working directory with setwd(). However, this change is not persistent in R Markdown: it applies only to the current code chunk. After the code chunk has been evaluated, the working directory is reset to the directory where the Rmd file is located.
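
If a different working directory is needed for all chunks, knitr offers the root.dir option, which is typically set once in the first (setup) chunk of the document. The following is a minimal sketch; the temporary directory merely stands in for a hypothetical project folder.

```r
# Sketch: set the working directory for all chunks of an R Markdown document.
# In a real document, this call belongs in the first (setup) chunk.
project_dir <- tempdir()  # stand-in for a hypothetical project folder

if (requireNamespace("knitr", quietly = TRUE)) {
  # Applies to all following chunks, unlike setwd() inside a chunk.
  knitr::opts_knit$set(root.dir = project_dir)
  print(knitr::opts_knit$get("root.dir"))
}
```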

Exercise 5.2 Start Markdown and R Markdown

  1. You can learn Markdown (not R Markdown!) in 10 minutes. Just go to https://www.markdowntutorial.com and work through the interactive lessons.
  2. Now create your first R Markdown file in 3 minutes by doing the following:
    • click in RStudio on File > New File > R Markdown
    • click OK
    • look for a button entitled Knit and click it
    • save your file (it will be saved with .Rmd file extension)
  3. Play around with the file. For example, change the output format: can you create a Word file or a presentation? Play around with the code chunks. Add a picture that you find somewhere online.
  4. Set your working directory to the folder where you have saved your first Rmd file. Can you come up with a way to generate different output formats with just one function?
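
One possible approach to the last task, offered as a sketch rather than the only solution, is to pass several output formats to rmarkdown::render(). The example writes a tiny Rmd file to a temporary folder (file name and content are assumptions) and renders it only if the rmarkdown package and Pandoc are available.

```r
# Sketch: one call to rmarkdown::render() that produces several formats.
# File name and document content are illustrative assumptions.
rmd_lines <- c(
  "---",
  "title: \"My first R Markdown file\"",
  "---",
  "",
  "Some text with *emphasis*."
)
rmd_file <- file.path(tempdir(), "first-document.Rmd")
writeLines(rmd_lines, rmd_file)

# Rendering needs the rmarkdown package and Pandoc (bundled with RStudio).
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  outputs <- rmarkdown::render(
    rmd_file,
    output_format = c("html_document", "word_document"),
    quiet = TRUE
  )
  print(basename(outputs))
}
```

If the formats are already listed in the YAML header, output_format = "all" renders every one of them.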

Exercise 5.3 R Markdown cite literature

  • Create a new R Markdown file (File > New File > R Markdown), save the file in an empty folder, and knit it.
  • Make a new script with File > New File > R Script.
  • Go to https://scholar.google.de/ and search for osrmtime.
  • Click on “Cite” and “BibTeX”. Copy and paste everything that you see into your script and save the script as lit.bib in the same folder as your R Markdown file. RStudio will ask you to confirm the file type change. Click “Yes”. Your lit.bib file should look like this:
@article{huber2016calculate,
  title={Calculate travel time and distance with OpenStreetMap 
    data using the Open Source Routing Machine (OSRM)},
  author={Huber, Stephan and Rust, Christoph},
  journal={The Stata Journal},
  volume={16},
  number={2},
  pages={416--423},
  year={2016},
  publisher={SAGE Publications Sage CA: Los Angeles, CA}
}
  • Add the text “bibliography: lit.bib” to the YAML header of your R Markdown file so that it looks something like this:
---
title: "Untitled"
author: "Stephan Huber"
date: "`r Sys.Date()`"
output: html_document
bibliography: lit.bib
---
  • Now you can cite the OSRMTIME paper with @huber2016calculate somewhere in the text of your R Markdown file.
  • Knit the R Markdown file and you should see the paper cited and a reference list at the end of the html report.
  • To change the citation style, specify a CSL (Citation Style Language) file in the YAML header. For example, the APA style can be chosen with:
csl: "https://www.zotero.org/styles/apa.csl"

Many more citation styles can be found on github.com/citation-style-language and on the Zotero Style Repository.
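
A frequent source of trouble is a citation key in the text that does not exist in the .bib file. The following base R sketch shows one simple way to compare the two. The bibliography entry is an abbreviated copy of the one above, the example sentence is made up, and the regular expressions are deliberately simple, so they would also pick up other uses of @ such as e-mail addresses.

```r
# Sketch: check that every key cited in the text exists in the .bib file.
# The entry is abbreviated and the sentence is an illustrative assumption.
bib_lines <- c(
  "@article{huber2016calculate,",
  "  author={Huber, Stephan and Rust, Christoph},",
  "  journal={The Stata Journal},",
  "  year={2016}",
  "}"
)
text_lines <- "Travel times can be computed with OSRM [@huber2016calculate]."

# Keys defined in the bibliography: the text between "{" and the first comma.
entry_lines <- grep("^@[A-Za-z]+\\{", bib_lines, value = TRUE)
bib_keys <- sub("^@[A-Za-z]+\\{([^,]+),.*$", "\\1", entry_lines)

# Keys cited in the text: everything that follows an "@".
matches <- regmatches(text_lines, gregexpr("@[A-Za-z0-9_:-]+", text_lines))
cited_keys <- unique(sub("^@", "", unlist(matches)))

# Cited keys without a matching bibliography entry.
missing_keys <- setdiff(cited_keys, bib_keys)
print(missing_keys)
```

In a real project, bib_lines and text_lines would come from readLines("lit.bib") and readLines() on the Rmd file.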

Exercise 5.4 Preparing APA journal articles (papaja)

There is an easy way to write a manuscript that follows all the APA rules using the package papaja written by two psychologists from Cologne. Please read their manual and consider their repository on GitHub.

Now, install and load the package:

install.packages("papaja")
library("papaja")

Then, click “File > New File > R Markdown” and choose the “APA-style manuscript” from the section “From Template”. Knit the R Markdown template and you will have a template for an APA manuscript.

Apart from the obvious adjustments, I recommend making at least two general adjustments in the YAML header: change classoption to “doc” and linenumbers to “no”.

Exercise 5.5 R Markdown template

Please follow the instructions below to access the file “23-09_ds-project-desc.Rmd” from my GitHub account:

  1. Download the file from my GitHub account by clicking on the link provided here.
  2. Save the file in your working directory.
  3. Use the Knit function to run the file, but be aware that it may not work properly at first. If you encounter any issues, troubleshooting may be required. Don't worry, error messages will usually provide guidance to help you resolve the issue. Please note that the YAML header is sensitive to spacing and indentation, so be careful when editing it to avoid breaking the document.
  4. In the project template, I have used BibTeX to cite literature. This method is excellent for automating tedious tasks such as citing papers and generating reference lists based on citation styles, saving time and reducing the likelihood of citation errors. The literature cited is in a separate file, which can be found on one of my GitHub repositories.