5 Markdown, Quarto, and R Markdown
Verbal and non-verbal communication are crucial in business. This section focuses on writing and publishing texts; it does not cover aspects such as body language or writing skills. I will introduce some tools that data scientists commonly use to write and publish their work: Markdown, R Markdown, and Quarto. Unlike applications such as Microsoft Word or Apple Pages, these tools generate documents from code. This concept may be unfamiliar to those who grew up after Windows 95, so I will justify its use in the following sections.
One notable advantage of a code-based approach to writing text is its seamless integration with version control systems like Git and platforms like GitHub. These tools are essential for most data science collaborations. Mastering them can greatly enhance your efficiency and make your presentations more impactful, even if you are not directly involved in data science.
Quarto is a relatively new tool and can be considered the successor to R Markdown. Most R Markdown documents are compatible with Quarto. However, Quarto offers improved functionality that makes it more user-friendly. A detailed overview of the differences and similarities between the two can be found in this article. For an introduction to R Markdown, see Section 5.4.
5.1 Why Markdown and Quarto?
5.1.1 No-code vs. code-based writing applications
Students often use Microsoft Word, Apple Pages, or LibreOffice to write scientific texts. These word processing programs operate on the “What You See Is What You Get” (WYSIWYG) principle, displaying the document layout as you type. Although this principle and the applications built on it are widespread and may seem indispensable to many, they are not. Alternatives such as LaTeX, Markdown, R Markdown, and Quarto offer significant advantages, and many professional scientists and publishers prefer them for good reason. A large number of doctoral theses and scientific papers are written in LaTeX, and nearly all publishers and editors work with code-based solutions that do not follow the WYSIWYG principle.
With code-based alternatives, layout specifications are either placed at the beginning of the text or embedded within the main text itself. The final document is only visible after converting (also called compiling or rendering) it into a format such as PDF. This may initially seem unusual and less intuitive than a WYSIWYG interface, but the most intuitive solution is not necessarily the best or simplest. My experience supervising numerous student papers has shown that the intuitive features of MS Word and Pages often become time-consuming over the medium to long term and fail to adequately support users in avoiding errors when writing scientific work. Students who choose code-based applications tend to experience less frustration and greater success—at least, this has been true for the papers I have supervised.
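The following minimal R sketch illustrates the render step. It writes a tiny Quarto source file, with the layout settings in a YAML header at the top and the text below it. The file is then rendered to HTML only if the quarto R package and the Quarto command-line tool are both available. The file name `hello.qmd` and the header fields are illustrative assumptions. `quarto::quarto_render()` comes from the quarto R package.

```r
# A Quarto document is plain text: layout settings go in the YAML header,
# the actual writing goes below it.
qmd_file <- file.path(tempdir(), "hello.qmd")  # hypothetical file name

writeLines(c(
  "---",
  "title: \"A first Quarto document\"",
  "format: html",
  "---",
  "",
  "## Introduction",
  "",
  "This paragraph is *plain text* until the document is rendered."
), qmd_file)

# Rendering converts the source into the final document (here: HTML).
# Guarded, because it needs both the quarto R package and the Quarto CLI.
if (requireNamespace("quarto", quietly = TRUE) && nzchar(Sys.which("quarto"))) {
  quarto::quarto_render(qmd_file)
}
```

The source file stays readable as plain text. Only the rendered output shows the formatted layout.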
Code-based applications allow writers to focus on the actual writing process, as formatting and adherence to citation rules are largely automated by the software. The necessary initial investment in learning a tool like Quarto quickly pays off, resulting in noticeable improvements in the quality of scientific texts.
In the following subsections, I will first outline typical usage of WYSIWYG applications, then discuss the advantages of code-based text creation using Quarto as an example, and finally explain how to successfully write texts using Quarto.
5.1.2 Typical (mis)usage of WYSIWYG applications
The use of traditional word processing software like Microsoft Word or Apple Pages for writing academic papers is pervasive among students. While these programs are user-friendly for everyday writing projects, they create a significant additional workload to meet the demands of academic work.
One of the first problems is integrating literature. Correct formatting according to various citation guidelines is often counter-intuitive, and errors occur easily. This is particularly true if the citation and bibliography functions provided by the software are not used or are used incorrectly. Instead of utilizing external citation managers and investing time to learn how to use them, many students manually create citations and bibliographies, which typically leads to numerous small and sometimes larger errors that could be avoided.
Another weak point in student work is adherence to specific formatting requirements. Academic institutions and journals often require strict adherence to formatting guidelines, including the design of title pages, headers and footers, page margins, and heading hierarchies. Although Word and Pages offer templates and styles, these must be adapted for each document and often need readjusting after minor changes to the text. Even a single format adjustment can become a major effort.
The inclusion of empirical results such as statistical data and graphics presents an additional hurdle. With Word and Pages, the process is frequently manual: research data must be exported from statistical software, saved as images, and then embedded in the document. If the data changes, this time-consuming process must be repeated, significantly increasing the workload and risk of errors.
5.1.3 Advantages of Quarto for writing text
Writing academic texts using traditional tools such as MS Word or Pages can be time-consuming and error-prone for students. In the following section, I introduce Quarto (or R Markdown), a modern alternative that offers several advantages:
- Versatile output formats: Quarto makes it easy to generate different output formats. The same text can be rendered as a website (HTML), manuscript (PDF, DOCX), book (EPUB, PDF), or slides (HTML, PDF, PowerPoint). This flexibility allows you to focus on the content rather than the format.
- Simplified formatting changes: Templates and the document's YAML header define the layout in one place, so formatting changes apply to the whole document at once.
- Seamless literature integration: Quarto handles citation rules compliance and integrates seamlessly with citation management systems, enabling researchers to manage literature references and bibliographies more efficiently and consistently than in Word.
- Easy cross-referencing: Creating cross-references to sections, tables, and figures is straightforward.
- Direct data analysis and output generation: Data analysis and output generation occur directly within Quarto, ensuring that displayed graphics and tables are always up-to-date. This eliminates the need for manual post-processing and guarantees the reproducibility of results.
- Embedded data visualizations: Researchers can embed data visualizations directly in the text without manual intermediate steps.
- Efficient collaboration with version control: Version control systems like Git make collaboration on academic documents more manageable. Changes can be tracked and integrated without relying on complex and conflict-prone comparison tools.
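The point about analysis and output living in the same document can be sketched in plain R. The code below stands for the content of a code chunk in a `.qmd` or `.Rmd` file. It computes a statistic from R's built-in `mtcars` data set. In the running text, the result would then be referenced with inline code such as `r mean_mpg`, which is R Markdown syntax that Quarto's knitr engine also accepts. The number in the sentence is therefore recomputed whenever the document is rendered. The object names are illustrative, and `sprintf()` only mimics what the inline code would produce.

```r
# Chunk content: the analysis runs every time the document is rendered.
data(mtcars)                              # built-in example data
mean_mpg <- round(mean(mtcars$mpg), 1)    # average fuel economy
n_cars   <- nrow(mtcars)                  # number of observations

# What inline code in the text would turn into after rendering:
sentence <- sprintf("The %d cars in the sample average %.1f miles per gallon.",
                    n_cars, mean_mpg)
cat(sentence, "\n")
```

If `mtcars` were replaced by an updated data set, the sentence would change on the next render without any manual copying.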
For those interested, I recommend the online course Introduction to Reproducible Publications with RStudio, which explains explicitly how to work in an empirically reproducible manner. A somewhat more compact introduction is offered by Bauer & Landesvatter (2023), and the authoritative work on the subject is by Gandrud (2020).
5.2 Markdown
Markdown is a lightweight markup language with plain-text formatting syntax. It is very popular because it is easy to learn. It’s an essential skill for using Quarto effectively. Start by learning enough Markdown to structure your thesis, including headings, lists, links, and code blocks.
You can learn Markdown (not R Markdown!) in 10 minutes. Just go to www.markdowntutorial.com and work through the interactive lessons. I also recommend the introduction offered in the section Markdown Basics on quarto.org.
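For reference, the Markdown elements mentioned above can be collected in one small file. This is a minimal sketch written from R. It assumes the file name `basics.md`, and it converts the file to HTML only if the rmarkdown package and Pandoc are available, using `rmarkdown::pandoc_convert()`.

```r
# The Markdown elements most needed for a thesis, as plain text lines.
md <- c(
  "# Chapter heading",
  "## Section heading",
  "",
  "Text with **bold**, *italic*, and a [link](https://quarto.org).",
  "",
  "- first bullet",
  "- second bullet",
  "",
  "1. first step",
  "2. second step",
  "",
  "```r",
  "summary(cars)",
  "```"
)
md_file <- file.path(tempdir(), "basics.md")   # hypothetical file name
writeLines(md, md_file)

# Optional: convert to HTML with Pandoc via rmarkdown, if both are available.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::pandoc_convert(md_file, to = "html",
                            output = file.path(tempdir(), "basics.html"))
}
```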
5.3 Quarto
Read Telford (2023): Enough Markdown to Write a Thesis. This resource covers the basics and some advanced Markdown features that are useful for academic writing.
More extensive resources on how to do things with Quarto can be found at quarto.org.
5.3.1 Introduction
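To set up Quarto on your machine, do the following:
- Install R and RStudio.
- Install the quarto R package, which lets you render Quarto documents from R. The Quarto command-line tool itself is bundled with recent versions of RStudio and can also be downloaded from quarto.org:
install.packages("quarto")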
- Install the tinytex package to generate PDF files:
install.packages("tinytex")
tinytex::install_tinytex()
- It is also advisable to install additional packages that might be needed later:
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load(knitr, rmarkdown, papaja)  # installs missing packages, then loads them
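After these steps, a quick read-only check can confirm that the toolchain is in place. This is a sketch, not an official diagnostic. Note that `Sys.which()` only searches the system PATH, so a copy of Quarto bundled with RStudio may not be detected this way even though RStudio can use it. `tinytex::is_tinytex()` and `quarto::quarto_version()` are functions of the tinytex and quarto packages.

```r
# Which parts of the setup are available? (Nothing is installed here.)
has_quarto_cli <- nzchar(unname(Sys.which("quarto")))      # Quarto CLI on PATH?
has_quarto_pkg <- requireNamespace("quarto", quietly = TRUE)
has_tinytex    <- requireNamespace("tinytex", quietly = TRUE) &&
                  tinytex::is_tinytex()                     # TinyTeX installed?

setup_status <- c(quarto_cli = has_quarto_cli,
                  quarto_pkg = has_quarto_pkg,
                  tinytex    = has_tinytex)
print(setup_status)

# Report the Quarto version if both the package and the CLI are found.
if (has_quarto_pkg && has_quarto_cli) {
  print(quarto::quarto_version())
}
```

A `FALSE` entry points to the installation step above that still needs to be completed.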
5.3.2 Create an APA compliant manuscript using Quarto
To create an APA-compliant manuscript, it is recommended to use the Quarto extension apaquarto. The process is described in detail here. Using the template ensures that APA rules are applied automatically. Because APA allows considerable leeway and every reviewer has specific preferences, apaquarto lets you adjust a variety of settings. For example, the language can be changed and the general style of the document can be modified in the preamble (YAML header).
5.4 R Markdown
R Markdown provides an authoring framework for data science. You can use a single R Markdown file to document your work, run code, and generate high-quality reports, books, websites, articles, theses, blogs, and much more (see Figure 5.1).
In contrast to Quarto (see Section 5.3), which is the more recent format, R Markdown has been around for some time, so countless resources are available to learn it. For example:
- The R Markdown Cheatsheet (see Figure 5.2) from Posit offers an overview of the most important features of R Markdown.
- The book R Markdown Cookbook by Xie et al. (2020) (see Figure 5.3) offers an introduction. The online version of the book is regularly updated and free of charge.
- The book R Markdown: The Definitive Guide by Xie et al. (2018) offers a comprehensive introduction. The online version of the book is regularly updated and free of charge.
Please watch the video What is R Markdown? and then study the R Markdown tutorial from RStudio.
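Once you have worked through the tutorial, rendering an R Markdown file from the R console looks roughly like the sketch below. The file name `report.Rmd` and its content are assumptions. `rmarkdown::render()` needs Pandoc, which is checked with `rmarkdown::pandoc_available()`. The final step renders the same file with Quarto, which reflects the compatibility mentioned at the start of this chapter. It only runs if the quarto package and the Quarto CLI are found.

```r
# A minimal R Markdown file: YAML header plus one R code chunk.
rmd_file <- file.path(tempdir(), "report.Rmd")   # hypothetical file name
writeLines(c(
  "---",
  "title: \"A first R Markdown report\"",
  "output: html_document",
  "---",
  "",
  "```{r}",
  "summary(cars)",
  "```"
), rmd_file)

# Knit with rmarkdown (requires Pandoc, which ships with RStudio).
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, output_format = "html_document", quiet = TRUE)
}

# Optional: render the same .Rmd file with Quarto instead.
if (requireNamespace("quarto", quietly = TRUE) && nzchar(Sys.which("quarto"))) {
  quarto::quarto_render(rmd_file)
}
```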
By default, the working directory is the directory that contains the Rmd document. If you want to use another directory, you can change the working directory with setwd(). However, this change is not persistent in R Markdown: it only applies to the current code chunk. After the chunk has been evaluated, the working directory is restored to the directory that contains the Rmd file.
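The chunk-local behaviour can be mimicked in plain R. `setwd()` returns the previous directory, so a helper can switch directories, run some code, and switch back. This is roughly what happens after each chunk when knitting. The helper name `with_dir()` is hypothetical. For a change that should apply to all subsequent chunks, knitr provides the `root.dir` package option, which is usually set in the setup chunk. The path used below is only a placeholder.

```r
# Hypothetical helper: evaluate `code` inside `dir`, then restore the old directory.
with_dir <- function(dir, code) {
  old <- setwd(dir)                 # setwd() returns the previous working directory
  on.exit(setwd(old), add = TRUE)   # restore it even if `code` throws an error
  force(code)                       # `code` is only evaluated now, inside `dir`
}

before <- getwd()
inside <- with_dir(tempdir(), basename(getwd()))
print(inside)                       # name of the temporary directory
identical(getwd(), before)          # TRUE: the original directory is back

# Persistent alternative for R Markdown: in the setup chunk, set knitr's
# root.dir option so that later chunks run relative to that folder.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_knit$set(root.dir = tempdir())   # placeholder path
}
```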