buy(oven, springform, bowl, mixer, flour, sugar, butter, eggs, soda)
clean(oven, springform, bowl, mixer)
turn_on(oven)
prepare(springform, bowl, mixer)
weigh(flour, sugar, butter, eggs, soda)
dough <- bowl |>
  put(flour, sugar, butter, eggs, soda) |>
  action(tool = mixer, time = 3)
dough_springform <- springform |>
  put(dough)
dough_oven <- oven |>
  put(dough_springform) |>
  action(tool = oven, time = 30) |>
  pull()
turn_off(oven)
clean(oven, springform, bowl, mixer)
2 …writing code
When you talk to a computer, you use a programming language. Computers can handle certain tasks for you, like doing math or creating graphs. They don’t complain and work fast. They just need clear instructions. Giving clear instructions to a machine, however, isn’t easy. Unlike humans, computers lack intuition and cannot adapt to the context of your commands; they interpret instructions literally. Check out this video. It’ll show you that good communication among human beings isn’t about explaining things perfectly, but about using your gut feeling and common sense effectively.
If you want a computer to handle computationally intensive and repetitive tasks, you must learn to “speak” in a way that the computer understands. Never assume the computer knows what you mean. The following sections will emphasize this: Be precise and be specific.
2.1 Bake a cake
As a teacher, I often contemplate how I can teach students to communicate with a computer. In this section, I will attempt to translate a cake baking recipe into a programming language, using R. I hope you find this both amusing and insightful. Let’s start by examining a simplified cake recipe:
Although this recipe is simplified, it illustrates a process you might be familiar with. Now, let’s assume a computer is tasked with baking a cake. How would we explain the necessary steps to the computer using the R programming language?
In R, a functional programming language, everything that happens is a function call, and everything that exists is an object. Therefore, we must translate all actions into functions and all items into objects.
Here’s what the translated recipe could look like in R:
The translation of the recipe into code becomes clearer once we are familiar with two key operators:
- The “<-” is known as the assignment operator. It saves or stores data into a new object. It might be helpful to think of it as saying, “I create the object <name of object> and store therein.”
- The “|>” is known as the pipe operator. It passes the output of one action to serve as the input for the next. Think of it as saying “and then.”
For example, the following lines:
dough <- bowl |>
  put(flour, sugar, butter, eggs, soda) |>
  action(tool = mixer, time = 3)
can be interpreted as:
I create the object `dough` and I store therein the bowl, and then
I put flour, sugar, butter, eggs, and soda into it, and then
I take action with the mixer for 3 minutes
In the preceding functions, you’ll notice objects separated by commas and parameters like `tool = mixer, time = 3`. These parameters define the behavior of the function. When there’s nothing within the brackets, as in `pull()`, the input is merely the output of the preceding step, handed over by the pipe operator.
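Both operators, and parameters inside the brackets, can be tried out in real R. The following is only a minimal sketch using base R functions; the object names `numbers` and `result` are made up for illustration, and the native pipe `|>` requires R 4.1 or later.

```r
# Assignment: I create the object `numbers` and store three values therein
numbers <- c(4, 9, 16)

# Pipe: take `numbers`, and then take the square roots,
# and then add them up, and then round to one decimal place
result <- numbers |>
  sqrt() |>
  sum() |>
  round(digits = 1)  # `digits = 1` is a parameter, like `time = 3` above

result

# The same computation without the pipe has to be read from the inside out
round(sum(sqrt(numbers)), digits = 1)
```

Both versions give the same value. The piped version simply reads in the order in which the steps happen.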
Even though R is no good as a cook and the recipe is missing some steps, this analogy helps to illustrate how programming languages work: they allow us to instruct the computer in a sequential way. Next, I will showcase why coding is appealing.
2.2 Elegant code
Let’s make our code more elegant, that is, easy to read, understand, and modify. For example, writing everything in one line
dough <- bowl |> put(flour, sugar, butter, eggs, soda) |> action(tool = mixer, time = 3)
is equivalent to spreading it out over three lines,
dough <- bowl |>
  put(flour, sugar, butter, eggs, soda) |>
  action(tool = mixer, time = 3)
but the spread-out form is easier for the human eye to read.
Writing code involves certain conventions, often referred to as a coding style. Although not strictly necessary, a consistent style can significantly enhance clarity and prevent common pitfalls. Numerous style guides aim to standardize coding practices. For example, you might find The tidyverse style guide by Hadley Wickham particularly helpful in adopting a harmonious coding style in R.
By using the assignment operator `<-`, we can create two objects: `ingredients` and `tools`. These objects are used multiple times throughout the process.
Here is an improved version of the script:
ingredients <- c(flour, sugar, butter, eggs, soda)
tools <- c(oven, springform, bowl, mixer)
buy(tools, ingredients)
clean(tools)
turn_on(oven)
prepare(tools)
weigh(ingredients)
dough <- bowl |>
  put(ingredients) |>
  action(tool = mixer, time = 3)
dough_springform <- springform |>
  put(dough)
dough_oven <- oven |>
  put(dough_springform) |>
  action(tool = oven, time = 30) |>
  pull()
turn_off(oven)
clean(tools)
This version refines the process, making the code more streamlined and easier to follow.
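In real R, a name without quotation marks refers to an object that must already exist, so `c(flour, sugar)` would fail unless `flour` and `sugar` had been created before. Here is a minimal sketch of the same idea that actually runs, assuming we store the names as text (character strings); the object name `shopping_list` is made up for illustration.

```r
# Store the names as character strings in two objects
ingredients <- c("flour", "sugar", "butter", "eggs", "soda")
tools <- c("oven", "springform", "bowl", "mixer")

# Reuse the objects instead of retyping their contents
shopping_list <- c(tools, ingredients)
shopping_list

# How many items do we have to buy?
length(shopping_list)
```

If an ingredient changes, it only needs to be changed in one place, namely where `ingredients` is created.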
2.3 Bake a chocolate banana cake
Now, let’s assume you want to bake another cake, this time with chocolate and banana, but without eggs. Moreover, you need to bake it for 45 minutes. We can easily adapt the code snippet from above to accommodate the ingredients for this new recipe:
ingredients <- c(flour, sugar, butter, soda, banana, chocolate)
tools <- c(oven, springform, bowl, mixer)
buy(tools, ingredients)
clean(tools)
turn_on(oven)
prepare(tools)
weigh(ingredients)
dough <- bowl |>
  put(ingredients) |>
  action(tool = mixer, time = 3)
dough_springform <- springform |>
  put(dough)
dough_oven <- oven |>
  put(dough_springform) |>
  action(tool = oven, time = 45) |>
  pull()
turn_off(oven)
clean(tools)
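Instead of retyping the whole list, an existing object can also be modified. The following is a small sketch in real R, again assuming the ingredients are stored as character strings; `new_ingredients` is a made-up name, and `setdiff()` is a base R function that removes the given elements from a vector.

```r
# The ingredients of the first cake
ingredients <- c("flour", "sugar", "butter", "eggs", "soda")

# Drop the eggs, and then add banana and chocolate
new_ingredients <- ingredients |>
  setdiff("eggs") |>
  c("banana", "chocolate")

new_ingredients
```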
2.5 Bake 10 cakes
Since a computer can reproduce a cake within seconds (not really, of course, only in this little exercise), we can now experiment with several versions of the cake by varying the baking time from 35 to 44 minutes. Here’s how the corresponding code might look:
ingredients <- c(flour, sugar, butter, soda, banana, chocolate)
tools <- c(oven, springform, bowl, mixer)
buy(tools, ingredients)
clean(tools)
turn_on(oven)
prepare(tools)
weigh(ingredients)
dough <- bowl |>
  put(ingredients) |>
  action(tool = mixer, time = 3)
dough_springform <- springform |>
  put(dough)
for (timing in 35:44) {
  dough_oven <- oven |>
    put(dough_springform) |>
    action(tool = oven, time = timing) |>
    pull()
  assign(paste("dough_oven_min_", timing, sep = ""), dough_oven)
}
turn_off(oven)
clean(tools)
You can see a loop with some new and tweaked lines:
for (timing in 35:44) {
  dough_oven <- oven |>
    put(dough_springform) |>
    action(tool = oven, time = timing) |>
    pull()
  assign(paste("dough_oven_min_", timing, sep = ""), dough_oven)
}
These lines sequentially execute the following actions:
Let the object timing be 35, make a cake, and save it in the object `dough_oven_min_35` then start again and
let the object timing be 36, make a cake, and save it in the object `dough_oven_min_36` then start again and
let the object timing be 37, make a cake, and save it in the object `dough_oven_min_37` then start again and
let the object timing be 38, make a cake, and save it in the object `dough_oven_min_38` then start again and
let the object timing be 39, make a cake, and save it in the object `dough_oven_min_39` then start again and
let the object timing be 40, make a cake, and save it in the object `dough_oven_min_40` then start again and
let the object timing be 41, make a cake, and save it in the object `dough_oven_min_41` then start again and
let the object timing be 42, make a cake, and save it in the object `dough_oven_min_42` then start again and
let the object timing be 43, make a cake, and save it in the object `dough_oven_min_43` then start again and
let the object timing be 44, make a cake, and save it in the object `dough_oven_min_44` then stop.
In the end, we have ten cakes. This shows how we can harness the processing power of a computer. Computers are excellent at performing everyday, repetitive tasks, so we can automate processes and repeat procedures effortlessly over and over again.
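The loop itself is real R and can be tried out once the baking step is replaced by something R can actually do. In this sketch, a short text stands in for each cake; the names `cake`, `cake_min_35`, and `cakes` are made up for illustration.

```r
# Repeat the same steps for baking times of 35, 36, ..., 44 minutes
for (timing in 35:44) {
  # Stand-in for the baking step: a text describing the cake
  cake <- paste("cake baked for", timing, "minutes")
  # Build the object name, e.g. "cake_min_35", and store the cake under it
  assign(paste("cake_min_", timing, sep = ""), cake)
}

cake_min_35

# Alternative: collect all cakes in one named list instead of ten objects
cakes <- list()
for (timing in 35:44) {
  cakes[[paste0("min_", timing)]] <- paste("cake baked for", timing, "minutes")
}

length(cakes)
cakes[["min_40"]]
```

Whether ten separate objects or one list is more convenient depends on what happens next. A list keeps related results together, which usually makes later steps easier.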
2.6 Writing real code
Of course, computers can’t bake a cake, and the R programming language can do none of the above. Nevertheless, real R code follows the same logic. Let me present a few lines of code and explain them, and you will see that the similarities are striking.
Copy that code chunk, paste it into an R script, and run it.
# This script demonstrates a typical data analysis workflow in R
# ---------------------------------------------------------------
# Install and load required libraries
if (!require(pacman)) install.packages("pacman")
pacman::p_unload(all)
pacman::p_load(tidyverse, haven, janitor)

# Set the working directory to a project-specific folder
setwd("~/Documents")

# Clear the current environment of any objects
rm(list = ls())

# Load data from a Stata file available online
auto <- read_dta("http://www.stata-press.com/data/r18/auto.dta")

# Display basic information about the dataset
ncol(auto) # Number of columns
nrow(auto) # Number of rows
dim(auto) # Dimensions of the dataset
names(auto) # Names of variables
head(auto) # First few rows
tail(auto) # Last few rows
summary(auto) # Summary statistics for each column
glimpse(auto) # Compact display of the structure of the dataset
print(auto, n = Inf) # Print all rows of the dataset

# Check for duplicate entries based on the 'make' variable
auto |>
  get_dupes(make)

# Create and display a scatter plot of car price versus weight
plot_weight_price <- ggplot(auto, aes(x = weight, y = price)) +
  geom_point()
plot_weight_price

# Save the plot to a file
ggsave("plot_weight_price.png", plot = plot_weight_price, dpi = 300)
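If you are offline or cannot install packages right now, similar steps can be tried with `mtcars`, a dataset that ships with R. This is only a base R sketch of the same workflow, not a replacement for the script above; the object name `my_cars` and the file name are made up for illustration.

```r
# Use a dataset that is built into R
my_cars <- mtcars

# Display basic information about the dataset
ncol(my_cars)    # Number of columns
nrow(my_cars)    # Number of rows
names(my_cars)   # Names of variables
head(my_cars)    # First few rows
summary(my_cars) # Summary statistics for each column

# Check for duplicated car names (0 means there are none)
anyDuplicated(rownames(my_cars))

# Create a scatter plot of fuel consumption versus weight
# and save it as a PDF file in a temporary folder
plot_file <- file.path(tempdir(), "plot_wt_mpg.pdf")
pdf(plot_file)
plot(my_cars$wt, my_cars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
dev.off()
```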
2.4 Comment what you do
Sometimes code can be difficult for humans to understand. It is therefore helpful to add comments that clarify what the individual code sections are supposed to do. In R, comments start with a hash sign, `#`. R ignores everything from the `#` to the end of the line.
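Here is a small sketch of how comments can look in practice. The baking times and the object names `baking_times` and `mean_time` are made up for illustration.

```r
# Baking times of three cakes, in minutes
baking_times <- c(30, 45, 35)

# Compute the average baking time
mean_time <- mean(baking_times) # a comment can also follow code on the same line

mean_time
```

A good comment explains why a step is there or what it is meant to achieve, rather than repeating the code word for word.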