2  …writing code

When you talk to a computer, you use a programming language. Computers can handle certain tasks for you, such as doing math or creating graphs. They don’t complain, and they work fast. They just need clear instructions. Giving clear instructions to a machine, however, isn’t easy. Unlike humans, computers lack intuition and cannot adapt to the context of your commands; they interpret instructions literally. Check out this video. It shows that good communication among human beings isn’t about explaining things perfectly, but about using gut feeling and common sense effectively.

If you want a computer to handle computationally intensive and repetitive tasks, you must learn to “speak” in a way that the computer understands. Never assume the computer knows what you mean. The following sections emphasize one rule: be precise and be specific.

2.1 Bake a cake

As a teacher, I often think about how to teach students to communicate with a computer. In this section, I will attempt to translate a cake-baking recipe into the programming language R. I hope you find this both amusing and insightful. Let’s start by examining a simplified cake recipe:

Instructions to bake a cake:
  1. Buy an oven.
  2. Buy ingredients (flour, sugar, butter, eggs, soda).
  3. Buy tools.
  4. Clean everything.
  5. Heat up the oven.
  6. Prepare the tools (springform, bowl, mixer).
  7. Weigh all ingredients.
  8. Take a bowl and put all the weighed ingredients in it.
  9. Take the mixer and mix all ingredients in the bowl for 3 minutes.
  10. Put all the mixed ingredients in a springform pan.
  11. Take the springform pan with the mixed ingredients and put it in the oven.
  12. Bake for 30 minutes, take the springform pan out of the oven, and turn off the oven.
  13. Clean the kitchen and the tools.

Although this recipe is simplified, it illustrates a process you might be familiar with. Now, let’s assume a computer is tasked with baking a cake. How would we explain the necessary steps to the computer using the R programming language?

R is usually described as a functional programming language, and in R everything that happens is a function call and every entity is an object. Therefore, we must translate all actions into functions and all items into objects.
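You can check this principle with a few lines of real R: even arithmetic and assignment are ordinary functions, and you can call them by name if you wrap the name in backticks. In this minimal sketch, flour_grams is just an illustrative name:

```r
# Arithmetic operators are ordinary functions in R
sum_infix <- 2 + 3          # the usual infix form
sum_prefix <- `+`(2, 3)     # the same call, written as a function call
print(identical(sum_infix, sum_prefix))  # TRUE

# Even assignment is a function call
`<-`(flour_grams, 250)      # same as: flour_grams <- 250
print(flour_grams)          # 250

# Every entity is an object with a class
class(flour_grams)          # "numeric"
class(`+`)                  # "function"
```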

Here’s what the translated recipe could look like in R:

buy(oven, springform, bowl, mixer, flour, sugar, butter, eggs, soda)
clean(oven, springform, bowl, mixer)
turn_on(oven)
prepare(springform, bowl, mixer)
weigh(flour, sugar, butter, eggs, soda)
dough <- bowl |>
  put(flour, sugar, butter, eggs, soda) |> 
  action(tool = mixer, time = 3) 

dough_springform <- springform |> 
  put(dough)

dough_oven <- oven |> 
  put(dough_springform) |>
  action(tool = oven, time = 30) |> 
  pull()

turn_off(oven)
clean(oven, springform, bowl, mixer)

The translation of the recipe into code becomes clearer once you know two key programming operators:

  1. The “<-” is known as the assignment operator. It stores a value in an object, creating the object if it does not exist yet and overwriting it if it does. It might help to read it as “I create the object <name of object> and store therein …”.
  2. The “|>” is known as the (native) pipe operator. It passes the output of one function call to the next function call as its first input. Think of it as saying “and then.”
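Here are both operators with real base R functions in place of the imaginary kitchen ones. This is a minimal sketch. The numbers are arbitrary example weights, and the native pipe |> requires R 4.1.0 or later:

```r
# Assignment: create the object `weights` and store therein three numbers
weights <- c(250, 200, 125)

# Pipe: take weights, and then sum them, and then round to the nearest hundred
total <- weights |>
  sum() |>
  round(digits = -2)

print(total)  # 600
```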

For example, the following lines:

dough <- bowl |>
  put(flour, sugar, butter, eggs, soda) |> 
  action(tool = mixer, time = 3) 

can be interpreted as:

I create the object `dough` and I store therein the bowl, and then 
  I put flour, sugar, butter, eggs, and soda into it, and then
  I take action with the mixer for 3 minutes

In the preceding function calls, you’ll notice objects separated by commas and named arguments such as tool = mixer and time = 3. These arguments control what the function does. The pipe always inserts its input as the first argument, before anything written inside the parentheses. When the parentheses are empty, as in pull(), the function’s only input is the output passed along by the preceding pipe operator.
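In other words, the native pipe rewrites `x |> f(y)` as `f(x, y)`. The sketch below shows this with real base R functions. The vector grams is a made-up example:

```r
grams <- c(250, NA, 125)

# The piped value becomes the FIRST argument; named arguments follow it
a <- grams |> mean(na.rm = TRUE)
b <- mean(grams, na.rm = TRUE)   # the same call written without the pipe
print(c(a, b))                   # 187.5 187.5

# Empty parentheses: the piped value is the only argument
sorted <- c(3, 1, 2) |> sort()
print(sorted)                    # 1 2 3
```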

Even though R is no good as a cook and the recipe is missing some steps, this analogy helps to illustrate how programming languages work: they allow us to instruct the computer in a sequential way. Next, I will showcase why coding is appealing.
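You can make the first step of the recipe actually run by defining toy versions of the kitchen functions. put() and action() below are hypothetical helpers written only for this illustration. They do not exist in R, and they merely return a text description of what happens:

```r
# Hypothetical toy versions of the kitchen functions: they only describe
# what happens, returning a character string, so that the pipeline runs.
put <- function(container, ...) {
  items <- c(...)
  paste(container, "with", paste(items, collapse = ", "))
}
action <- function(x, tool, time) {
  paste0(x, ", processed with the ", tool, " for ", time, " min")
}

# Ingredients and tools are plain character strings in this sketch
dough <- "bowl" |>
  put("flour", "sugar", "butter", "eggs", "soda") |>
  action(tool = "mixer", time = 3)

print(dough)
```

Because each toy function takes the piped object as its first argument, the pipeline reads exactly like the recipe version.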

2.2 Elegant code

Let’s make our code more elegant, that is, easy to read, understand, and modify. For example, while it is equivalent to write everything in one line

dough <- bowl |> put(flour, sugar, butter, eggs, soda) |> action(tool = mixer, time = 3) 

or spread out over three lines,

dough <- bowl |>
  put(flour, sugar, butter, eggs, soda) |> 
  action(tool = mixer, time = 3) 

the spread-out form is easier for the human eye to read.
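You can confirm with real functions that R treats both layouts the same way. One detail matters: the line break must come after |>, because a line that already forms a complete expression ends the statement. A minimal sketch:

```r
# One line ...
one_line <- c(4, 9, 16) |> sqrt() |> sum()

# ... or spread out: R keeps reading because each line ends with |>
spread_out <- c(4, 9, 16) |>
  sqrt() |>
  sum()

print(identical(one_line, spread_out))  # TRUE

# Breaking the line BEFORE the pipe does not work:
# c(4, 9, 16)
#   |> sqrt()   # syntax error: the first line is already a complete expression
```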

Style in Writing Code

Writing code involves certain conventions, often referred to as a coding style. Although not strictly necessary, a consistent style can significantly enhance clarity and prevent common pitfalls. Numerous style guides aim to standardize coding practices. For example, you might find Hadley Wickham’s tidyverse style guide particularly helpful for adopting a consistent coding style in R.

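Using the assignment operator <- and the function c(), which combines several elements into one object, we can create two objects: ingredients and tools. These objects are used multiple times throughout the process, so we only need to define them once.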

Here is an improved version of the script:

ingredients <- c(flour, sugar, butter, eggs, soda)
tools <- c(oven, springform, bowl, mixer)

buy(tools, ingredients)

clean(tools)
turn_on(oven)
prepare(tools)
weigh(ingredients)
dough <- bowl |>
  put(ingredients) |> 
  action(tool = mixer, time = 3) 

dough_springform <- springform |> 
  put(dough) 
  
dough_oven <- oven |> 
  put(dough_springform) |>
  action(tool = oven, time = 30) |> 
  pull()

turn_off(oven)
clean(kitchen, tools)

This version is shorter and easier to follow. Because the ingredients and tools are defined only once, changing them requires editing just one line.
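Here is the same idea with real R objects. In this sketch, the ingredients are stored as text (character strings). That is an assumption for the illustration, since the imaginary objects above have no real counterpart. Once defined, the vector can be reused by several function calls, and many R functions, such as paste(), work on every element at once:

```r
# Define the ingredients once ...
ingredients <- c("flour", "sugar", "butter", "eggs", "soda")

# ... and reuse them in several steps
shopping_list <- paste("Buy", ingredients)
weighing_list <- paste("Weigh", ingredients)

print(length(ingredients))  # 5
print(shopping_list[1])     # "Buy flour"
```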

2.3 Bake a chocolate banana cake

Now, let’s assume you want to bake another cake, this time with chocolate and banana, but without eggs. Moreover, you need to bake it for 45 minutes. We can easily adapt the code snippet from above to accommodate the ingredients for this new recipe:

ingredients <- c(flour, sugar, butter, soda, banana, chocolate)
tools <- c(oven, springform, bowl, mixer)

buy(tools, ingredients)

clean(tools)
turn_on(oven)
prepare(tools)
weigh(ingredients)
dough <- bowl |>
  put(ingredients) |> 
  action(tool = mixer, time = 3) 

dough_springform <- springform |> 
  put(dough) 
  
dough_oven <- oven |> 
  put(dough_springform) |>
  action(tool = oven, time = 45) |> 
  pull()

turn_off(oven)
clean(kitchen, tools)
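With real character vectors, you can also derive the new ingredient list from the old one instead of retyping it. This sketch uses the base R function setdiff(), which keeps the order of the first vector. The variable names are illustrative:

```r
ingredients <- c("flour", "sugar", "butter", "eggs", "soda")

# Drop the eggs and add banana and chocolate
ingredients_new <- c(setdiff(ingredients, "eggs"), "banana", "chocolate")
print(ingredients_new)
# "flour" "sugar" "butter" "soda" "banana" "chocolate"

baking_time <- 45  # the only other change compared with the first cake
```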

2.4 Comment what you do

Sometimes code can be difficult for humans to understand. It is therefore helpful to add comments that clarify what the individual code sections are supposed to do. In R, a comment starts with a hash sign, #, and R ignores everything from the # to the end of the line.

# Decide on tools and ingredients 
ingredients <- c(flour, sugar, butter, soda, banana, chocolate)
tools <- c(oven, springform, bowl, mixer)

# Go shopping
buy(tools, ingredients)

# Prepare the kitchen, tools, and ingredients
clean(tools)
turn_on(oven)
prepare(tools)
weigh(ingredients)

# Make the dough
dough <- bowl |>
  put(ingredients) |> 
  action(tool = mixer, time = 3) 
dough_springform <- springform |> 
  put(dough)

# Bake the cake
dough_oven <- oven |> 
  put(dough_springform) |>
  action(tool = oven, time = 45) |> 
  pull()

# Clean up
turn_off(oven)
clean(kitchen, tools)
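In real R code, comments can occupy a full line or follow code on the same line. A # inside quotation marks, however, belongs to the text and does not start a comment. A minimal sketch, in which 180 is just an example value:

```r
# A full-line comment: R ignores this entire line
oven_temp <- 180  # an inline comment: R ignores everything after the #

# A # inside quotes is part of the string, not a comment
label <- "# not a comment"
print(nchar(label))  # 15
```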

2.5 Bake 10 cakes

As a computer can reproduce a cake within seconds (not really, of course, only in this little fun exercise), we now have the opportunity to experiment with several versions of the cake by varying the baking time from 35 to 44 minutes, one cake per minute. Here’s how the corresponding code might look:

ingredients <- c(flour, sugar, butter, soda, banana, chocolate)
tools <- c(oven, springform, bowl, mixer)

buy(tools, ingredients)

clean(tools)
turn_on(oven)
prepare(tools)
weigh(ingredients)
dough <- bowl |>
  put(ingredients) |> 
  action(tool = mixer, time = 3) 

dough_springform <- springform |> 
  put(dough) 

for (timing in 35:44) {
  dough_oven <- oven |> 
    put(dough_springform) |>
    action(tool = oven, time = timing) |> 
    pull()
  assign(paste("dough_oven_min_", timing, sep = ""), dough_oven)
}

turn_off(oven)
clean(tools)

The main change is a for loop, which contains some new and tweaked lines:

for (timing in 35:44) {
  dough_oven <- oven |> 
    put(dough_springform) |>
    action(tool = oven, time = timing) |> 
    pull()
  assign(paste("dough_oven_min_", timing, sep = ""), dough_oven)
}

These lines sequentially execute the following actions:

Let the object timing be 35, make a cake, and save it in the object `dough_oven_min_35` then start again and
let the object timing be 36, make a cake, and save it in the object `dough_oven_min_36` then start again and
let the object timing be 37, make a cake, and save it in the object `dough_oven_min_37` then start again and
let the object timing be 38, make a cake, and save it in the object `dough_oven_min_38` then start again and
let the object timing be 39, make a cake, and save it in the object `dough_oven_min_39` then start again and
let the object timing be 40, make a cake, and save it in the object `dough_oven_min_40` then start again and
let the object timing be 41, make a cake, and save it in the object `dough_oven_min_41` then start again and
let the object timing be 42, make a cake, and save it in the object `dough_oven_min_42` then start again and
let the object timing be 43, make a cake, and save it in the object `dough_oven_min_43` then start again and
let the object timing be 44, make a cake, and save it in the object `dough_oven_min_44`, then stop.

In the end, we have ten cakes. This shows how we can harness the processing power of a computer. Computers excel at repetitive tasks: once we have written clear instructions, they can carry out the same procedure over and over again without any extra effort on our part.
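Creating ten separately named objects with assign() works. A common alternative in R is to collect all results in a single named list, which is easier to loop over later. In this minimal sketch, a text string stands in for each imaginary cake:

```r
timings <- 35:44
print(length(timings))  # 10: the values 35, 36, ..., 44

cakes <- list()
for (timing in timings) {
  # Name each element like the objects above, e.g. "dough_oven_min_35"
  cakes[[paste0("dough_oven_min_", timing)]] <- paste("cake baked for", timing, "minutes")
}

print(length(cakes))                 # 10
print(cakes[["dough_oven_min_40"]])  # "cake baked for 40 minutes"
```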

2.6 Writing real code

Of course, computers can’t bake a cake, and R can do none of the above: functions such as buy() or turn_on() do not exist. Nevertheless, real R code works much like our recipe. Let me present a few lines of real code, explained in the comments, and you will see that the similarities are striking.

Copy the following code chunk, paste it into an R script, and run it.

# This script demonstrates a typical data analysis workflow in R
# ---------------------------------------------------------------

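# Install pacman if needed, then use it to manage the other packages
if (!require(pacman)) install.packages("pacman")
pacman::p_unload(all) # Unload all add-on packages (only base packages remain)
pacman::p_load(tidyverse, haven, janitor) # Install if needed, then load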

# Set the working directory (adjust the path to your own project folder)
setwd("~/Documents")

# Clear the current environment of any objects
rm(list = ls())

# Load data from a Stata file available online
auto <- read_dta("http://www.stata-press.com/data/r18/auto.dta")

# Display basic information about the dataset
ncol(auto) # Number of columns
nrow(auto) # Number of rows
dim(auto) # Dimensions of the dataset
names(auto) # Names of variables
head(auto) # First few rows
tail(auto) # Last few rows
summary(auto) # Summary statistics for each column
glimpse(auto) # Compact display of the structure of the dataset
print(auto, n = Inf) # Print all rows of the dataset

# Check for duplicate entries based on the 'make' variable
auto |>
  get_dupes(make)

# Create and display a scatter plot of car price versus weight
plot_weight_price <- ggplot(auto, aes(x = weight, y = price)) +
  geom_point()
plot_weight_price

# Save the plot to a file
ggsave("plot_weight_price.png", plot = plot_weight_price, dpi = 300)
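If the download does not work (for example, without an internet connection), you can practice the same kind of inspection offline on mtcars, a small dataset about cars that ships with R. This sketch uses only base R, so it needs no packages. It is an illustration, not part of the original workflow:

```r
# mtcars ships with R (datasets package), so nothing needs to be downloaded
dim(mtcars)            # 32 rows, 11 columns
names(mtcars)[1:3]     # "mpg" "cyl" "disp"

# mtcars stores car names as row names; count duplicated names (none here)
n_dupes <- sum(duplicated(rownames(mtcars)))
print(n_dupes)         # 0

# Piping works with base R functions, too: keep cars with more than 30 mpg,
# and then count them
n_efficient <- mtcars |>
  subset(mpg > 30) |>
  nrow()
print(n_efficient)     # 4
```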