How to Use R for Data Science
Lecture Notes
Preface
About R
R is a programming language for handling, visualizing, and analyzing data. It runs on Windows, macOS, and Linux and, in my view, handles many data tasks better than other programs such as Python, Stata, EViews, SPSS, SAS, and Excel. R is open source and widely used, and abundant resources are available for learning it. These notes are just my two cents.
About the cover of the notes
Data science is a buzzword that combines different fields of knowledge such as computer science, software engineering, informatics, database management, statistics, econometrics, business intelligence, and mathematics. There is no universally accepted definition of it, and I do not think a precise definition is important. Kelleher & Tierney (2018, p. 97) wrote “Data science is best understood as a partnership between a data scientist and a computer.” So data science is about embracing the power of computers for scientific, commercial, or social purposes. Of course, empirical models and statistics play a role in gaining meaningful insights. The graphic on the cover page illustrates that R brings together four important fields, namely data, science, computer, and statistics.
About the notes
Please note that the PDF version of these notes contains the same content but has not been optimized for that format, so some parts may not appear as intended.
- These notes aim to support my lecture at HS Fresenius, but they are incomplete and no substitute for actively taking part in class.
- I hope you find these notes helpful. Any feedback is welcome and appreciated.
- This is a work in progress, so please check for updates regularly.
- These notes offer a curated collection of explanations, exercises, and tips to facilitate learning R without causing unnecessary frustration. However, these notes don’t aim to rival comprehensive textbooks such as Wickham & Grolemund (2023).
- These notes are published under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. This means they can be reused, remixed, retained, revised, and redistributed as long as appropriate credit is given to the authors. If you remix or modify the original version of this open textbook, you must redistribute all versions of this open textbook under the same license. These notes draw on the work of Navarro (2020), Muschelli & Jaffe (2022), Thulin (2021), and Ismay & Kim (2022), which are also published under the same license.
- I host the notes in a GitHub repo.
I recommend copying all the code shown in these notes into an R script and running it on your PC. That is the best way to learn, to understand, and to create your own notes that may guide you later on. Whenever you see interesting code somewhere, try to run it yourself. I also recommend working through the exercises. They are sometimes challenging, but to really understand code you need to run it yourself.
Structure of these notes
Chapter | Explanations |
---|---|
…R | Learn the basics everyone should know about R and RStudio, including how to install them. |
…writing code | Learn the basics of writing code. |
…writing R scripts | Learn how to use R scripts and their benefits. |
Interactive introduction using swirl | A hands-on tutorial on how to use the swirl package. This section is optional. |
Kickstart | A quick start guide for beginners on how to dive into R, showcasing some of its capabilities. |
Pitfalls | Discover common mistakes beginners often make and how to avoid them to save time on troubleshooting. |
Manage data | Learn how to manipulate data in R. |
Visualize data | A quick guide on where to find resources to learn about creating graphical visualizations in R. |
Collection of exercises | A set of exercises to practice R programming skills. |
Appendix | A collection of useful material that helps you navigate your file system, find the right operator or function, and learn some useful shortcuts. |