The Pattern
When you’re doing an academic project, you want have a self-contained environment where you keep your data, write your scripts, and compile the finished product (a report, slides, etc.). In R, this concept is known as an R “project.” Some patterns around R projects that I want to introduce you to in this writeup are
- How to create a project and what a project workflow looks like
- How to organize your files and refer to them consistently using the
here
package - How to track your changes and share progress on GitHub
- How to track your dependencies with the
renv
package
Each of these topics is worthy of a writeup in and of itself. Here I will give a high-level summary and action items for you to remember. In subsequent writeups I will dive further into an individual topic. You can take these in whatever order you want; come back here at the end to review the takeaways.
Always use R Projects
Creating an R Project is nothing more than creating an
.Rproj
file in a folder, and the folder is then considered
an R Project by RStudio. Each different thing you work on should be a
different R project. The files themselves store configuration, but they
are also just an important organizational concept. Open an R project by
double-clicking on the .Rproj
file, and RStudio loads your
project in a “fresh” new window.
When you are using RStudio, you can type R code into the console to experiment, or you can write code in a file and then “source” the file to run it. Never just use the console when you are writing code. You can tinker in the console, but it eventually needs to be saved in a file so that you can reproduce it. Never save your workspace data on exit for this exact reason: If you spend hours and hours tinkering with something to get it perfect and never write down how you got it that way, how will you remember what you did when your peer/supervisor/journal editor asks for replication code?
Be Militant with File Structure
Always make your file structure self-evident. Data, figures,
and tables go in different folders. The R code to produce a given
dataset/figure/table lives in a file whose name corresponds to the name
of the thing it generates. It should be so obvious that your TA can open
up your GitHub repository and give you a grade for your work without any
explanation from you. For bonus points, use a build tool like a Makefile
or targets
file.
The here
package allows you to refer to files inside of
your project in a consistent way. Always use here
to refer
to files in your project. Refer to them relative to the
base folder of your project. Doing it this way makes file references
work anywhere in your project and for anyone who runs
your code. Avoid absolute file references like the plague.
One .Rproj
= One GitHub Repository
GitHub is a place to store your code. Use it. Students have free access to GitHub Pro and can create private repositories. The natural association between R Projects and GitHub repositories is one-to-one. Using the Git tab in RStudio makes it super easy to commit/push/pull with the click of a button.
Consider using GitHub even if you aren’t working with others or sharing the project publicly. It’s like a ready-made backup of your code if your computer crashes or you’re having other issues. If you do share the project publicly, think of GitHub as a coding résumé.
Lockfiles to Lock In Dependencies
Using renv
helps you keep track of what versions of R
packages you are using. This helps reproducibility even more. You don’t
have to tell people which packages they need to install to run your
code. When you install renv
in a project, it will
automatically activate itself each time you open the project. Remember
to run renv::snapshot()
before you commit new code.
Conclusion
Project structure and management is the epitome of the Zen of Programming I mentioned in the intro article. Stay disciplined when you setup a project, write your code in R scripts, and keep the file structure of your project clean. Failure to do so can cause headaches, and headaches take away from our main goal of writing code and producing research.
Happy Coding!