The Pattern
Like any intelligent person, you probably want to keep backups of
your code handy in case your computer catches on fire and burns to a
crisp. This is known broadly as “version control.” By far the most
widely used form of version control in the coding world is a program
called git, and GitHub is a website where you can store your
git
projects. This article will walk you through how to get
starting using git
and GitHub for your own coding
projects.
NOTE: There is a vast array of information available on
git
online since it is so popular. Oftentimes new
programmers find the information intimidating! Don’t let it scare you
away from using git
for your own projects. git
is a program capable of supporting massive, complex projects, so it is
naturally complex itself. You only need to use a small subset of
functionality yourself, and I will try to highlight a workflow that
should get you up and running relatively quickly.
Some Definitions
First, let’s define some terms relating to git
and
GitHub. git
is a program that runs on the command
line. GitHub is a website that stores git
projects.
The command line is the OG computer. Before computers had mice and screens with graphics and things that you could click on, you interacted with computers via text. That is, you type a command into the computer, and it prints some text on the screen giving you the result of the command. The command line still exists even though we have mice and fancy screens with graphics and icons, because it can be helpful for programming. RStudio has a command line built into it. It is in the bottom left corner, called the “Terminal”:
For the rest of the tutorial, whenever I say to enter something in the command line, I mean to type it here. I should also note that the terms “command line,” “terminal,” and “shell” are all generally used interchangeably, and I will use them interchangeably in this tutorial.
While I think that it is generally a good thing to know how to use the command line, there is definitely a learning curve that we can mostly avoid by using the point-and-click tools in RStudio. This article is one exception where we have to get our hands dirty with the command line, but I will try to minimize the command line usage as much as possible in general.
What’s interesting about commands in the terminal is that they are just files! There is a special type of file in operating systems called an executable file. The first word of the command you type in the terminal is the name of the file, and the rest of the words you type are given to the file as separate values called command arguments or simply “args.”
As an example, one command that we will type in a second is
git push --set-upstream
. This means to run the
git
file, and run it with two args: push
and
--set-upstream
. The command line finds the git
file by looking somewhere called the PATH. Try typing this in your
shell:
echo $PATH
#> /opt/hostedtoolcache/Python/3.8.18/x64/bin:/opt/hostedtoolcache/Python/3.8.18/x64:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/snap/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
This is all the folders your terminal will search for the executable
file, in order, separated by colons. If you would like to see where the
terminal found the git
executable file, you can use the
which
command:
which git
#> /usr/bin/git
Since R scripts are files, you might naturally wonder if you can run R scripts like this in the terminal. The answer is yes: see this tutorial for some more explanation. I don’t generally recommend doing this, since it can be confusing for others to understand what you’re doing.
The git
program does the actual business of creating
backups of your code. You can make backups of any folder you choose on
your computer. A git
repository is one
such folder that you are making backups of. For our purposes, that
folder you are making backups of is one R project. So remember:
one R project = one git
repository.
GitHub is a website where you can store git
repositories. It is also called a git
remote… because it saves your git
repository in a
remote place.
An individual “snapshot” or backup of your code with git
is called a commit. A commit represents a small change
to a few files in your code. You add messages to your commits to label
them so you can remember what you changed in each commit.
git
stores a diff of the files in a
commit, i.e. it only stores what changed in the files. That makes
commits very lightweight in terms of storage space, so you are
encouraged to make many commits to make it easier to keep track of your
work. Commits represent a distinct unit of work that you have completed,
for example a new feature, or a new dataset with an R script to clean
it, or a set of edits to your paper/presentation.
It’s important to remember that commits are only your local version
of git
. You eventually have to push to
GitHub to copy your local backup to GitHub’s servers. If you are working
with other people, they will also push their work to the same GitHub
repository. So you will have to pull their work from
GitHub to your local computer to sync their changes, and you may
possibly have to merge their changes with yours if
there are any conflicts.
In conclusion, your git/GitHub workflow looks like this: You make commits to take snapshots of your code, then you push to GitHub. If you’re working with others, you periodically pull their changes from GitHub and merge them with your own.
Merges can be a little bit complicated so I won’t cover them here.
When it comes time to learn about them, Google is your friend. There are
also some other things in git/GitHub called branches and pull requests
that you can look up if you are interested in learning more, but it’s
beyond the scope of this article. I digress: Let’s get setup with
git
and GitHub and start using them!
Get Everything Configured
Configuration is always the hard part. After we get everything setup it should be smooth sailing. The tools you need for this job are:
- The
git
command line program. This comes pre-installed on Mac and Linux computers, but not Windows. Windows users can downloadgit
from thegit
for Windows website here. - A GitHub account. Don’t pick a stupid username like “nateybear” because you will be stuck with it forever 🙃 If you are a student, you get GitHub Pro for free. GitHub Pro allows you to create private repositories, which is great for storing your homework or other confidential/non-public things.
- The
gh
command line program. This is a command line program that allows you to configure your GitHub account to use with your local git, so that when you push to GitHub, GitHub knows who you are. Download the gh program here.
Easy as 1,2,…
1. Login to GitHub
Type gh auth login
into your terminal and hit enter. You
will be asked a series of questions, which you can answer with the arrow
keys and enter key. My answers to the questions are in blue:
You will then be redirected to a web browser where you can login to GitHub and authorize your computer.
2. Configure git
to use GitHub credentials
Run gh auth setup-git
. This tells git
where
to look for your login information for pushing to GitHub.
3. Done
Surely it can’t be that easy? Hopefully it is, sometimes it isn’t. If
you’re my student, feel free to email me if you’ve having issues. To
check that everything worked smoothly, run
git config --get github.user
to get the GitHub user that
git
is configured with, and make sure that your GitHub
username is returned.
Setup your project to use git/GitHub
There are two ways to start a project using git
and
GitHub. You can create the project on GitHub first and then download to
your computer, or you can create the project locally on your computer
first and then push to GitHub. I don’t have an opinion on which is
better, so I’ll describe how to do both and you can pick the method that
works for you.
GitHub First
First, go to your GitHub page and make a new repository
Pick a descriptive name for your repository. This is will be the name of your R project that is created. You can choose whether the repository is public or private, and you don’t need to mess with any of the additional options1. Go ahead and create the repository. You are taken to a screen with some information about how to clone your project. We want to copy the URL listed and use it later:
Now we create a new R project. Select “Version Control” as the type of project you want to create, and select “git” as the type of version control. Paste the URL to your GitHub project, make sure the project is created in the folder that you want (here I’m creating it in my “workspace” folder), and click “Create Project.”
Now you have a new R project! You’ll notice that RStudio created a
few new files. Let’s setup renv in our project and commit all of these
new files and push to GitHub. The first thing is to run
renv::init()
to start using renv in the project:
Now we navigate over to the git
tab in the upper right
corner of RStudio and we see a list of files. These are all of the
changes that we have not yet committed. We want to add all of the files
in one commit, so we check all of the boxes (aka “stage” all of the
files) and click the commit button.
A new window pops up that lets us review what we are committing. On the left-hand side, we can click through all of the files that are being committed and see what has changed. On the right-hand side, we add a short message that describes the nature of the commit. For your first commit, it’s fine to just write “initial commit.” Click the “Commit” button right below the commit message, and congratulations! You have just made your first commit.
Remember that commits are just local. Now we have to push to GitHub to sync what is on the website and what we have on our computer. Thankfully it’s as simple as clicking the green arrow to push. The blue arrow is to pull.
We have pushed to GitHub, so now we should be able to refresh our repository in our web browser and see the changes. You can see the most recent commit message in the screenshot below, and also the number of commits made.
For the rest of your project, you follow the commit/push cycle we
just did. Make some changes, select the changes in your git
tab and click the commit button, make your commit and then click the
push button. That’s all there is to it!
Local First
The first thing we do here is create a new project in RStudio, using
git
and renv:
We have a new R project created, with all of the initial files we will need. Let’s commit these as an “initial commit.” This is the exact same procedure for the initial commit as we did in the GitHub First flow.
Now we have a process to commit in our R project, however we don’t have a place to push our code to since the GitHub repository doesn’t exist yet. Go into GitHub and create a new repository, and name it the same thing as you named your R project. (You don’t need to do that, but it would be kind of confusing otherwise!)
Once your respository is created, copy the URL for your project as we did in the GitHub first workflow. Now here’s where we need to use the command line a little bit.
We need to use the command line because the git
commands
that we will run are slightly more advanced and are not something that
we can do using the “point and click” interface in RStudio. The first
command we need to run is
git remote add origin <GitHub URL>
. This tells
git
that we are adding a new remote repository for it to
push to. We name the remote origin
as a convention.
The second command that you run is
git push --set-upstream
. Because RStudio doesn’t quite have
enough information about where to push your code to, the first push
needs to have some extra information (--set-upstream
). This
is a technical detail about branches, etc. The two commands should look
like this:
From here, you can resume the normal cycle of pushing code by
clicking the green arrow in the RStudio git
panel.
Comparisons
The differences between the local-first and GitHub-first approaches
to creating projects are fairly small, in the scheme of things. When you
create your project in GitHub first, you have to manually setup
renv
. When you create your project locally first, you have
to manually configure the GitHub repository you’re pushing to. Once we
get past these minor differences, the process is the exact same:
- Choose which files to commit by clicking the checkbox next to the
file names in RStudio’s
git
panel - Click the Commit button and add a message describing what you’ve changed in this commit
- Click the green arrow to push your commit to GitHub
Conclusions
This article may seem a little complicated, but remember two things:
- There is a learning curve to using
git
. You need to at least be familiar with the terminology before you can understand what all the buttons in RStudio mean. At the same time, the learning curve isn’t as big as it might seem. If you’re comfortable with the language around committing, pushing, and pulling, then you are good to go. - This article is about configuration. Configuration is the hard part.
When you get setup with
git
and are using it day-to-day, it’s practically mindless. Day-to-day use is as simple as the three bullet points I listed above.
You can quickly venture into complex territory with git
,
but you can also stay at the surface level as we have done in this
article and live a happy, stress-free life as a programmer. You may ask,
“Why use git
if it is so complicated?” and I would answer
that inertia is a massive force in computer programming. Even if a
superior alternative to git
existed, it would take many
years and a lot of concerted effort on the part of many people to make
it a standard. For now, everyone uses git
so we will too.
Just stay out of the weeds and we can maintain our zenlike approach to
programming. If you get in the weeds, find someone who knows more about
git
than you do to help you.
Happy Coding!