5 How Git Works

Before we get too deep into how to use Git, it’s worth building a better understanding of how it works. At a basic level, we know that Git is a version control system, and we can think of it like Track Changes for our code. But it’s a little more complicated than that, so we’ll break it down into its component parts.

5.1 Repositories

I’ve touched on the idea of a repository, but what exactly is a repository? At its simplest, a repository is a folder that houses everything related to a project, also known as a project directory. You don’t need Git to have a repository, per se, but it’s a good idea to use Git to manage one for the reasons we’ve already discussed. When Git manages a repository, it keeps a record of every file you ask it to track and every change made to those files. You can also tell Git to ignore certain files by listing them (or patterns that match them) in a .gitignore file. This is useful for things like log files and HTML outputs, which aren’t part of the code itself but are generated by it, and which would otherwise take up a lot of space when you push to GitHub.
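To get a feel for what ignoring files does, here’s a small Python sketch that filters a list of file paths against a couple of ignore patterns. It’s an approximation, not Git’s actual matching logic: it uses Python’s fnmatch module for simple wildcards like *.log and only looks at each file’s name, which is roughly how a slash-free .gitignore pattern behaves. Real .gitignore rules also support directory patterns, anchoring, and negation with !. The file names, patterns, and function names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical ignore patterns, in the spirit of a .gitignore file.
IGNORE_PATTERNS = ("*.log", "*.html")


def is_ignored(path: str, patterns: tuple[str, ...] = IGNORE_PATTERNS) -> bool:
    """Return True if the file's name matches any ignore pattern.

    Simplification: only the last path component is matched, similar to
    how a .gitignore pattern without a slash (like *.log) matches at any level.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def tracked_files(paths: list[str]) -> list[str]:
    """Keep only the files that would not be ignored."""
    return [p for p in paths if not is_ignored(p)]


files = ["analysis.R", "manuscript.qmd", "manuscript.html", "logs/run.log"]
print(tracked_files(files))  # ['analysis.R', 'manuscript.qmd']
```

The generated HTML and the log file are filtered out, while the source files stay in.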

5.2 Commits

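So how does Git keep track of all these changes? It does so by creating commits. A commit is a snapshot of the tracked files in a repository at a given point in time, together with a short message describing the change and a link to the previous commit (if there is one). Because each commit links back to the one before it, Git can compare any two snapshots to show exactly what changed between them. You can think of committing like saving a file in a word processor, except that it’s always a manual action: Git won’t make commits for you. We’ll talk in more detail later about how to do this, but a key idea is to commit early and commit often.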
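To make the idea of snapshots a bit more concrete, here’s a toy Python model of commits. It is not how Git actually stores data, just a sketch under simplified assumptions: each commit holds the full contents of every tracked file, a message, and the ID of its parent, and its ID is a hash of all three (loosely echoing the fact that Git identifies commits by a hash). The “changes” are worked out by comparing two snapshots rather than being stored. The Commit class and diff_snapshots function are hypothetical names for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    """Toy commit: a full snapshot of tracked files plus a link to its parent."""
    files: dict[str, str]          # path -> file contents at commit time
    message: str
    parent: Optional[str] = None   # ID of the previous commit, if any

    @property
    def id(self) -> str:
        # Hash the snapshot, message, and parent together, so changing
        # any of them produces a different commit ID.
        payload = json.dumps(
            {"files": self.files, "message": self.message, "parent": self.parent},
            sort_keys=True,
        )
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def diff_snapshots(old: Commit, new: Commit) -> dict[str, list[str]]:
    """Work out what changed by comparing two full snapshots."""
    old_paths, new_paths = set(old.files), set(new.files)
    return {
        "added": sorted(new_paths - old_paths),
        "removed": sorted(old_paths - new_paths),
        "modified": sorted(
            p for p in old_paths & new_paths if old.files[p] != new.files[p]
        ),
    }


first = Commit({"analysis.R": "x <- 1", "README.md": "Draft"}, "Initial commit")
second = Commit(
    {"analysis.R": "x <- 2", "plots.R": "plot(x)", "README.md": "Draft"},
    "Update analysis and add plots",
    parent=first.id,
)
print(diff_snapshots(first, second))
# {'added': ['plots.R'], 'removed': [], 'modified': ['analysis.R']}
```

Storing whole snapshots and computing differences on demand is what lets you compare any two points in a project’s history, not just neighbouring ones.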

5.3 Branches

Branching is a complex but powerful feature of Git. It allows you to create divergent lines of development within the same repository and later merge them back together. This is useful for a number of reasons, but the main one is that it lets you work on different parts of the codebase at the same time without disturbing each other’s work in progress. We’ll talk more about branching at the end of the workshop, but as a thought experiment, imagine you’re working on a project and a new collaborator is going to help out with some of the code.

When you first set up a repository, you’ll be on the main branch, so all of your commits will be made there. Now your collaborator joins the project, and you’re each working on different parts of the code: you’re writing the code that generates the plots, and your colleague is writing the code that generates the tables. You both need to make changes to the same file (say, the manuscript Quarto file), but you don’t want to wait for your colleague to finish before you can start on your changes. Instead, you can each create a feature branch off the main branch, make your changes there, and merge the branches back together when you’re done. If you both happen to edit the same lines, Git will ask you to resolve the conflict when you merge. This is a very simplified example, but it gives you an idea of how branching works.
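One way to picture the thought experiment: a branch is just a name that points at a commit, and each new commit on a branch points back to the one before it. The Python sketch below is a toy model, not real Git objects; commits are plain strings, and the branch names plots and tables, along with the helper functions, are hypothetical. Two feature branches diverge from main and are then merged back in with commits that have two parents.

```python
# Toy model: commit ID -> list of parent commit IDs.
parents: dict[str, list[str]] = {"c1": []}
# A branch is just a name pointing at a commit.
branches: dict[str, str] = {"main": "c1"}


def create_branch(name: str, from_branch: str) -> None:
    """Start a new branch at the same commit as an existing branch."""
    branches[name] = branches[from_branch]


def commit(branch: str, commit_id: str) -> None:
    """Add a commit on top of a branch and move the branch pointer forward."""
    parents[commit_id] = [branches[branch]]
    branches[branch] = commit_id


def merge(into: str, other: str, commit_id: str) -> None:
    """Record a merge commit whose two parents are the tips of both branches."""
    parents[commit_id] = [branches[into], branches[other]]
    branches[into] = commit_id


def history(commit_id: str) -> set[str]:
    """Collect every commit reachable by following parent links."""
    seen: set[str] = set()
    stack = [commit_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


create_branch("plots", "main")
create_branch("tables", "main")
commit("plots", "c2")    # your work on the plotting code
commit("tables", "c3")   # your collaborator's work on the tables
merge("main", "plots", "c4")
merge("main", "tables", "c5")
print(sorted(history(branches["main"])))  # ['c1', 'c2', 'c3', 'c4', 'c5']
```

After both merges, main’s history contains everyone’s work. In real Git, the first merge would typically be a fast-forward (main simply moves to c2, since it hasn’t changed in the meantime); this toy always records a merge commit, which is closer to running git merge --no-ff.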

Note

I used main to refer to the default branch by name, but I didn’t name the example feature branches. That’s because you should give each branch a meaningful name rather than something generic like feature. I’ll talk more about this later.

5.4 Remotes

Up until now, everything we’ve talked about has been local to your computer. But an integral part of Git is that it is a distributed version control system. This means that you can have a full copy of your repository, including its history, on your computer and also on a remote server (or on several remotes, in some cases).

The key thing to remember is that communication with a remote is asynchronous: your local repository and the remote don’t sync automatically. You make changes in your local repository and then push them to the remote, and collaborators then pull those changes from the remote into their own local repositories. If multiple collaborators are working on the same repository at the same time, their local copies can easily get out of sync with each other, so it’s important to check for updates and pull the latest changes from the remote before you start working.
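Here’s a toy Python sketch of why pulling first matters. It is heavily simplified and assumes a single branch whose history is a plain list of commit IDs; the Repo class, the push and pull functions, and the error messages are all hypothetical. The one behaviour it borrows from real Git is that, by default, a push is refused if the remote has commits you don’t have yet.

```python
class Repo:
    """Toy repository: one branch, stored as an ordered list of commit IDs."""

    def __init__(self, commits: list[str]) -> None:
        self.commits = list(commits)  # copy, so repos don't share a list


def pull(local: Repo, remote: Repo) -> None:
    """Bring in commits the local copy is missing (fast-forward only here)."""
    if local.commits != remote.commits[: len(local.commits)]:
        raise RuntimeError("Histories have diverged; a merge would be needed.")
    local.commits = list(remote.commits)


def push(local: Repo, remote: Repo) -> None:
    """Send local commits, refusing if the remote has commits we lack."""
    if remote.commits != local.commits[: len(remote.commits)]:
        raise RuntimeError("Push rejected: pull the remote's changes first.")
    remote.commits = list(local.commits)


remote = Repo(["c1"])
you, colleague = Repo(remote.commits), Repo(remote.commits)

colleague.commits.append("c2")  # your colleague commits...
push(colleague, remote)         # ...and pushes to the remote

pull(you, remote)               # you pull before starting work
you.commits.append("c3")
push(you, remote)
print(remote.commits)           # ['c1', 'c2', 'c3']

# A copy that never pulled is now out of sync, so its push is refused.
stale = Repo(["c1", "c4"])
try:
    push(stale, remote)
except RuntimeError as err:
    print(err)                  # Push rejected: pull the remote's changes first.
```

Nothing moves between copies until someone explicitly pushes or pulls, which is what “asynchronous” means in practice.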