1 Git & GitHub Overview
1.1 What is Git?
If you’re in this workshop, or have stumbled across this book, there’s a good chance you already know what Git is, or have at least heard of it. However, if you don’t and you’ve been told by someone you should start using Git, but have no idea what that even means, then hopefully this subsection will help.
Git is a version-control system (VSC). Think of it like a better version of Microsoft Word tracked changes and Google version history that can track everything from code, to text, to pdf images. Much like how tracked changes is useful for both single and multi-user documents, Git can help us remember what we’ve done, when, and what version of the document(s) previously existed, as well as denoting which user made the changes. Where tracked changes can get unwieldy after multiple iterations, Git makes it easy to understand the whole file history without needing to use document_v3 filenames - just keep changing the same original file for as long as you want! In addition to just being able to understand a file history, when working on a project, even if you’re the only one coding, it’s important to be able to go back to previous versions if you make a mistake. This is possible with Git! In fact, this book was created using Git and GitHub, so you can explore how it was put together and iterated by going to the GitHub page. Git isn’t the only VCS available, but it’s the most prevalent, has some advantages over the alternatives in the method of storing its data, and has a good support community, so is what will be the focus in this book.
1.2 What is GitHub?
Hopefully I’ve convinced you that Git is a useful addition to your research, so now let’s turn our attention to GitHub. GitHub is a website and server system makes it easy to collaborate and share your code with the scientific community. The key feature of Git is that it’s a distributed VCS. What this means is that users can make changes to a file on their own computer and then push the updated version to GitHub so that all collaborators can then use this version of the file. In Git terminology, GitHub is your remote. Later on we’ll go through the mechanics of this, including what happens when two users make changes to the same lines of code and try to push to GitHub, but for the moment we can just appreciate that GitHub allows us to both work offline and collaborate. There are many different remote services that can be used to host our remote code, such as Bitbucket or GitLab, but I’d strongly recommend you use GitHub over the alternatives for a number of reasons. Principally, GitHub has the largest user base, so more people will likely see your work. With GitHub, if you ever want to make your code open-source, you immediately have access to the largest community of programmers who can help you improve your code, as well as putting it to good use. And isn’t that why we do research? As an academic or student, you can get a free PRO account, meaning unlimited collaborators on private repositories and a bunch of other useful things that we’ll touch on more later. GitHub is also owned and backed by Microsoft. While this is a negative for some, it does result in tighter integration with Azure cloud computing, very active development of the platform with frequent improvements to the user experience (e.g. GitHub Actions that allow for easy continuous integration), and fewer data storage concerns with regards to university policies. If you really want to avoid all Microsoft based products, I’d recommend you look at GitLab.