I would imagine that most people here will have found themselves in this situation before:
I certainly have! And over the years I’ve tried different ways of organising revisions. For documents it works quite well to start your document title with a date, for example 2020-04-05_Project_analysis.docx
so that a new version would have a more recent data, such as 2020-05-08_Project_analysis.docx
. That way your revisions are organised in chronological order.
If you are working with multiple people on the same document you can add initials to the end, e.g. 2020-04-05_Project_analysis_JD.docx
, so you know who has reviewed/commented on the document. You can then merge documents within MS Word, if you’re using that.
More recently, I’ve become a fan of Google Docs, which provides much of the functionality of MS Office, but allows you to work remotely on a document with other people. Changes are registered in its History and you can go back through time.
All this works reasonably well for documents, but this is less suitable for code, where changes are made often and have clear consequences on the workings of the code further down. This gets us to the topic of version control systems, of which Git is one.
Git allows you to keep track of changes to your code and share those changes with others. It works particularly well when used in conjunction with GitHub, a website that enables sharing your code with others.
In general, there are two different types of version control systems: centralised and distributed. Both types of version control systems their changes in a database, called a repository. All work is done on a personal copy, called a working copy. We’ll briefly go through the differences between how these two systems function in this context.
The centralised version control system has just one centralised repository. Each user has its own working copy to which changes are made. These changes are then communicated with the central server that holds the (centralised) repository. You commit your changes to the server, other users update and can directly see your changes.
Image by Michael Ernst1.
The distributed version control system works a little different, where each user not only has their own working copy, but also their own repository. You commit your changes to your repository but other people cannot see those changes yet. For that to happen you need to push your changes to the central repository. You do not get other people’s changes, unless you specifically pull those changes into your own repository.
Image by Michael Ernst2.
Here we will be using Git, which is a distributed version control system. Version control with Git works differently to all other version control systems in the way that it views the data. Whereas other version control systems store information as changes to a base version of a file (thus tracking how a file changes over time), Git stores data as a snapshot of the project over time. If a file has changed then Git stores the file again and if a file is unchanged then it links to the original, unchanged file.
As such, Git can be thought of as a mini file system3.
Although Git can work entirely local, it is usually used in conjunction with a remote storage - in our case GitHub.
You have a working directory on your computer that contains all the files that you are working on. There is also a local repository inside your working directory that contains an object database with all the versions of the files, changes, commits etc. associated with your files. Lastly, there usually is a remote repository that contains a copy of your local files and local repository. In our case, this remote repository is on GitHub.
GitHub needs to know which files to track, which are added to the index. These files are staged for a commit, which creates a snapshot of your files in time. The commit is always accompanied by a message that explains what the changes are that are being committed.
When you are happy with the commit you’ve made on your (local) computer, then you can push these changes to your remote repository on GitHub and it will be updated.
The image was adapted from the RebelLabs Git Cheat Sheet.
If you are working on something by yourself then you’re able to judge whether or not you want to push any changes to your remote repository. Things get a bit more complicated when you are working in a team, because multiple people could work on the same file and the changes you push could affect others. The way GitHub deals with this is through the use of branches.
Simply put, a branch is a copy of your repository where you can safely make changes/experiment without worrying how these changes might affect others.
There can be many different branches in your repository, but only one can be deployed: the master
branch. So if you are making changes and push them to the master
branch then they are immediately implemented.
We will learn how to create branches later.
Below is a schematic overview of the process you go through on GitHub.
Image adapted from the GitHub guide.
Start a new repository by navigation to the top right corner on GitHub:
This brings up a screen with the following options:
Initialize this repository with a README
to create a README file (recommended)
To create a new file, click on the Create new file
button:
You then have the possibility to create a new file, in this case we are calling it new_file.txt
. Note the use of file extensions.
To add the file you need to add a commit message. A commit message should be a brief description of what you’ve done. You can add more information on how you have done things in the commit description.
Here we are adding it directly to the master
branch. We will learn how to create branches next.
You usually create a new branch if you want to add or change something in the master
branch, but you’re not entirely sure if it will work or affect others working on the same files. Creating a branch makes a copy of your repository that you can freely edit, without having to worry about these things. We will call this a feature
branch.
To create a feature branch, click on the Branch: master
button on the main repository page. This will display a list of branches (if there are any). To add a new branch simply type in the name you want and press Enter
.
This new_branch
will be the same as the master
branch (because it is an exact copy). To generate some changes, we add a new file called file_in_new_branch.txt
. Once committed, we are taken back to the main repository page (note that you are now in Branch: new_branch
if you have created a branch called new_branch
). Because you have made a copy of your repository and you have made some changes, GitHub now gives you the opportunity to compare your new_branch
with the master
branch.
Comparing the branches enables you to generate a pull request, which enables you to let people know that you have made changes. This means that people can review the changes and implement them. We will do that next.
To start a pull request, press the Compare & pull request
button. This takes you to a screen where you can add a message to whoever is maintaining the master
branch, explaining what you have done in your new_branch
and requesting to implement the changes.
After you press the Create pull request
button you have the opportunity to merge the two branches. At the bottom of the screen you can write a message to the person who generated the pull request.
After you have merged the branches your feature branch becomes obsolete, so you can decide to delete it. Unless you want to make more changes, but it would be best to start a new branch for that again.
You are taken back to the main repository page once the pull request is merged with the master
branch. You can see the changes have been implemented, because you now have a file_in_new_branch.txt
in your master
branch.
Challenge 1 - New repository
- Create a new repository on your GitHub account
- Add a new file to the repository
Remember the following:
- Use a meaningful name for your repository
- Initialise the repository with a README file
- Add an extension to the file name when creating a new file
Challenge 2 - Branches
- Create a new branch
- Make some changes to the branch by editting or adding a file
- Commit the changes to the branch
Challenge 3 - Pull requests
- Open a pull request
- Review the changes
- Make more changes and push them
- Review the new changes and commit them
- Merge the branch with the
master
branch
The variety of phrases used in both types of version control systems can be a little overwhelming at times. Below is a non-exhaustive list of phrases you might come across. The glossary is an adapted version from here.
master
branch). These commits are normally first requested via a pull request.origin
as a system alias for pushing and fetching data to and from the primary branch.