Cloning options for git

Here are some of the options available when cloning a git repository.

Some of you may, like me, in the process of learning git in earnest, have ended up with HUGE repos that can take some time to clone... here are some options to help you!

Make sure you're using a recent release

Some of the tips below didn't work for me, especially on some of my Windows servers. It turned out the release I'd installed via apt-cyg was very out of date (1.8.3).

So, at the time of writing I'm using:

$ git --version
git version 2.1.1
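If you'd rather check the version in a script than by eye, here's a minimal sketch. It assumes your sort supports the -V (version sort) flag, which GNU coreutils does; the version_ge helper and the 2.1.1 threshold are just illustrative, so adjust to taste.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
# Assumes a sort that understands -V (version sort), e.g. GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# "git --version" prints "git version X.Y.Z"; the third field is the number.
installed=$(git --version | awk '{print $3}')
required=2.1.1   # illustrative threshold: the release used in this article

if version_ge "$installed" "$required"; then
    echo "git $installed is recent enough"
else
    echo "git $installed is older than $required, consider upgrading"
fi
```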

Branch

I'm still finding my preferred method of using git, but I'm making extensive use of branches.

Let's say I have a branch, not master but v3.1.0-rc1. To check out and track this remote branch I can use these steps:

$ git clone <url>
$ git checkout --track -b v3.1.0-rc1 origin/v3.1.0-rc1

Or, the slightly simpler:

$ git clone <url>
$ git checkout -t origin/v3.1.0-rc1

NOTE: these commands do the same thing; the whole repo is cloned, and then we switch from master (assuming it's still the default branch) to v3.1.0-rc1 and start tracking the remote branch too.

This works fine, but the -b switch lets us do the same thing with a single command, without having to switch branches manually.

For example:

$ git clone <url> -b v3.1.0-rc1

'Are you sure that works?' Yes, you can test for yourself using:

$ git rev-parse --abbrev-ref HEAD
v3.1.0-rc1
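If you want to try this without touching a real repo, here's a minimal sketch that builds a throwaway "remote" in a temp directory and clones it with -b. The directory names and the demo identity are made up for illustration; only the branch name comes from the example above.

```shell
# Build a throwaway "remote" with two branches (paths and identity are illustrative).
work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C "$work/remote" branch v3.1.0-rc1

# Clone and land directly on the release branch.
git clone -q -b v3.1.0-rc1 "file://$work/remote" "$work/clone"

# Ask git which branch HEAD points at, and which upstream that branch tracks.
current=$(git -C "$work/clone" rev-parse --abbrev-ref HEAD)
upstream=$(git -C "$work/clone" rev-parse --abbrev-ref 'v3.1.0-rc1@{upstream}')
echo "on $current, tracking $upstream"
```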

Single Branch

We saw that the -b switch lets us skip the manual step of switching to the branch of our choice, but the clone still includes all of the changesets from all of the other branches.

If you have a repository with many branches and a server with limited space, or, like me, you just want to ensure that other branches aren't cloned, you can do this with the --single-branch switch.

NOTE: The --single-branch switch wasn't included until version 1.7.10

So using this switch, git will only clone the history leading to the tip of a single branch; be it the remote's default branch (usually master) or the branch specified via the -b switch.

For example, this will clone just master:

$ git clone --single-branch <url>

Whereas this command will clone just v3.1.0-rc1:

$ git clone --single-branch <url> -b v3.1.0-rc1
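To see the difference for yourself, here's a small sketch that clones the same throwaway repo twice, with and without --single-branch, and counts the remote-tracking branches in each clone. The paths, the demo identity and the count_remote_branches helper are all made up for the example.

```shell
# Throwaway "remote" with two branches (paths and identity are illustrative).
work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C "$work/remote" branch v3.1.0-rc1

git clone -q -b v3.1.0-rc1 "file://$work/remote" "$work/full"
git clone -q --single-branch -b v3.1.0-rc1 "file://$work/remote" "$work/single"

# Hypothetical helper: count remote-tracking branches in the clone at $1,
# ignoring the symbolic origin/HEAD entry.
count_remote_branches() {
    git -C "$1" for-each-ref --format='%(refname)' refs/remotes/origin |
        grep -v -c '/HEAD$'
}

full_count=$(count_remote_branches "$work/full")
single_count=$(count_remote_branches "$work/single")
echo "full clone: $full_count branches, single-branch clone: $single_count"
```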

Depth

Now, the --single-branch switch is great; you can be sure that all of those commits to the remote's other branches aren't included and bloating your local clone of the repo.

But... your local clone still contains every individual change to every file across the branch's whole commit history.

For a number of people, especially given the low cost of bandwidth and disk space, this isn't really an issue.

For me it was, because I've opted to include large binaries (compiled libraries) that change on a relatively regular basis, giving me a repo whose branches exceed 7 GB in size.

Enter the --depth switch! Using this switch you control how many commits of history are included when you clone. So if you're not interested in any revision history, setting a depth of 1 will ensure that only the tip commit of the branch is retrieved from the remote repository.

According to the documentation, if you use the --depth switch then --single-branch is enabled by default:

When creating a shallow clone with the --depth option, this [--single-branch] is the default

Git SCM

Thus the following 2 commands are equivalent:

$ git clone --single-branch --depth 1 <url> -b v3.1.0-rc1

$ git clone --depth 1 <url> -b v3.1.0-rc1

Whether or not you're a belt-n-braces kinda guy and include the extra switch, the result is the same; you have a local copy of the tip of a single branch.
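Here's a minimal sketch of what --depth 1 does to the history, again on a throwaway repo with made-up paths and identity. One thing to watch: git ignores --depth when you clone from a plain local path, which is why the sketch uses a file:// URL.

```shell
# Throwaway "remote" with three commits (paths and identity are illustrative).
work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
for n in 1 2 3; do
    git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "commit $n"
done

# A file:// URL is used because --depth is ignored for plain local paths.
git clone -q "file://$work/remote" "$work/full"
git clone -q --depth 1 "file://$work/remote" "$work/shallow"

# Count the commits reachable from HEAD in each clone.
full_commits=$(git -C "$work/full" rev-list --count HEAD)
shallow_commits=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full clone: $full_commits commits, shallow clone: $shallow_commits"
```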

Some Anecdotal Results

For me, a clone was taking some 5+ hours - yes, that long!

It was partly hindered by slow bandwidth and the server hosting the repository.

But the fact remains it was a large repository, some 7 GB. Approximately 70% of the time was spent by the remote compressing the tree; the remaining time was the actual data transfer.

Using the --depth, -b and (cos I've got a penchant for belts-n-braces) --single-branch switches reduced the 5 hours to a more manageable 30 mins.

Enter git gc & git repack

After running git gc & git repack on the server's repository, a git clone of the entire contents sped up massively, in particular the remote compression.

Result!
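To get a feel for what git gc does to a repository, here's a small sketch on a throwaway repo (paths, file name and identity are made up). It only shows loose objects being packed, using git count-objects; how much that helps clone times will depend on your own repo and server.

```shell
# Throwaway repo with a few revisions of one file (names are illustrative).
work=$(mktemp -d)
git init -q "$work/repo"
for n in 1 2 3; do
    echo "revision $n" > "$work/repo/data.txt"
    git -C "$work/repo" add data.txt
    git -C "$work/repo" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "revision $n"
done

# In "git count-objects -v", "count" is the number of loose (unpacked)
# objects and "in-pack" is the number of objects stored in packfiles.
loose_before=$(git -C "$work/repo" count-objects -v | awk '$1 == "count:" {print $2}')
git -C "$work/repo" gc -q
loose_after=$(git -C "$work/repo" count-objects -v | awk '$1 == "count:" {print $2}')
packed_after=$(git -C "$work/repo" count-objects -v | awk '$1 == "in-pack:" {print $2}')

echo "loose objects: $loose_before before gc, $loose_after after ($packed_after packed)"
```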

Misc Tips for Windows + Git users

Okay - not just Windows... but if you, or other developers, use different platforms you're likely to run into this issue... Permissions & chmod!

When you alter the permissions on a folder/file under Windows, say to enable a .NET process to read & write data, and then run git status, you'll see that any file permission changes are flagged as modified.

This isn't a biggie if you rarely make such changes or only 1 or 2 files are affected, but if, like me, you have to recursively change the permissions on a folder with some 10k files then it soon becomes impossible to make sensible commits.

In fact, be it via Tortoise or the CLI, I've heard of devs reacting to this so badly that to this day they are scared & have been put off git wholesale.

They saw, and committed, changesets to their projects containing every file, on every commit. Yes, every file, every commit! Of course, this is down to a lack of (self-)education and zero effort to read up, learn & understand.

But if you're charged with managing the git repo, and have Windows users - make damn sure you know this command off the cuff:

$ git config core.fileMode false

Running that, even after the permission changes have already been made, will ensure that any files with just permission changes are not marked as modified.
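Here's a minimal sketch of the effect on a throwaway repo (paths, file name and identity are made up). It assumes a filesystem that honours the executable bit, such as a typical Linux box; where git has already set core.fileMode to false at init time, the "before" status will simply come back empty.

```shell
# Throwaway repo with one committed file (names are illustrative).
work=$(mktemp -d)
git init -q "$work/repo"
echo "hello" > "$work/repo/file.txt"
git -C "$work/repo" add file.txt
git -C "$work/repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "add file"

# Change only the permission bits, not the content.
chmod +x "$work/repo/file.txt"
before=$(git -C "$work/repo" status --porcelain)

# Tell git to ignore executable-bit differences in this repository.
git -C "$work/repo" config core.fileMode false
after=$(git -C "$work/repo" status --porcelain)

echo "status before: '$before'"
echo "status after:  '$after'"
```

Note that the setting is per repository, so it needs to be run in each clone that suffers from the problem.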

Already cloned the big repo?

If you've already got your repo cloned and need to switch branches, then the following command will do just that:

$ git checkout -t origin/v3.1.0-rc1

Which is the same as:

$ git checkout --track -b v3.1.0-rc1 origin/v3.1.0-rc1

In case you're not sure what branches are available:

$ git remote show origin
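If you want to convince yourself that the short and long forms of the checkout really are equivalent, here's a sketch that runs each in its own clone of a throwaway repo and compares the upstream that gets recorded. Paths and the demo identity are made up. The last line uses git branch -r, which lists the remote branches your clone already knows about without contacting the remote.

```shell
# Throwaway "remote" with two branches (paths and identity are illustrative).
work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C "$work/remote" branch v3.1.0-rc1

git clone -q "file://$work/remote" "$work/short"
git clone -q "file://$work/remote" "$work/long"

# Short form: the local branch name is derived from the remote branch name.
git -C "$work/short" checkout -q -t origin/v3.1.0-rc1
# Long form: the local branch name is spelled out with -b.
git -C "$work/long" checkout -q --track -b v3.1.0-rc1 origin/v3.1.0-rc1

# Both clones should record the same upstream for the new local branch.
short_upstream=$(git -C "$work/short" rev-parse --abbrev-ref 'v3.1.0-rc1@{upstream}')
long_upstream=$(git -C "$work/long" rev-parse --abbrev-ref 'v3.1.0-rc1@{upstream}')
echo "short form tracks $short_upstream, long form tracks $long_upstream"

# List the remote branches this clone already knows about (no network access).
git -C "$work/short" branch -r
```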