Roads Less Taken

23 Sep 09

Git and Github, where the cool kids hang out!

After releasing Divan on github I of course had to learn basic git as well as some github/git workflow. I am abnormally interested in DSCMs and have used Mercurial, a bit of Bazaar and the lovely Darcs, so the time had finally come to learn git.

My perception is that git has really pulled ahead over the last year, quickly adopting good features from the competition and turning into the "cool tool" to use. Github is also a great boost to adoption. Mercurial and Bazaar are still fighting for second place with Darcs probably set for fourth. Personally, Mercurial didn’t click for me when I tried it, and it is hard to say what made me uneasy about it. Bazaar felt nicer but I have only dabbled with it. I did use Darcs a bit and it still has a special place in my heart for its simplicity and amazing super hero powers.

In this article I try to outline some daily usage in maintaining Divan on github. It is nothing special, but if you are just diving into git/github it might be worth reading through.

Getting set up

It is actually quite easy. I just signed up on Github, followed the guides, like this one to get my proper personal clone of my repository at Github and to get it all working using SSH for pushing. There is no point in repeating all that.

Churning out code

If we disregard the rest of the world for a second, making commits and pushing them to github is what you want to do first. I typically use git from command line, on Windows I use "Git bash here" from the explorer, and on Ubuntu I just use the regular git. Sure, there are lots of UIs around, but the need is not that pressing for me.

Git status and commit

First thing - you are going to type "git status" every other second, at least I do :). Some kind of compulsion…

        gokr@yoda:~/divan/github/gokr/Divan$ git status
        # On branch master
        nothing to commit (working directory clean)
        gokr@yoda:~/divan/github/gokr/Divan$

This shows current branch but more importantly it shows a list of dirty/new files and a list of staged files. Staged files are those that I have "added to the index" (also called "cache" or "staging area"), which means that I have "staged them for commit". "The Index" is a relatively unique feature of git, but hey, it is not rocket science. You just prepare your commit by adding stuff into a "staged area" before actually committing it, no big deal. It is just unfortunate that there are three names for it (cache, index, staging area).

When you do have dirty stuff the status command also mentions useful commands to use. If you just did some modifications like for example fixing class comments in two files, it might look like this:

        gokr@yoda:~/divan/github/gokr/Divan$ git status
        # On branch master
        # Changed but not updated:
        #   (use "git add <file>..." to update what will be committed)
        #
        #      modified:   src/CouchTest.cs
        #      modified:   src/Lucene/CouchLuceneTest.cs
        #
        no changes added to commit (use "git add" and/or "git commit -a")
        gokr@yoda:~/divan/github/gokr/Divan$

Let’s add one to the staging area and look again:

        gokr@yoda:~/divan/github/gokr/Divan$ git add src/CouchTest.cs
        gokr@yoda:~/divan/github/gokr/Divan$ git status
        # On branch master
        # Changes to be committed:
        #   (use "git reset HEAD <file>..." to unstage)
        #
        #      modified:   src/CouchTest.cs
        #
        # Changed but not updated:
        #   (use "git add <file>..." to update what will be committed)
        #
        #      modified:   src/Lucene/CouchLuceneTest.cs
        #
        gokr@yoda:~/divan/github/gokr/Divan$

Now since I am slightly senile I need to remind myself what I am going to commit, so let’s diff:

        gokr@yoda:~/divan/github/gokr/Divan$ git diff
        diff --git a/src/Lucene/CouchLuceneTest.cs b/src/Lucene/CouchLuceneTest.cs
        index 791c2e8..1fd6755 100644
        --- a/src/Lucene/CouchLuceneTest.cs
        +++ b/src/Lucene/CouchLuceneTest.cs
        @@ -9,6 +9,8 @@ namespace Divan.Lucene
             /// <summary>
             /// Unit tests for the Lucene part in Divan. Operates in a separate CouchDB databa
             /// Requires a working Couchdb-Lucene installation according to Couchdb-Lucene's d
        +    /// Run from command line using something like:
        +    ///        nunit-console2 --labels -run=Divan.Lucene src/bin/Debug/Divan.dll
             /// </summary>
             [TestFixture]
             public class CouchLuceneTest
        gokr@yoda:~/divan/github/gokr/Divan$

Ehum, ok… so "git diff" only shows the unstaged changes, not the staged ones. But we can see those if we want to using "git diff --cached". This is a good example of the "terminology confusion" appearing here and there in git country, why is it not called --staged or --index? (Recent git versions do accept "--staged" as a synonym.) Well, whatever:

        gokr@yoda:~/divan/github/gokr/Divan$ git diff --cached
        diff --git a/src/CouchTest.cs b/src/CouchTest.cs
        index 3454d37..912ec8c 100644
        --- a/src/CouchTest.cs
        +++ b/src/CouchTest.cs
        @@ -10,6 +10,8 @@ namespace Divan
         {
             /// <summary>
             /// Unit tests for Divan. Operates in a separate CouchDB database called divan_uni
        +    /// Run from command line using something like:
        +    ///        nunit-console2 --labels -run=Divan.CouchTest src/bin/Debug/Divan.dll
             /// </summary>
             [TestFixture]
             public class CouchTest
        gokr@yoda:~/divan/github/gokr/Divan$

…and we could see all changes by doing "git diff HEAD". Just type "git help diff" to get a mouthful of options. :)
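To make the three flavours of diff concrete, here is a minimal sketch that can be run in a throwaway directory. The repository and the file names a.txt and b.txt are made up for illustration and have nothing to do with Divan.

```shell
#!/bin/sh
set -eu

# Throwaway repository; a.txt and b.txt are hypothetical file names.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'one\n' > a.txt
printf 'one\n' > b.txt
git add a.txt b.txt
git commit -q -m "Initial commit"

# Modify both files, but stage only a.txt.
printf 'two\n' >> a.txt
printf 'two\n' >> b.txt
git add a.txt

unstaged=$(git diff --name-only)                       # working tree vs index
staged=$(git diff --cached --name-only)                # index vs HEAD
both=$(git diff HEAD --name-only | paste -sd' ' -)     # working tree vs HEAD

echo "unstaged: $unstaged"
echo "staged:   $staged"
echo "all:      $both"
```

The "--name-only" option just keeps the output short; drop it to see the full patches as in the transcripts above.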

Doing a commit at this point would only commit the change in CouchTest.cs, so I add the second file (just typing a partial path is fine), run status again for extreme educational purposes and finally commit:

        gokr@yoda:~/divan/github/gokr/Divan$ git add src/Lucene/
        gokr@yoda:~/divan/github/gokr/Divan$ git status
        # On branch master
        # Changes to be committed:
        #   (use "git reset HEAD <file>..." to unstage)
        #
        #      modified:   src/CouchTest.cs
        #      modified:   src/Lucene/CouchLuceneTest.cs
        #
        gokr@yoda:~/divan/github/gokr/Divan$ git commit -m "Class comment changes."
        Created commit b2242f2: Class comment changes.
         2 files changed, 4 insertions(+), 0 deletions(-)
        gokr@yoda:~/divan/github/gokr/Divan$

We could have done all the above (adding both files and committing) in one simple line:

        git commit -a -m "A commit message"

…but if you are as confused as I am you have typically done 3-4 different things that you don’t remember, so you want to investigate and possibly split the work up into several logically separate commits. You can in fact also do chunkwise staging (only selected parts of files), but I am not going into that here.
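As a rough sketch of that splitting, the following stages and commits two unrelated edits one at a time instead of using "git commit -a". The files code.txt and docs.txt and the commit messages are invented for the example. The chunkwise variant is "git add -p", which is interactive and therefore left out here.

```shell
#!/bin/sh
set -eu

# Throwaway repository; code.txt and docs.txt are hypothetical file names.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'code\n' > code.txt
printf 'docs\n' > docs.txt
git add code.txt docs.txt
git commit -q -m "Initial commit"

# Two unrelated edits made in one sitting.
printf 'bug fix\n' >> code.txt
printf 'typo fix\n' >> docs.txt

# Stage and commit them separately, one logical change per commit.
git add code.txt
git commit -q -m "Fix bug in code"
git add docs.txt
git commit -q -m "Fix typo in docs"

count=$(git rev-list --count HEAD)
leftover=$(git status --porcelain)
git --no-pager log --oneline
echo "commits: $count"
```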

Git push

Since we are going through the vanilla track, let’s push too:

        gokr@yoda:~/divan/github/gokr/Divan$ git push
        Counting objects: 11, done.
        Compressing objects: 100% (6/6), done.
        Writing objects: 100% (6/6), 721 bytes, done.
        Total 6 (delta 5), reused 0 (delta 0)
        To git@github.com:gokr/Divan.git
           e28819b..b2242f2  master -> master
        gokr@yoda:~/divan/github/gokr/Divan$

Yaddayadda, but that’s it.

Someone else forked your repo!

Great! In the github/git world forks are really good news, the more the merrier! Even better when they actually start doing commits, but a fork is a first step. It might be worth waiting for some commits on that fork, but let’s pretend we know they will come - thus we want to prepare to receive all that code goodness.

I have opted to use so-called tracking branches for this. This means that I create a local branch that is set to "track" a remote branch (typically the "master" branch in the foreign fork). Let’s say Henrik actually is going to deliver some code to Divan: we first add his repository as a "remote" called "henrik". We also use "-f", which fetches from the new remote right away and thereby creates a remote branch pointing at the "master" branch in "henrik":

        gokr@yoda:~/divan/github/gokr/Divan$ git remote add -f henrik git://github.com/whenrik/Divan.git
        Updating henrik
        From git://github.com/whenrik/Divan
         * [new branch]      master     -> henrik/master
        gokr@yoda:~/divan/github/gokr/Divan$

So now we have an extra known repository that we named "henrik" and we have a remote branch called "henrik/master", all remote branches use that naming convention: <remote-name> + "/" + <branch-name>. If we had skipped "-f" we would have had to follow up with "git fetch henrik" to get that remote branch.
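The two-step variant without "-f" can be tried out locally. In this sketch a local directory called henrik-fork stands in for Henrik's fork on Github; the paths and the README file are assumptions made up for the example.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Stand-in for Henrik's fork: a local repository with one commit on "master".
git init -q "$work/henrik-fork"
cd "$work/henrik-fork"
git symbolic-ref HEAD refs/heads/master
git config user.name "Henrik"
git config user.email "henrik@example.invalid"
printf 'hello\n' > README
git add README
git commit -q -m "Henrik's first commit"

# Stand-in for my own repository.
git init -q "$work/mine"
cd "$work/mine"
git remote add henrik "$work/henrik-fork"   # no -f: nothing is fetched yet

if git show-ref --verify --quiet refs/remotes/henrik/master; then
    before=present
else
    before=missing
fi

git fetch -q henrik                         # now henrik/master appears

if git show-ref --verify --quiet refs/remotes/henrik/master; then
    after=present
else
    after=missing
fi

echo "henrik/master before fetch: $before"
echo "henrik/master after fetch:  $after"
```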

We can see all remotes we now have (using "-v" to see their URLs):

        gokr@yoda:~/divan/github/gokr/Divan$ git remote -v
        henrik git://github.com/whenrik/Divan.git
        kolosy git://github.com/kolosy/Divan.git
        origin git@github.com:gokr/Divan.git
        upstream       git://github.com/foretagsplatsen/Divan.git
        gokr@yoda:~/divan/github/gokr/Divan$

…and all branches (both local and remote, use "git branch -r" for only remotes or "git branch" for only locals):

        gokr@yoda:~/divan/github/gokr/Divan$ git branch -a
          kolosy
        * master
          upstream
          henrik/master
          kolosy/master
          origin/HEAD
          origin/master
          upstream/master
        gokr@yoda:~/divan/github/gokr/Divan$

Here we see the remote branch "henrik/master" just created (and more). The top three entries are local branches and easily recognizable as such since they do not have a "/" in them.

With git one can merge directly from remote branches (I think), but I guess most of us would like the ability to pull down, take a look and then merge - which makes it necessary for us to first create a local branch that is a mirror of the remote branch. In git terminology this is a "tracking branch", since it is set up to easily track a remote branch, meaning that it knows from where to pull etc, nothing magic.

For all forks that I want to collaborate with I am using "tracking branches" so let’s create one for Henrik. We use the checkout command with "-b" for creating a new branch called "henrik" from remote branch "henrik/master" and "-t" for tracking:

        gokr@yoda:~/divan/github/gokr/Divan$ git checkout -t -b henrik henrik/master
        Branch henrik set up to track remote branch refs/remotes/henrik/master.
        Switched to a new branch "henrik"
        gokr@yoda:~/divan/github/gokr/Divan$

In fact, the "-t" is not needed when we branch from a remote branch, it is the default. Note that we could have done the above in two steps as "git branch henrik henrik/master" followed by "git checkout henrik".
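Here is a small sketch of the two-step form, again with a local directory standing in for Henrik's fork (all paths and file names are hypothetical). It then reads back the branch configuration that git wrote, which is really all that "tracking" amounts to.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Stand-in for my repository, with one commit on "master".
git init -q "$work/mine"
cd "$work/mine"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
printf 'base\n' > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Stand-in for Henrik's fork: a clone of my repository.
git clone -q "$work/mine" "$work/henrik-fork"

# Add the remote, then create the tracking branch in two steps.
git remote add -f henrik "$work/henrik-fork" >/dev/null 2>&1
git branch henrik henrik/master
git checkout -q henrik

# Tracking is stored as two config entries for the local branch.
remote=$(git config branch.henrik.remote)
merge=$(git config branch.henrik.merge)
current=$(git symbolic-ref --short HEAD)

echo "current branch: $current"
echo "tracks:         $remote, $merge"
```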

Let’s list all our branches once more:

        gokr@yoda:~/divan/github/gokr/Divan$ git branch -a
        * henrik
          kolosy
          master
          upstream
          henrik/master
          kolosy/master
          origin/HEAD
          origin/master
          upstream/master
        gokr@yoda:~/divan/github/gokr/Divan$

So now we have a local branch which is also current, since the checkout switched to it. When we are there we can do "git pull" to get all new commits from the remote branch.

Git merge

If I want to merge work that Henrik has made I first switch to henrik using "git checkout henrik" and do a "git pull" there. Next I switch back to my own branch, say "git checkout master", and there I do "git merge henrik".
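The whole round trip can be rehearsed in a sandbox. This is only a sketch under made-up assumptions: two local directories play the roles of my repository and Henrik's fork, and feature.txt is an invented file.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Stand-in for my repository, with one commit on "master".
git init -q "$work/mine"
cd "$work/mine"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
printf 'base\n' > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Stand-in for Henrik's fork: a clone of my repository.
git clone -q "$work/mine" "$work/henrik-fork"

# Remote plus tracking branch, as set up earlier in the text.
git remote add -f henrik "$work/henrik-fork" >/dev/null 2>&1
git checkout -q -t -b henrik henrik/master
git checkout -q master

# Henrik commits something new in his fork.
cd "$work/henrik-fork"
git config user.name "Henrik"
git config user.email "henrik@example.invalid"
printf 'feature\n' > feature.txt
git add feature.txt
git commit -q -m "Henrik's feature"

# The workflow: pull on the tracking branch, then merge it into master.
cd "$work/mine"
git checkout -q henrik
git pull -q
git checkout -q master
git merge -q henrik

mine_head=$(git rev-parse master)
henrik_head=$(git rev-parse refs/remotes/henrik/master)
echo "master:        $mine_head"
echo "henrik/master: $henrik_head"
```

Since master had no commits of its own in this sandbox, the merge is a plain fast-forward and both names end up at the same commit.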

Git will automatically commit if a merge is successful. If there are conflicts it will stop in the middle and let me take care of the files, which will have regular conflict markers in them. In that case I have to add the fixed files manually to the staging area (which typically already has a partial merge in it) and then commit.
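The conflict case looks roughly like this. In this sketch "henrik" is just a plain local branch standing in for the tracking branch, and notes.txt is a made-up file that both sides edit.

```shell
#!/bin/sh
set -eu

# Throwaway repository; notes.txt is a hypothetical file name.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'original\n' > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# Henrik's side of the story.
git checkout -q -b henrik
printf 'henrik version\n' > notes.txt
git commit -q -a -m "Henrik's change"

# My side, touching the same line.
git checkout -q master
printf 'my version\n' > notes.txt
git commit -q -a -m "My change"

# Both sides changed the same line, so the merge stops with a conflict.
git merge henrik >/dev/null 2>&1 || echo "merge stopped on a conflict"
conflicted=$(git diff --name-only --diff-filter=U)

# Resolve by hand (here: simply write the final content), stage, commit.
printf 'merged version\n' > notes.txt
git add notes.txt
git commit -q -m "Merge branch 'henrik'"

# A merge commit has two parents: my commit and Henrik's.
words=$(git rev-list --parents -n 1 HEAD | wc -w)
parents=$((words - 1))
echo "conflicted file: $conflicted"
echo "parents of merge commit: $parents"
```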

And then the natural thing to do, after verifying that unit tests are green :), is of course to do git push.

Final word

I would have liked to show more on merging etc, but my time is limited so better to publish and move on. :)

Over and out, Goran
