Roads Less Taken

06 Nov 09

Breakfast seminar on the new "super databases" and CouchDB

Earlier this week I gave a 90-minute presentation for about 30 people about the new "super databases" and CouchDB in particular. It went fine, and although it was a "high level sweep" over the field I think most attendees got what they expected. The slides are available here, translated to English, although some of them may be less valuable without the accompanying explanation.

The interest is mounting in this field, partly because developers and architects are looking for alternatives but also because there is indeed quite an explosion going on with new interesting databases popping up every week. My personal experience covers mainly TokyoTyrant and CouchDB but I intend to try out:

  • MongoDB, since it is quite close to an object database and has come further on sharding etc.
  • One of the "Dynamo clones", not sure yet which one. Dynomite is not interesting since Microsoft has put the lid on it.
  • One of the "Bigtable clones", also not yet sure which one. :)

Finally, some good and fresh info from the NoSQL community can be found at the two summaries made from the recent meetup in the US. It’s funny that I too made the "Cambrian explosion" connection in my presentation, and so apparently did one of the keynotes there. I didn’t steal it - honestly :)

/Goran

23 Sep 09

Git and Github, where the cool kids hang out!

After releasing Divan on github I of course had to learn basic git as well as some github/git workflow. Being abnormally interested in DSCMs and having used Mercurial, a bit of Bazaar and the lovely Darcs, the time has finally come to learn git.

My perception is that git has really pulled ahead over the last year, quickly adopting good features from the competition and turning into the "cool tool" to use. Github is also a great boost to adoption. Mercurial and Bazaar are still fighting for second place, with Darcs probably set for fourth. Personally it didn’t click for me when I tried Mercurial, and it is hard to say what made me uneasy about it. Bazaar felt nicer but I have only dabbled with it. I did use Darcs a bit and it still has a special place in my heart for its simplicity and amazing super hero powers.

In this article I try to outline some daily usage in maintaining Divan on github. It is nothing special, but if you are just diving into git/github it might be worth reading through.

Getting set up

It is actually quite easy. I just signed up on Github, followed the guides, like this one to get my proper personal clone of my repository at Github and to get it all working using SSH for pushing. There is no point in repeating all that.

Churning out code

If we disregard the rest of the world for a second, making commits and pushing them to github is what you want to do first. I typically use git from the command line: on Windows I use "Git bash here" from the explorer, and on Ubuntu I just use the regular git. Sure, there are lots of UIs around, but the need is not that pressing for me.

Git status and commit

First thing - you are going to type "git status" every other second, at least I do :). Some kind of compulsion…

        gokr@yoda:~/divan/github/gokr/Divan$ git status
        # On branch master
        nothing to commit (working directory clean)
        gokr@yoda:~/divan/github/gokr/Divan$

This shows the current branch, but more importantly it shows a list of dirty/new files and a list of staged files. Staged files are those that I have "added to the index" (also called "cache" or "staging area"), which means that I have "staged them for commit". "The Index" is a relatively unique feature of git, but hey, it is not rocket science. You just prepare your commit by adding stuff to a "staging area" before actually committing it, no big deal. It is just unfortunate that there are three names for it (cache, index, staging area).

When you do have dirty stuff the status command also mentions useful commands to use. If you just did some modifications like for example fixing class comments in two files, it might look like this:

        gokr@yoda:~/divan/github/gokr/Divan$ git status
        # On branch master
        # Changed but not updated:
        #   (use "git add <file>..." to update what will be committed)
        #
        #      modified:   src/CouchTest.cs
        #      modified:   src/Lucene/CouchLuceneTest.cs
        #
        no changes added to commit (use "git add" and/or "git commit -a")
        gokr@yoda:~/divan/github/gokr/Divan$

Let’s add one to the staging area and look again:

        gokr@yoda:~/divan/github/gokr/Divan$ git add src/CouchTest.cs
        gokr@yoda:~/divan/github/gokr/Divan$ git status
        # On branch master
        # Changes to be committed:
        #   (use "git reset HEAD <file>..." to unstage)
        #
        #      modified:   src/CouchTest.cs
        #
        # Changed but not updated:
        #   (use "git add <file>..." to update what will be committed)
        #
        #      modified:   src/Lucene/CouchLuceneTest.cs
        #
        gokr@yoda:~/divan/github/gokr/Divan$

Now since I am slightly senile I need to remind myself what I am going to commit, so let’s diff:

        gokr@yoda:~/divan/github/gokr/Divan$ git diff
        diff --git a/src/Lucene/CouchLuceneTest.cs b/src/Lucene/CouchLuceneTest.cs
        index 791c2e8..1fd6755 100644
        --- a/src/Lucene/CouchLuceneTest.cs
        +++ b/src/Lucene/CouchLuceneTest.cs
        @@ -9,6 +9,8 @@ namespace Divan.Lucene
             /// <summary>
             /// Unit tests for the Lucene part in Divan. Operates in a separate CouchDB databa
             /// Requires a working Couchdb-Lucene installation according to Couchdb-Lucene's d
        +    /// Run from command line using something like:
        +    ///        nunit-console2 --labels -run=Divan.Lucene src/bin/Debug/Divan.dll
             /// </summary>
             [TestFixture]
             public class CouchLuceneTest
        gokr@yoda:~/divan/github/gokr/Divan$

Ehum, ok… so "git diff" only shows the unstaged changes, not the staged ones. But we can see those if we want to using "git diff --cached". This is a good example of the "terminology confusion" appearing here and there in git country, why is it not called --staged or --index? Well, whatever:

        gokr@yoda:~/divan/github/gokr/Divan$ git diff --cached
        diff --git a/src/CouchTest.cs b/src/CouchTest.cs
        index 3454d37..912ec8c 100644
        --- a/src/CouchTest.cs
        +++ b/src/CouchTest.cs
        @@ -10,6 +10,8 @@ namespace Divan
         {
             /// <summary>
             /// Unit tests for Divan. Operates in a separate CouchDB database called divan_uni
        +    /// Run from command line using something like:
        +    ///        nunit-console2 --labels -run=Divan.CouchTest src/bin/Debug/Divan.dll
             /// </summary>
             [TestFixture]
             public class CouchTest
        gokr@yoda:~/divan/github/gokr/Divan$

…and we could see all changes by doing "git diff HEAD". Just type "git help diff" to get a mouthful of options. :)
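
To make the three views concrete, here is a small sketch that can be run in a throwaway directory. The file names a.txt and b.txt are made up for illustration and have nothing to do with Divan. As far as I know, newer git versions also accept "--staged" as a synonym for "--cached".

```shell
#!/bin/sh
set -e

# Throwaway repository; a.txt and b.txt are hypothetical file names.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "one" > a.txt
echo "one" > b.txt
git add a.txt b.txt
git commit -q -m "Initial commit"

# Modify both files, but stage only a.txt.
echo "two" >> a.txt
echo "two" >> b.txt
git add a.txt

unstaged=$(git diff --name-only)           # working tree vs. index
staged=$(git diff --cached --name-only)    # index vs. last commit
all=$(git diff --name-only HEAD)           # working tree vs. last commit

echo "unstaged: $unstaged"
echo "staged: $staged"
echo "all changes: $all"
```

Here "--name-only" just keeps the output short. Drop it to get the full diffs as in the transcripts above.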

Doing a commit at this point would only commit the change in CouchTest.cs, so I add the second file (just typing a partial path is fine), run status again for extreme educational purposes and finally commit:

        gokr@yoda:~/divan/github/gokr/Divan$ git add src/Lucene/
        gokr@yoda:~/divan/github/gokr/Divan$ git status
        # On branch master
        # Changes to be committed:
        #   (use "git reset HEAD <file>..." to unstage)
        #
        #      modified:   src/CouchTest.cs
        #      modified:   src/Lucene/CouchLuceneTest.cs
        #
        gokr@yoda:~/divan/github/gokr/Divan$ git commit -m "Class comment changes."
        Created commit b2242f2: Class comment changes.
         2 files changed, 4 insertions(+), 0 deletions(-)
        gokr@yoda:~/divan/github/gokr/Divan$

We could have done all the above (adding both files and committing) in one simple line:

        git commit -a -m "A commit message"

…but if you are as confused as I am you have typically done 3-4 different things that you don’t remember, so you want to investigate and possibly split them up into several logically distinct commits. You can in fact also do chunkwise staging (only selected parts of files), but I am not going into that here.
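
As a rough sketch of that splitting, assuming two unrelated edits in two hypothetical files (parser.txt and docs.txt), staging one file at a time gives one commit per logical change. Chunkwise staging is interactive ("git add -p") and is not shown here.

```shell
#!/bin/sh
set -e

# Throwaway repository; parser.txt and docs.txt are hypothetical file names.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "v1" > parser.txt
echo "v1" > docs.txt
git add parser.txt docs.txt
git commit -q -m "Initial commit"

# Two unrelated edits made in one sitting.
echo "bug fix" >> parser.txt
echo "typo fix" >> docs.txt

# Stage and commit them one at a time instead of using "git commit -a".
git add parser.txt
git commit -q -m "Fix parser bug"
git add docs.txt
git commit -q -m "Fix typo in docs"

# Show the resulting history, newest first.
git log --oneline
```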

Git push

Since we are going through the vanilla track, let’s push too:

        gokr@yoda:~/divan/github/gokr/Divan$ git push
        Counting objects: 11, done.
        Compressing objects: 100% (6/6), done.
        Writing objects: 100% (6/6), 721 bytes, done.
        Total 6 (delta 5), reused 0 (delta 0)
        To git@github.com:gokr/Divan.git
           e28819b..b2242f2  master -> master
        gokr@yoda:~/divan/github/gokr/Divan$

Yaddayadda, but that’s it.

Someone else forked your repo!

Great! In the github/git world forks are really good news, the more the merrier! Even better when they actually start doing commits, but a fork is a first step. It might be worth waiting for some commits on that fork, but let’s pretend we know they will come - thus we want to prepare to receive all that code goodness.

I have opted to use so called tracking branches for this. This means that I create a local branch that is set to "track" a remote branch (typically the "master" branch in the foreign fork). Let’s say Henrik actually is going to deliver some code to Divan. We first add his repository as a "remote" called "henrik". We also use "-f", which runs a fetch right after adding the remote and thereby creates a remote branch pointing at the "master" branch in "henrik":

        gokr@yoda:~/divan/github/gokr/Divan$ git remote add -f henrik git://github.com/whenrik/Divan.git
        Updating henrik
        From git://github.com/whenrik/Divan
         * [new branch]      master     -> henrik/master
        gokr@yoda:~/divan/github/gokr/Divan$

So now we have an extra known repository that we named "henrik" and we have a remote branch called "henrik/master", all remote branches use that naming convention: <remote-name> + "/" + <branch-name>. If we had skipped "-f" we would have had to follow up with "git fetch henrik" to get that remote branch.
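
The difference between adding a remote with and without "-f" can be sketched with two local repositories. Everything here is an assumption for illustration: "fork" stands in for the contributor's repository, "contributor" is a made-up remote name, and a local path is used instead of a github URL.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# "fork" is a local stand-in for a contributor's repository.
git init -q fork
cd fork
echo "hello" > README
git add README
git commit -q -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on config
cd ..

# True if the remote branch contributor/<branch> is known locally.
has_remote_branch() {
    git rev-parse -q --verify "refs/remotes/contributor/$branch" >/dev/null
}

git init -q mine
cd mine
git remote add contributor "$work/fork"   # no -f: nothing is fetched yet
if has_remote_branch; then before="present"; else before="missing"; fi

git fetch -q contributor                  # the follow-up that -f would have done
if has_remote_branch; then after="present"; else after="missing"; fi

echo "before fetch: $before"
echo "after fetch: $after"
```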

We can see all remotes we now have (using "-v" to see their URLs):

        gokr@yoda:~/divan/github/gokr/Divan$ git remote -v
        henrik git://github.com/whenrik/Divan.git
        kolosy git://github.com/kolosy/Divan.git
        origin git@github.com:gokr/Divan.git
        upstream       git://github.com/foretagsplatsen/Divan.git
        gokr@yoda:~/divan/github/gokr/Divan$

…and all branches (both local and remote, use "git branch -r" for only remotes or "git branch" for only locals):

        gokr@yoda:~/divan/github/gokr/Divan$ git branch -a
          kolosy
        * master
          upstream
          henrik/master
          kolosy/master
          origin/HEAD
          origin/master
          upstream/master
        gokr@yoda:~/divan/github/gokr/Divan$

Here we see the remote branch "henrik/master" just created (and more). The top three entries are local branches and easily recognizable as such since they do not have a "/" in them.

With git one can merge directly from remote branches (I think), but I guess most of us would like the ability to pull down, take a look and then merge - which makes it necessary for us to first create a local branch that is a mirror of the remote branch. In git terminology this is a "tracking branch", since it is set up to easily track a remote branch, meaning that it knows from where to pull etc, nothing magic.

For all forks that I want to collaborate with I am using "tracking branches" so let’s create one for Henrik. We use the checkout command with "-b" for creating a new branch called "henrik" from remote branch "henrik/master" and "-t" for tracking:

        gokr@yoda:~/divan/github/gokr/Divan$ git checkout -t -b henrik henrik/master
        Branch henrik set up to track remote branch refs/remotes/henrik/master.
        Switched to a new branch "henrik"
        gokr@yoda:~/divan/github/gokr/Divan$

In fact, the "-t" is not needed when we branch from a remote branch, it is the default. Note that we could have done the above in two steps as "git branch henrik henrik/master" followed by "git checkout henrik".
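
Here is a sketch of the two-step variant, again using local stand-in repositories. The names "base", "fork", "contributor" and the local branch name "review" are made up for this example (in the text above the local branch simply reuses the remote's name). It assumes git's default configuration, where branching from a remote branch sets up tracking.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# A shared starting point, cloned once for "me" and once for the fork.
git init -q base
cd base
echo "hello" > README
git add README
git commit -q -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on config
cd ..
git clone -q base mine
git clone -q base fork

# The contributor commits something in the fork.
cd fork
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
cd ..

cd mine
git remote add -f contributor "$work/fork" >/dev/null 2>&1

# Step 1: create the local branch from the remote branch.
git branch review "contributor/$branch" >/dev/null
# Step 2: switch to it.
git checkout -q review

# Ask git which remote branch "review" is tracking.
upstream=$(git rev-parse --abbrev-ref --symbolic-full-name "@{upstream}")
echo "review tracks: $upstream"
```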

Let’s list all our branches once more:

        gokr@yoda:~/divan/github/gokr/Divan$ git branch -a
        * henrik
          kolosy
          master
          upstream
          henrik/master
          kolosy/master
          origin/HEAD
          origin/master
          upstream/master
        gokr@yoda:~/divan/github/gokr/Divan$

So now we have a local branch which is also current, since the checkout switched to it. When we are there we can do "git pull" to get all new commits from the remote branch.

Git merge

If I want to merge work that Henrik has made I first switch to henrik using "git checkout henrik" and do a "git pull" there. Next I switch back to my own branch, say "git checkout master", and there I do "git merge henrik".

Git will automatically commit if a merge is successful. If there are conflicts it will stop in the middle and let me take care of the files, which will have regular conflict markers in them. In that case I have to add the fixed files manually to the staging area (which typically already has a partial merge in it) and then commit.

And then the natural thing to do, after verifying that unit tests are green :), is of course to do git push.
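
The conflict case can be sketched in a single throwaway repository. The branch name "contributor" and the file notes.txt are hypothetical, and the "resolution" simply overwrites the file, where in real life you would edit out the conflict markers by hand.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "original line" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
main=$(git symbolic-ref --short HEAD)     # "master" or "main", depending on config

# A branch standing in for the contributor's work.
git checkout -q -b contributor
echo "line from contributor" > notes.txt
git commit -q -a -m "Contributor edit"

# Meanwhile, a competing edit to the same line on my own branch.
git checkout -q "$main"
echo "line from me" > notes.txt
git commit -q -a -m "My edit"

if git merge contributor >/dev/null 2>&1; then
    result="clean"                        # git committed the merge by itself
else
    result="conflict"
    # notes.txt now contains <<<<<<< / ======= / >>>>>>> markers.
    # Resolve by writing the final content, then stage and commit.
    echo "merged line" > notes.txt
    git add notes.txt
    git commit -q -m "Merge branch 'contributor'"
fi
echo "merge result: $result"
```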

Final word

I would have liked to show more on merging etc, but my time is limited so better to publish and move on. :)

Over and out, Goran

22 Sep 09

Divan + Couchdb-Lucene = goodness

Lately I have spent some time getting Lucene support into Divan, a C# library for CouchDB. Lucene is AFAIK the premier open source free text indexing and search engine in the Java world.

Robert Newson has already made a very nice integration of Lucene using the extension APIs of CouchDB. This integration is packaged as a Java app and is actually quite easily installed (see below) if you don’t make typos in the configuration like I did, which left me dumbfounded for a full day.

Presuming you have CouchDB installed, say 0.9.1 or so, let’s go.

Get version 0.4 of CouchDB-Lucene

Robert advised me to stick with version 0.4 for now. So let’s "git it":

        git clone git://github.com/rnewson/couchdb-lucene
        cd couchdb-lucene
        git checkout v0.4

And build it

But hey, we can’t build it without Maven2 and Java (I guess openjdk may work too):

        sudo apt-get install maven2 sun-java6-jdk

For the record I ended up chasing down and removing lots of other java packages before I got to a clean state, I hope you are luckier than me. :)

When you feel confident, try to build it. Robert showed me how to skip the tests to get through it (something is borken in there):

        mvn -Dmaven.test.skip=true

If all goes well you end up with a "target" directory containing some jars with these names:

        couchdb-lucene-0.4.jar
        couchdb-lucene-0.4-jar-with-dependencies.jar

Hook it into CouchDB

Ok, time to hook it all up. We just edit the good old /usr/local/etc/couchdb/local.ini file for CouchDB with some "couch magic" settings that Robert already has documented well:

        [couchdb]
        os_process_timeout=6000000

        [external]
        fti=/usr/bin/java -Dcouchdb.lucene.dir=/usr/local/var/lib/lucene -jar /home/youruser/couchdb-lucene/target/couchdb-lucene-0.4-jar-with-dependencies.jar -search

        [update_notification]
        indexer=/usr/bin/java -Dcouchdb.lucene.dir=/usr/local/var/lib/lucene -jar /home/youruser/couchdb-lucene/target/couchdb-lucene-0.4-jar-with-dependencies.jar -index

        [httpd_db_handlers]
        _fti = {couch_httpd_external, handle_external_req, <<"fti">>}

Now… did you type that EXACTLY RIGHT? :). Otherwise you may end up staring at Erlang stack traces.

One by one: First we raise the OS process timeout in order to prevent problems. Secondly we register an external java handler for performing the actual searches. This handler will be started on demand so you will not see it started when CouchDB is started. Then we register an indexer, also a java app which will be started when CouchDB is started. Finally we also associate the external fti together with the _fti prefix. I managed to botch that last line by writing "couchdb" instead of "couch"…

In the configuration above I also explicitly set the couchdb.lucene.dir property on the java command lines, so you also need to make sure that directory exists with proper permissions set.

Testing it

Okidoki. Time to see if it all works. Run the Divan Lucene test to see:

        nunit-console2 --labels -run=Divan.Lucene src/bin/Debug/Divan.dll

I get something like this:

        Runtime Environment -
           OS Version: Unix 2.6.28.15
          CLR Version: 2.0.50727.1433 ( Mono 2.4.2.3 )

        Selected test: Divan.Lucene
        ***** Divan.Lucene.CouchLuceneTest.ShouldHandleTrivialQuery

        Tests run: 1, Failures: 0, Not run: 0, Time: 5.757 seconds

…and if you do too, congratulations! :)

Over and out, Goran

15 Sep 09

The ESUG conference in Brest

Having been an active developer in the Smalltalk community for more than 15 years, it is rather funny that I have never been to ESUG. The annual conference is traditionally held somewhere in Europe, and this year was in fact the 17th. It ended up in Brest, which incidentally is where the first ESUG conference was held in 1993.

That year the conference was called "Summer School" and Mario Wolczko taught a good part of the lessons. Mario, whom several of you may have heard of, is a recognized expert on implementations of object-oriented languages (perhaps best known for his work on GC and on Self) and has worked at Sun ever since.

It is quite interesting to note some of the topics covered in 1993 BJ (that is, "Before Java"): Effectively using blocks, Exception handling, Metaclasses, Weak referencing etc.

For you non-Smalltalkers: "blocks" in Smalltalk are roughly the same thing as the lambdas or closures that languages like C# and Java only now, some… 16 years later, finally have or perhaps will get. And well, metaclasses of course do not exist at all in those languages :)

Now that I have revealed my unabashed preference for Smalltalk over these "modern" languages, I would like to point out that I have worked professionally in Java since 1998 and have since added C# to my profile (Divan).

Smalltalk is, however, so fantastically much better on almost every point, and for you "whiz kids" thinking "ruuuuby d00d!", imagine Ruby… but with:

  • A really good development environment including refactoring, a debugger, live migration of instances and dynamic incremental compilation.
  • Plus a mature community and several commercial implementations.
  • And oh yes, a well-defined minimalistic language with a nicer syntax and really good virtual machines.

Then you have Smalltalk.

But enough preaching, now that I have "lost" everyone with my ranting anyway - how was ESUG with its 149 Smalltalkers? Very pleasant and exciting!

To begin with, the conference was well organized, with very good lunches including both wine and dessert. The low number of attendees also gave it a completely different atmosphere compared to the larger conferences such as OOPSLA, which I have attended countless times.

Stephane Ducasse, who is the "engine" behind ESUG, did a good job, and it was fun to finally meet him after all these years of email conversations in the Squeak community.

Worth noting is that the commercial Smalltalk vendors were well represented, each with one or more people (Smalltalk developers, not clueless salespeople…):

  • Cincom, for many years the vendor behind VisualWorks, a direct descendant of the original Smalltalk implementation.
  • GemStone, a distributed, persistent, transactional, super-scalable Smalltalk. It rocks.
  • Instantiations, the company now behind IBM's Smalltalk, that is, the original engine under IBM VisualAge (the gang that later built Eclipse).
  • Except, a Smalltalk that has always kept a low profile but has had a very strong technical side.

Another obvious contingent was of course everyone active in Squeak and Pharo, with some emphasis on Pharo, naturally.

The schedule was filled with technical talks by recognized Smalltalk names, and it was almost always interesting. Seaside naturally takes up a lot of room, but other topics were represented too, such as multicore, cloud computing, advanced new tools, mobile phones (iPhone), meta-programming and interesting techniques around unit testing, among others.

I will return with reflections on the various things that were presented, and also briefly summarize what I presented myself, Deltastreams.

/Göran

06 Sep 09

DeltaStreams boost in Brest

Here at ESUG in Brest DeltaStreams has gotten a real "boost". Igor Stasenko has joined the effort and is busy whipping up a user interface using the Toolbuilder API so that it will work in most Squeak flavours (and perhaps other Smalltalks too), and I have been busy getting the rest of the code into better shape.

The presentation I gave was very well received (I think), although it collided with the Seaside tutorial, which meant that a lot of the people I would have liked to see it were busy on the other track. But interest is high, and not only from Squeakers but also from developers using Smalltalk/X and VisualWorks!

The immediate results from Brest are:

  • Tirade has been fully hooked into Deltas which means that Deltas now have a file format and can be serialized/deserialized using that.
  • DeltaStreams now load very easily into latest Squeak "trunk" using an Installer based script, instructions below.
  • Igor is building a Toolbuilder based user interface similar to the changesorter tools. After just a few hours he has something "up and running" and I suspect it will gain features over the next weeks rather quickly. NOTE: Matthew Fulmer made the first UI for Deltas but Igor and I wanted to build something different.
  • We have been talking to Pharo people about what kind of APIs will be available in Pharo that we can depend on for Deltastreams and AFAICT Toolbuilder and SystemEditor are meant to be available in Pharo.
  • Lots of tests cleaned up, we are almost in the full green with over 300 tests.
  • Handling of DoIts has been added. A tricky problem.

Below are instructions for getting started HACKING on the DeltaStreams package for Squeak. If you are into meta programming, advanced source code management and Squeak, it might be interesting. These are thus not instructions for users (it is not really usable yet anyway) nor for Squeak beginners.

Step 1

Grab an image, 3.10.2-7179 should be fine but we are now working in trunk. It might be nice if you picked some other important target for DeltaStreams, like say Croquet, Cuis or Pharo. :)

Step 2

Add the repo for DeltaStreams on SqueakSource:

        MCHttpRepository
        location: 'http://www.squeaksource.com/DeltaStreams'
        user: ''
        password: ''

Step 3

Open the repository and load the latest version of the package called DeltaStreams-Installer. Then execute the class side script by executing DSInstaller install. If all was loaded you will get a nice greeting!

NOTE: Several packages in the repository are not loaded by this installer script. Some lines of development have not been pursued further by me (primarily ICS, the first browser UI and some more things), but that does not mean that Matthew or someone else cannot port that code forward and make it work again :)

Step 4

Run some tests. Currently there are some tests failing. We do not like this "state" so I am in the process of cleaning up, fixing what I can fix and throwing out red tests I can’t understand. We should always be green, otherwise we don’t know if we broke something. These are the results in my image today:

Prerequisites:

        SystemEditor-Tests: 11 failures (have not looked at it, was the same in 3.10.2-7179)
        Tirade-Tests: green
        SystemChangeNotifications-Tests: green

DeltaStreams:

        DSDeltaApplyTest: green
        DSDeltaCreationTest: green
        DSDeltaAntiTest: green
        DSDeltaCopyTest: green
        DSDeltaTiradeFileOutTest: green
        DSDeltaLoggingTest: green
        DSDeltaRevertTest: 6 failures (related to dangling method categories, these we should fix!)
        DSDeltaTiradeTest: green
        DSDeltaValidationTest: red, red, red!!! (I need to read up on validation, Matthew made it)
        DSDeltaValidationTestSystemEditor: red, red, red!!! (same here)

Step 5

Read some class comments and code. Start with DSDelta and the DSChange hierarchy.

Step 6

Grab a big cup of coffee, save your image from time to time (although the code generally does not make it crash…). Ask us whatever you like.

regards, Goran & Igor
