Roads Less Taken http://goran.krampe.se/blog RubLog en-us First FOSS-Stockholm meeting http://goran.krampe.se/blog/MSC/foss-sthlm1.rdoc On the 24th of February the first <a href="http://foss-sthlm.haxx.se">FOSS-Stockholm</a> <a href="http://foss-sthlm.haxx.se/mote1.html">meeting</a> was held in Kista, and I dare say it turned out to be a success! <p> My company <a href="http://www.msc.se">MSC</a> sponsored the event together with <a href="http://www.nohup.se">Nohup</a>, so there were enough sandwiches and drinks to keep the crowd happy. :) </p> <p> I taped all of the talks. You will find these movies and more <a href="http://foss-sthlm.haxx.se/mote1-dok.html">here</a>, and the original <a href="http://foss-sthlm.krampe.se">movies</a> are available too if you are interested. </p> <p> /Goran </p> Breakfast seminar on the new "super databases" and CouchDB http://goran.krampe.se/blog/MSC/super-databases-and-couchdb.rdoc Earlier this week I gave a 90-minute presentation to about 30 people on the new &quot;super databases&quot;, and CouchDB in particular. It went fine, and although it was a &quot;high level sweep&quot; over the field, I think most attendees got what they expected. The slides, translated to English, are available <a href="http://goran.krampe.se/super-databases-and-couchdb-eng.pdf">here</a>, although some of them may be less valuable without the accompanying explanation. <p> Interest in this field is mounting, partly because developers and architects are looking for alternatives, but also because there is indeed quite an explosion going on, with interesting new databases popping up every week. My personal experience mainly covers <a href="http://1978th.net/tokyotyrant/">TokyoTyrant</a> and <a href="http://www.couchdb.org">CouchDB</a>, but I intend to try out: </p> <ul> <li><a href="http://www.mongodb.org">MongoDB</a>, since it is quite close to an object database and has come further on sharding etc.
</li> <li>One of the &quot;Dynamo clones&quot;, not sure yet which one. Dynomite is not interesting since Microsoft has put the lid on it. </li> <li>One of the &quot;Bigtable clones&quot;, also not yet sure which one. :) </li> </ul> <p> Finally, some good and <b>fresh</b> info from the NoSQL community can be found in the <a href="http://journal.uggedal.com/nosql-east-2009---summary-of-day-1">two</a> <a href="http://journal.uggedal.com/nosql-east-2009---summary-of-day-2">summaries</a> written after the recent meetup in the US. It&#8217;s funny that I too made the &quot;Cambrian explosion&quot; connection in my presentation, and so apparently did one of the keynote speakers there. I didn&#8217;t steal it, honestly :) </p> <p> /Goran </p> Git and Github, where the cool kids hang out! http://goran.krampe.se/blog/MSC/working-with-git-and-github.rdoc After releasing <a href="http://github.com/gokr/Divan">Divan</a> on <a href="http://www.github.com">github</a> I of course had to learn basic git, as well as some github/git workflow. Being abnormally interested in DSCMs, and having used <a href="http://www.selenic.com/mercurial">Mercurial</a>, a bit of <a href="http://www.bazaar-vcs.org">Bazaar</a> and the lovely <a href="http://www.darcs.net">Darcs</a>, I felt the time had finally come to learn <a href="http://git-scm.com">git</a>. <p> My perception is that <b>git has really pulled ahead over the last year</b>, quickly adopting good features from the competition and turning into the &quot;cool tool&quot; to use. Github is also a great boost to adoption. Mercurial and Bazaar are still fighting for second place, with Darcs probably set for fourth. Personally, Mercurial didn&#8217;t click for me when I tried it; it is hard to say what made me uneasy about it. Bazaar felt nicer, but I have only dabbled with it. I did use Darcs a bit, and it still has a special place in my heart for its simplicity and amazing super hero powers.
</p> <p> In this article I outline some of my daily usage in maintaining Divan on github. It is nothing special, but if you are just diving into git/github it might be worth reading through. </p> <h3>Getting set up</h3> <p> It is actually quite easy. I just signed up on Github and followed the <a href="http://github.com/guides/Home">guides</a>, like this <a href="http://github.com/guides/getting-a-copy-of-your-github-repo">one</a>, to get a proper personal clone of my repository at Github and to get it all working using SSH for pushing. There is no point in repeating all that here. </p> <h3>Churning out code</h3> <p> If we disregard the rest of the world for a second, making commits and pushing them to github is what you want to do first. I typically use git from the command line: on Windows I use &quot;Git bash here&quot; from Windows Explorer, and on Ubuntu I just use the regular git. Sure, there <a href="http://cola.tuxfamily.org">are</a> <a href="http://git.or.cz/gitwiki/InterfacesFrontendsAndTools?#GraphicalInterfaces">lots</a> of UIs <a href="http://trac.novowork.com/gitg">around</a>, but the need is not that pressing for me. </p> <h3>Git status and commit</h3> <p> First thing: you are going to type &quot;git status&quot; every other second, at least I do :). Some kind of compulsion&#8230; </p> <pre>
gokr@yoda:~/divan/github/gokr/Divan$ git status
# On branch master
nothing to commit (working directory clean)
gokr@yoda:~/divan/github/gokr/Divan$
</pre> <p> This shows the current branch, but more importantly it shows a list of dirty/new files and a list of staged files. Staged files are those that I have &quot;added to the index&quot; (also called the &quot;cache&quot; or &quot;staging area&quot;), which means that I have &quot;staged them for commit&quot;. &quot;The Index&quot; is a relatively unique feature of git, but hey, it is not rocket science. You just prepare your commit by adding stuff to a &quot;staging area&quot; before actually committing it, no big deal.
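</p> <p> To make the index concrete, here is a small self-contained sketch (assuming a Unix shell with git installed; the throwaway repository, the identity and the file names a.txt and b.txt are all made up for illustration). It modifies two files, stages only one of them and commits: </p>

```shell
#!/bin/sh
# Sketch: "git commit" records what is in the index, not the whole working tree.
# Assumes git is installed; everything happens in a throwaway temp directory.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
# Hypothetical identity, stored only in this repository's config.
git config user.name "Example User"
git config user.email "user@example.com"

# Start with two committed files.
echo one > a.txt
echo one > b.txt
git add a.txt b.txt
git commit -q -m "Initial commit"

# Modify both files, but stage only a.txt.
echo two >> a.txt
echo two >> b.txt
git add a.txt

# Short status: "M " in the first two columns means staged, " M" means modified but not staged.
git status --short

git commit -q -m "Commit only what is staged"

# Files changed by the last commit, and files still modified in the working tree.
committed=$(git diff --name-only HEAD~1 HEAD)
still_modified=$(git diff --name-only)
echo "committed: $committed"
echo "still modified: $still_modified"
```

<p> Only the staged change ends up in the commit, while the other modification stays behind in the working tree, so the index works as a draft of your next commit.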
It is just unfortunate that there are <b>three names for it</b> (cache, index, staging area). </p> <p> When you do have dirty stuff, the status command also mentions useful commands to use. If you just made some modifications, for example fixing class comments in two files, it might look like this: </p> <pre>
gokr@yoda:~/divan/github/gokr/Divan$ git status
# On branch master
# Changed but not updated:
#   (use &quot;git add &lt;file&gt;...&quot; to update what will be committed)
#
#   modified:   src/CouchTest.cs
#   modified:   src/Lucene/CouchLuceneTest.cs
#
no changes added to commit (use &quot;git add&quot; and/or &quot;git commit -a&quot;)
gokr@yoda:~/divan/github/gokr/Divan$
</pre> <p> Let&#8217;s add one to the staging area and look again: </p> <pre>
gokr@yoda:~/divan/github/gokr/Divan$ git add src/CouchTest.cs
gokr@yoda:~/divan/github/gokr/Divan$ git status
# On branch master
# Changes to be committed:
#   (use &quot;git reset HEAD &lt;file&gt;...&quot; to unstage)
#
#   modified:   src/CouchTest.cs
#
# Changed but not updated:
#   (use &quot;git add &lt;file&gt;...&quot; to update what will be committed)
#
#   modified:   src/Lucene/CouchLuceneTest.cs
#
gokr@yoda:~/divan/github/gokr/Divan$
</pre> <p> Now, since I am slightly senile, I need to remind myself what I am going to commit, so let&#8217;s diff: </p> <pre>
gokr@yoda:~/divan/github/gokr/Divan$ git diff
diff --git a/src/Lucene/CouchLuceneTest.cs b/src/Lucene/CouchLuceneTest.cs
index 791c2e8..1fd6755 100644
--- a/src/Lucene/CouchLuceneTest.cs
+++ b/src/Lucene/CouchLuceneTest.cs
@@ -9,6 +9,8 @@ namespace Divan.Lucene
     /// &lt;summary&gt;
     /// Unit tests for the Lucene part in Divan. Operates in a separate CouchDB databa
     /// Requires a working Couchdb-Lucene installation according to Couchdb-Lucene's d
+    /// Run from command line using something like:
+    /// nunit-console2 --labels -run=Divan.Lucene src/bin/Debug/Divan.dll
     /// &lt;/summary&gt;
     [TestFixture]
     public class CouchLuceneTest
gokr@yoda:~/divan/github/gokr/Divan$
</pre> <p> Ehum, ok&#8230; so &quot;git diff&quot; only shows the <b>unstaged</b> changes, not the staged ones. But we can see those if we want to by using &quot;git diff --cached&quot;. This is a good example of the &quot;terminology confusion&quot; appearing here and there in git country: why is it not called --staged or --index? (Newer versions of git do accept --staged as a synonym for --cached.) Well, whatever: </p> <pre>
gokr@yoda:~/divan/github/gokr/Divan$ git diff --cached
diff --git a/src/CouchTest.cs b/src/CouchTest.cs
index 3454d37..912ec8c 100644
--- a/src/CouchTest.cs
+++ b/src/CouchTest.cs
@@ -10,6 +10,8 @@ namespace Divan
 {
     /// &lt;summary&gt;
     /// Unit tests for Divan. Operates in a separate CouchDB database called divan_uni
+    /// Run from command line using something like:
+    /// nunit-console2 --labels -run=Divan.CouchTest src/bin/Debug/Divan.dll
     /// &lt;/summary&gt;
     [TestFixture]
     public class CouchTest
gokr@yoda:~/divan/github/gokr/Divan$
</pre> <p> &#8230;and we could see <b>all changes</b>, staged and unstaged, by running &quot;git diff HEAD&quot;. Just type &quot;git help diff&quot; to get a mouthful of options.
:) </p> <p> Doing a commit at this point would only commit the change in CouchTest.cs, so I add the second file (giving just its directory is fine; git then adds the changes below it), run status again for extreme educational purposes and finally commit: </p> <pre>
gokr@yoda:~/divan/github/gokr/Divan$ git add src/Lucene/
gokr@yoda:~/divan/github/gokr/Divan$ git status
# On branch master
# Changes to be committed:
#   (use &quot;git reset HEAD &lt;file&gt;...&quot; to unstage)
#
#   modified:   src/CouchTest.cs
#   modified:   src/Lucene/CouchLuceneTest.cs
#
gokr@yoda:~/divan/github/gokr/Divan$ git commit -m &quot;Class comment changes.&quot;
Created commit b2242f2: Class comment changes.
 2 files changed, 4 insertions(+), 0 deletions(-)
gokr@yoda:~/divan/github/gokr/Divan$
</pre> <p> We could have done all of the above (adding both files and committing) in one simple line, since &quot;-a&quot; stages all modified tracked files (but not new ones) before committing: </p> <pre>
git commit -a -m &quot;A commit message&quot;
</pre> <p> &#8230;but if you are as confused as I am, you have typically done <b>3-4 different things that you don&#8217;t remember</b>, so you want to investigate and possibly split them up into several separate logical commits. You can in fact also stage chunkwise (only selected parts of files) using &quot;git add -p&quot;, but I am not going into that here. </p> <h3>Git push</h3> <p> Since we are going through the vanilla track, let&#8217;s push too: </p> <pre>
gokr@yoda:~/divan/github/gokr/Divan$ git push
Counting objects: 11, done.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 721 bytes, done.
Total 6 (delta 5), reused 0 (delta 0)
To git@github.com:gokr/Divan.git
   e28819b..b2242f2  master -&gt; master
gokr@yoda:~/divan/github/gokr/Divan$
</pre> <p> Yaddayadda, but that&#8217;s it. </p> <h3>Someone else forked your repo!</h3> <p> <b>Great! In the github/git world forks are really good news, the more the merrier!</b> Even better when they actually start doing commits, but a fork is a first step.
It might be worth waiting for some commits on that fork, but let&#8217;s pretend we know they will come, so we want to prepare to receive all that code goodness. </p> <p> I have opted to use so-called tracking branches for this. This means that I create a local branch that is set to &quot;track&quot; a remote branch (typically the &quot;master&quot; branch in the foreign fork). Let&#8217;s say Henrik is actually going to deliver some code to Divan. First we add his repository as a &quot;remote&quot; called &quot;henrik&quot;. We also use &quot;-f&quot;, which fetches from the new remote right away and thereby creates a remote branch pointing at the &quot;master&quot; branch in &quot;henrik&quot;: </p> <pre>
gokr@yoda:~/divan/github/gokr/Divan$ git remote add -f henrik git://github.com/whenrik/Divan.git
Updating henrik
From git://github.com/whenrik/Divan
 * [new branch]      master     -&gt; henrik/master
gokr@yoda:~/divan/github/gokr/Divan$
</pre> <p> So now we have an extra known repository that we named &quot;henrik&quot;, and we have a <b>remote branch</b> called &quot;henrik/master&quot;. All remote branches use that naming convention: &lt;remote-name&gt; + &quot;/&quot; + &lt;branch-name&gt;. If we had skipped &quot;-f&quot;, we would have had to follow up with &quot;git fetch henrik&quot; to get that remote branch.
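</p> <p> To see what &quot;-f&quot; saves us, here is a minimal local sketch (assuming git is installed; a second throwaway repository stands in for Henrik&#8217;s fork on github, and all paths, identities and commit messages are made up). It adds the remote <b>without</b> &quot;-f&quot;, checks that no remote branch exists yet, and then runs &quot;git fetch&quot;: </p>

```shell
#!/bin/sh
# Sketch: "git remote add" without -f only registers the remote;
# "git fetch" is what creates the remote branch henrik/master.
# Assumes git is installed. A local repository plays the collaborator's fork.
set -e

work=$(mktemp -d)

# The pretend fork: one commit on a branch explicitly named master.
git init -q "$work/fork"
cd "$work/fork"
git symbolic-ref HEAD refs/heads/master   # independent of init.defaultBranch
git config user.name "Fork Owner"         # hypothetical identity
git config user.email "fork@example.com"
echo hello > README
git add README
git commit -q -m "Work done in the fork"

# Our own (empty) repository.
git init -q "$work/mine"
cd "$work/mine"

# Register the fork as a remote called henrik, deliberately without -f.
git remote add henrik "$work/fork"
if git show-ref --verify -q refs/remotes/henrik/master; then before=yes; else before=no; fi

# Fetching downloads the commits and creates the remote branch henrik/master.
git fetch -q henrik
if git show-ref --verify -q refs/remotes/henrik/master; then after=yes; else after=no; fi
subject=$(git log -1 --format=%s henrik/master)

echo "henrik/master before fetch: $before, after fetch: $after"
echo "latest commit on henrik/master: $subject"
```

<p> The same thing works against a real fork, of course; the only difference is that the remote URL points at github instead of a local directory.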
</p> <p> We can see all the remotes we now have (using &quot;-v&quot; to see their URLs): </p> <pre>
gokr@yoda:~/divan/github/gokr/Divan$ git remote -v
henrik    git://github.com/whenrik/Divan.git
kolosy    git://github.com/kolosy/Divan.git
origin    git@github.com:gokr/Divan.git
upstream  git://github.com/foretagsplatsen/Divan.git
gokr@yoda:~/divan/github/gokr/Divan$
</pre> <p> &#8230;and all branches, both local and remote (use &quot;git branch -r&quot; for only remote branches or plain &quot;git branch&quot; for only local ones): </p> <pre>
gokr@yoda:~/divan/github/gokr/Divan$ git branch -a
  kolosy
* master
  upstream
  henrik/master
  kolosy/master
  origin/HEAD
  origin/master
  upstream/master
gokr@yoda:~/divan/github/gokr/Divan$
</pre> <p> Here we see the newly created remote branch &quot;henrik/master&quot; (and more). The top three entries are local branches, easily recognizable as such since they do not have a &quot;/&quot; in them. </p> <p> With git one can in fact merge directly from a remote branch (for example &quot;git merge henrik/master&quot;), but I guess most of us would like to pull down, take a look and then merge, which is easiest with a local branch that mirrors the remote branch. In git terminology this is a &quot;tracking branch&quot;, since it is set up to easily track a remote branch, meaning that it knows where to pull from, etc. Nothing magic. </p> <p> For all forks that I want to collaborate with I use tracking branches, so let&#8217;s create one for Henrik. We use the checkout command with &quot;-b&quot; to create a new branch called &quot;henrik&quot; from the remote branch &quot;henrik/master&quot;, and &quot;-t&quot; for tracking: </p> <pre>
gokr@yoda:~/divan/github/gokr/Divan$ git checkout -t -b henrik henrik/master
Branch henrik set up to track remote branch refs/remotes/henrik/master.
Switched to a new branch &quot;henrik&quot;
gokr@yoda:~/divan/github/gokr/Divan$
</pre> <p> In fact, the &quot;-t&quot; is not needed when we branch from a remote branch; it is the default. Note that we could also have done the above in two steps: &quot;git branch henrik henrik/master&quot; followed by &quot;git checkout henrik&quot;. </p> <p> Let&#8217;s list all our branches once more: </p> <pre>
gokr@yoda:~/divan/github/gokr/Divan$ git branch -a
* henrik
  kolosy
  master
  upstream
  henrik/master
  kolosy/master
  origin/HEAD
  origin/master
  upstream/master
gokr@yoda:~/divan/github/gokr/Divan$
</pre> <p> So now we have a local branch which is also the current one, since the checkout switched to it. While on it, we can do &quot;git pull&quot; to get all new commits from the remote branch. </p> <h3>Git merge</h3> <p> If I want to merge work that Henrik has done, I first switch to his branch using &quot;git checkout henrik&quot; and do a &quot;git pull&quot; there. Next I switch back to my own branch, say &quot;git checkout master&quot;, and there I do &quot;git merge henrik&quot;. </p> <p> Git will automatically commit if a merge is successful. If there are conflicts, it stops in the middle and lets me take care of the affected files, which will have regular conflict markers in them. In that case I have to add the fixed files to the staging area manually (it typically already holds the cleanly merged parts) and then commit. </p> <p> And then the natural thing to do, after verifying that the unit tests are green :), is of course a &quot;git push&quot;. </p> <h3>Final word</h3> <p> I would have liked to show more on merging etc, but my time is limited, so it is better to publish and move on. :) </p> <p> Over and out, Goran </p> Divan + Couchdb-Lucene = goodness http://goran.krampe.se/blog/Divan/divan-plus-couchdb-lucene.rdoc Lately I have spent some time getting Lucene support into <a href="http://github.com/gokr/Divan">Divan</a>, a C# library for CouchDB.
<a href="http://lucene.apache.org">Lucene</a> is AFAIK the premier open source full-text indexing and search engine in the Java world. <p> Robert Newson has already made a very nice <a href="http://github.com/rnewson/Couchdb-lucene">integration</a> of Lucene using the extension APIs of CouchDB. This integration is packaged as a Java app and is actually quite easy to install (see below), provided you don&#8217;t make typos in the configuration like I did, which left me dumbfounded for a full day. </p> <p> Presuming you have CouchDB installed, say 0.9.1 or so, let&#8217;s go. </p> <h3>Get version 0.4 of CouchDB-Lucene</h3> <p> Robert advised me to stick with version 0.4 for now. So let&#8217;s &quot;git it&quot;: </p> <pre>
git clone git://github.com/rnewson/couchdb-lucene
cd couchdb-lucene
git checkout v0.4
</pre> <h3>And build it</h3> <p> But hey, we can&#8217;t build it without Maven2 and Java (I guess OpenJDK may work too): </p> <pre>
sudo apt-get install maven2 sun-java6-jdk
</pre> <p> For the record, I ended up chasing down and removing lots of other Java packages before I got to a clean state; I hope you are luckier than me. :) </p> <p> When you feel confident, try to build it. Robert showed me how to skip the tests to get through the build (something is broken in there): </p> <pre>
mvn -Dmaven.test.skip=true
</pre> <p> If all goes well, you end up with a &quot;target&quot; directory containing jars with these names: </p> <pre>
couchdb-lucene-0.4.jar
couchdb-lucene-0.4-jar-with-dependencies.jar
</pre> <h3>Hook it into CouchDB</h3> <p> Ok, time to hook it all up.
We just edit the good old <b>/usr/local/etc/couchdb/local.ini</b> file for CouchDB with some &quot;couch magic&quot; settings that Robert has already documented well. Both Java command lines must point at the jar you just built, so adjust /home/youruser/couchdb-lucene to wherever you cloned it: </p> <pre>
[couchdb]
os_process_timeout=6000000

[external]
fti=/usr/bin/java -Dcouchdb.lucene.dir=/usr/local/var/lib/lucene -jar /home/youruser/couchdb-lucene/target/couchdb-lucene-0.4-jar-with-dependencies.jar -search

[update_notification]
indexer=/usr/bin/java -Dcouchdb.lucene.dir=/usr/local/var/lib/lucene -jar /home/youruser/couchdb-lucene/target/couchdb-lucene-0.4-jar-with-dependencies.jar -index

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, &lt;&lt;&quot;fti&quot;&gt;&gt;}
</pre> <p> Now&#8230; did you type that <b>EXACTLY RIGHT? :)</b> Otherwise you may end up staring at Erlang stack traces. </p> <p> One by one: first we raise the OS process timeout, so that CouchDB does not give up on the external Java processes too early. Secondly, we register an external Java handler for performing the actual searches. This handler is started on demand, so you will not see it running right after CouchDB starts. Then we register an indexer, also a Java app, which is started together with CouchDB. Finally we associate the external fti with the _fti URL prefix. I managed to botch that last line by writing &quot;couchdb&quot; instead of &quot;couch&quot;&#8230; </p> <p> In the configuration above I also explicitly set the couchdb.lucene.dir property on the Java command lines, so you also need to make sure that this directory exists and has the proper permissions. </p> <h3>Testing it</h3> <p> Okidoki. Time to see if it all works.
Run the Divan Lucene test to see: </p> <pre>
nunit-console2 --labels -run=Divan.Lucene src/bin/Debug/Divan.dll
</pre> <p> I get something like this: </p> <pre>
Runtime Environment -
   OS Version: Unix 2.6.28.15
  CLR Version: 2.0.50727.1433 ( Mono 2.4.2.3 )

Selected test: Divan.Lucene
***** Divan.Lucene.CouchLuceneTest.ShouldHandleTrivialQuery

Tests run: 1, Failures: 0, Not run: 0, Time: 5.757 seconds
</pre> <p> &#8230;and if you do too, congratulations! :) </p> <p> Over and out, Goran </p> The ESUG conference in Brest http://goran.krampe.se/blog/MSC/ESUG1-swe.rdoc Having been an active developer in the Smalltalk community for more than 15 years, I find it rather funny that I had never been to <a href="http://www.esug.org">ESUG</a>. The annual conference is traditionally held somewhere in Europe, and this year was actually <b>the 17th edition</b>. It ended up in Brest, which incidentally is where the first ESUG conference was held, back in 1993. <p> That year the conference was called &quot;Summer School&quot;, and <a href="http://www.wolczko.com">Mario</a> Wolczko taught quite a few of the lessons. Mario, whom several of you may have heard of, is a recognized expert on implementations of object-oriented languages (perhaps best known for his work on GC and on Self) and has <a href="http://research.sun.com/people/mario">worked</a> at Sun ever since. </p> <p> It is quite interesting to note some of the topics covered in 1993 BJ (that is, &quot;Before Java&quot;): <b>Effectively using blocks, Exception handling, Metaclasses, Weak referencing etc</b>. </p> <p> For the non-Smalltalkers among you: &quot;blocks&quot; in Smalltalk are roughly the same thing as the lambdas or closures that languages like C# and Java, <b>only now, some&#8230; 16 years later</b>, finally <a href="http://msdn.microsoft.com/en-us/library/bb397687.aspx">have</a> or perhaps will <a href="http://www.javac.info">get</a>.
And well, metaclasses of course don&#8217;t exist at all in those languages :) </p> <p> Now that I have revealed my undisguised preference for Smalltalk over these &quot;modern&quot; languages, I would like to point out that I have worked professionally in Java since 1998 and have since also added C# to my profile (<a href="http://github.com/foretagsplatsen/Divan">Divan</a>). </p> <p> Smalltalk is, however, so fantastically much better on almost every point. For you &quot;whiz kids&quot; thinking &quot;ruuuuby d00d!&quot;, imagine Ruby&#8230; but with: </p> <ul> <li>A really good development environment, including refactoring, a debugger, live migration of instances and dynamic incremental compilation. </li> <li>Plus a mature community and several commercial implementations. </li> <li>And oh yes, a well-defined minimalist language with a nicer syntax and really good virtual machines. </li> </ul> <p> Then you have Smalltalk. </p> <p> But enough preaching. Now that I have lost all of you with my ranting anyway, how was ESUG with 149 Smalltalkers? <b>Very pleasant and exciting!</b> </p> <p> To begin with, the conference was well organized, with very good lunches including both wine and dessert. The small number of attendees also gave it a completely different atmosphere compared with larger conferences such as <a href="http://www.oopsla.org">OOPSLA</a>, which I have attended countless times. </p> <p> <a href="http://stephane.ducasse.free.fr/">Stephane</a> Ducasse, the &quot;engine&quot; behind ESUG, did a good job, and it was fun to finally meet him after all these years of email conversations in the Squeak community.
</p> <p> It is worth noting that the commercial Smalltalk vendors were well represented, each by one or more people (Smalltalk developers, not clueless salespeople&#8230;): </p> <ul> <li><a href="http://www.cincomsmalltalk.com">Cincom</a>, for many years the vendor behind VisualWorks, a direct descendant of the original Smalltalk implementation. </li> <li><a href="http://gemstone.com/products/smalltalk">GemStone</a>, a distributed, persistent, transactional, super-scalable Smalltalk. Rocks. </li> <li><a href="http://www.instantiations.com/VAST/index.html">Instantiations</a>, the company now behind IBM&#8217;s Smalltalk, that is, the original engine under IBM VisualAge (the gang that later built Eclipse). </li> <li><a href="http://www.exept.de/en/products/smalltalk-x/stx-overview">eXept</a>, whose Smalltalk has always kept a low profile but has been technically very strong. </li> </ul> <p> Another obvious part is of course everyone active in <a href="http://www.squeak.org">Squeak</a> and <a href="http://www.pharo-project.org">Pharo</a>, with some emphasis on Pharo, naturally. </p> <p> The schedule was filled with technical talks from well-known Smalltalk names, and they were almost always interesting. <a href="http://www.seaside.st">Seaside</a> naturally took up a lot of room, but other topics were represented too, such as multicore, cloud computing, advanced new tools, mobile phones (iPhone), meta-programming and interesting unit testing techniques. </p> <p> I will come back with reflections on the various things that were presented, and also briefly summarize what I presented myself, <a href="http://www.slideshare.net/esug/deltastreams">Deltastreams</a>. </p> <p> /Göran </p>