Roads Less Taken

22 Sep 09

Divan + CouchDB-Lucene = goodness

Lately I have spent some time getting Lucene support into Divan, a C# library for CouchDB. Lucene is AFAIK the premier open source free text indexing and search engine in the Java world.

Robert Newson has already made a very nice integration of Lucene using the extension APIs of CouchDB. This integration is packaged as a Java app and is actually quite easy to install (see below), as long as you don’t make typos in the configuration like I did, which left me dumbfounded for a full day.

Presuming you have CouchDB installed, say 0.9.1 or so, let’s go.

Get version 0.4 of CouchDB-Lucene

Robert advised me to stick with version 0.4 for now. So let’s "git it":

        git clone git://github.com/rnewson/couchdb-lucene
        cd couchdb-lucene
        git checkout v0.4

And build it

But hey, we can’t build it without Maven2 and Java (I guess openjdk may work too):

        sudo apt-get install maven2 sun-java6-jdk

For the record, I ended up chasing down and removing lots of other Java packages before I got to a clean state. I hope you are luckier than me. :)

When you feel confident, try to build it. Robert showed me how to skip the tests to get through it (something is borken in there):

        mvn -Dmaven.test.skip=true

If all goes well you end up with a "target" directory containing, among other things, jars with these names:

        couchdb-lucene-0.4.jar
        couchdb-lucene-0.4-jar-with-dependencies.jar
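A quick way to confirm that the build produced both jars, as a minimal sketch. The helper name check_jars is my own invention for illustration, not something shipped with CouchDB-Lucene:

```shell
# Hypothetical helper: report whether both expected jars exist.
# $1 = the "target" directory, $2 = the version (0.4 in this post).
check_jars() {
    target_dir="$1"
    version="$2"
    missing=0
    for jar in "couchdb-lucene-${version}.jar" \
               "couchdb-lucene-${version}-jar-with-dependencies.jar"; do
        if [ -f "$target_dir/$jar" ]; then
            echo "found:   $jar"
        else
            echo "missing: $jar"
            missing=1
        fi
    done
    # Exit status 0 only if both jars are present.
    return "$missing"
}
# Usage, from inside the couchdb-lucene checkout: check_jars target 0.4
```

The jar with dependencies is the one used in the CouchDB configuration further down, so that is the one that really has to be there.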

Hook it into CouchDB

Ok, time to hook it all up. We just edit the good old /usr/local/etc/couchdb/local.ini file for CouchDB with some "couch magic" settings that Robert has already documented well:

        [couchdb]
        os_process_timeout=6000000

        [external]
        fti=/usr/bin/java -Dcouchdb.lucene.dir=/usr/local/var/lib/lucene -jar /home/youruser/couchdb-lucene/target/couchdb-lucene-0.4-jar-with-dependencies.jar -search

        [update_notification]
        indexer=/usr/bin/java -Dcouchdb.lucene.dir=/usr/local/var/lib/lucene -jar /home/youruser/couchdb-lucene/target/couchdb-lucene-0.4-jar-with-dependencies.jar -index

        [httpd_db_handlers]
        _fti = {couch_httpd_external, handle_external_req, <<"fti">>}

Now… did you type that EXACTLY RIGHT? :) Otherwise you may end up staring at Erlang stack traces.

One by one: First, we raise the OS process timeout in order to prevent problems. Second, we register an external Java handler for performing the actual searches. This handler is started on demand, so you will not see it running right after CouchDB starts. Third, we register an indexer, also a Java app, which is started together with CouchDB. Finally, we associate the external fti handler with the _fti URL prefix. I managed to botch that last line by writing "couchdb" instead of "couch"…
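Since a single typo cost me a day, here is a rough sanity check for the ini file. It is only a sketch: check_fti_config is a made-up helper name, and it only looks for the four section headers and the module name in the _fti line, nothing more:

```shell
# Hypothetical helper: catch the most obvious typos in local.ini.
# $1 = path to the ini file.
check_fti_config() {
    ini="$1"
    bad=0
    # All four sections from the configuration above must be present.
    for section in couchdb external update_notification httpd_db_handlers; do
        if ! grep -qF "[$section]" "$ini"; then
            echo "missing section: [$section]"
            bad=1
        fi
    done
    # The handler must name couch_httpd_external (not couchdb_httpd_external).
    if ! grep -q '^[[:space:]]*_fti[[:space:]]*=[[:space:]]*{couch_httpd_external,' "$ini"; then
        echo "_fti handler line is missing or misspelled"
        bad=1
    fi
    return "$bad"
}
# Usage: check_fti_config /usr/local/etc/couchdb/local.ini
```

It will not catch a wrong jar path or a bad Java option, but it would have caught my "couchdb" versus "couch" mistake.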

In the configuration above I also explicitly set the couchdb.lucene.dir property on the java command lines, so you also need to make sure that directory exists with proper permissions set.
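Creating that directory could look something like the sketch below. The helper prepare_index_dir is hypothetical, and the owner name "couchdb" in the usage line is an assumption about which user your CouchDB runs as; adjust it to your installation:

```shell
# Hypothetical helper: create the Lucene index directory.
# $1 = directory, $2 = optional owner (user or user:group).
prepare_index_dir() {
    dir="$1"
    owner="${2:-}"
    mkdir -p "$dir" || return 1
    if [ -n "$owner" ]; then
        # Hand the directory to the user that runs CouchDB (assumed name).
        chown "$owner" "$dir" || return 1
    fi
    # Owner gets full access, group may read, others get nothing.
    chmod 750 "$dir"
}
# Usage (as root): prepare_index_dir /usr/local/var/lib/lucene couchdb
```

The point is simply that the Java processes started by CouchDB must be able to write to the directory named by couchdb.lucene.dir.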

Testing it

Okidoki. Time to see if it all works. Run the Divan Lucene test to see:

        nunit-console2 --labels -run=Divan.Lucene src/bin/Debug/Divan.dll

I get something like this:

        Runtime Environment -
           OS Version: Unix 2.6.28.15
          CLR Version: 2.0.50727.1433 ( Mono 2.4.2.3 )

        Selected test: Divan.Lucene
        ***** Divan.Lucene.CouchLuceneTest.ShouldHandleTrivialQuery

        Tests run: 1, Failures: 0, Not run: 0, Time: 5.757 seconds

…and if you do too, congratulations! :)

Over and out, Goran
