Divan + CouchDB-Lucene = goodness
Robert Newson has already made a very nice integration of Lucene using the extension APIs of CouchDB. This integration is packaged as a Java app and is actually quite easy to install (see below), as long as you don’t make typos in the configuration like I did, which left me dumbfounded for a full day.
Presuming you have CouchDB installed, say 0.9.1 or so, let’s go.
Get version 0.4 of CouchDB-Lucene
Robert advised me to stick with version 0.4 for now. So let’s "git it":
git clone git://github.com/rnewson/couchdb-lucene
cd couchdb-lucene
git checkout v0.4
And build it
But hey, we can’t build it without Maven2 and Java (I guess openjdk may work too):
sudo apt-get install maven2 sun-java6-jdk
For the record, I ended up chasing down and removing lots of other Java packages before I got to a clean state. I hope you are luckier than me. :)
When you feel confident, try to build it. Robert showed me how to skip the tests to get through it (something is borken in there):
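Before building, it can save some head-scratching to confirm that the tools are actually on your PATH. Here is a minimal sketch of such a check; the function name have_tools is made up for this post, and it only tells you whether a command is found, not whether it is the right version.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 only if every command is found.
# (have_tools is a hypothetical helper, not part of any package.)
have_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Check the tools this walkthrough relies on.
have_tools git java mvn || echo "install the missing tools first"
```

If nothing is printed, all three commands were found.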
mvn -Dmaven.test.skip=true
If all goes well you end up with a "target" directory containing some jars with these names:
couchdb-lucene-0.4.jar
couchdb-lucene-0.4-jar-with-dependencies.jar
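A quick way to confirm the build produced both files is a small check like the one below. This is only a sketch: check_jars is a hypothetical helper, and the jar names are simply the two listed above for version 0.4.

```shell
# Check that a build directory holds both jars named above.
# $1: path to the "target" directory (defaults to ./target)
# (check_jars is a hypothetical helper written for this post.)
check_jars() {
    target_dir="${1:-target}"
    status=0
    for jar in couchdb-lucene-0.4.jar \
               couchdb-lucene-0.4-jar-with-dependencies.jar; do
        if [ -f "$target_dir/$jar" ]; then
            echo "found: $jar"
        else
            echo "missing: $jar"
            status=1
        fi
    done
    return "$status"
}

# Usage, from inside the couchdb-lucene checkout:
#   check_jars target
```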
Hook it into CouchDB
Ok, time to hook it all up. We just edit the good old /usr/local/etc/couchdb/local.ini file for CouchDB with some "couch magic" settings that Robert has already documented well:
[couchdb]
os_process_timeout=6000000
[external]
fti=/usr/bin/java -Dcouchdb.lucene.dir=/usr/local/var/lib/lucene -jar /home/youruser/couchdb-lucene/target/couchdb-lucene-0.4-jar-with-dependencies.jar -search
[update_notification]
indexer=/usr/bin/java -Dcouchdb.lucene.dir=/usr/local/var/lib/lucene -jar /home/youruser/couchdb-lucene/target/couchdb-lucene-0.4-jar-with-dependencies.jar -index
[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}
Now… did you type that EXACTLY RIGHT? :) Otherwise you may end up staring at Erlang stacktraces.
One by one: First, we raise the OS process timeout in order to prevent problems. Second, we register an external Java handler for performing the actual searches. This handler is started on demand, so you will not see it running when CouchDB starts. Third, we register an indexer, also a Java app, which is started together with CouchDB. Finally, we associate the external fti handler with the _fti URL prefix. I managed to botch that last line by writing "couchdb" instead of "couch"…
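Since that last line is so easy to get wrong, a tiny check can catch the typo before CouchDB does. The sketch below assumes the handler line is written exactly as in the config above, with the same spacing and no trailing whitespace; check_fti_handler is a hypothetical helper name.

```shell
# Look for the exact _fti handler line in a CouchDB ini file.
# $1: path to the ini file
# (check_fti_handler is a hypothetical helper written for this post.)
check_fti_handler() {
    ini_file="$1"
    expected='_fti = {couch_httpd_external, handle_external_req, <<"fti">>}'
    # -F: treat the pattern as a fixed string, -x: the whole line must match
    if grep -Fxq "$expected" "$ini_file"; then
        echo "ok: _fti handler line matches"
    else
        echo "problem: _fti handler line missing or mistyped"
        return 1
    fi
}

# Usage:
#   check_fti_handler /usr/local/etc/couchdb/local.ini
```

Because the match is exact, a line with different spacing is also reported as a problem even though CouchDB might accept it, so treat a failure as a hint to look closer rather than as proof of an error.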
In the configuration above I also explicitly set the couchdb.lucene.dir property on the java command lines, so you also need to make sure that directory exists with proper permissions set.
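Creating that directory could look roughly like this. It is a sketch under assumptions: make_index_dir is a hypothetical helper, the 750 mode is just one reasonable choice, and the account CouchDB runs as (often called couchdb, but check your own install) still has to own the directory.

```shell
# Create the Lucene index directory if needed and restrict its permissions.
# $1: directory path (the config above uses /usr/local/var/lib/lucene)
# (make_index_dir is a hypothetical helper written for this post.)
make_index_dir() {
    index_dir="$1"
    mkdir -p "$index_dir" && chmod 750 "$index_dir"
}

# Usage, typically with root privileges:
#   make_index_dir /usr/local/var/lib/lucene
#   chown couchdb /usr/local/var/lib/lucene   # "couchdb" user name is an assumption
```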
Testing it
Okidoki. Time to see if it all works. Run the Divan Lucene test to see:
nunit-console2 --labels -run=Divan.Lucene src/bin/Debug/Divan.dll
I get something like this:
Runtime Environment -
OS Version: Unix 2.6.28.15
CLR Version: 2.0.50727.1433 ( Mono 2.4.2.3 )
Selected test: Divan.Lucene
***** Divan.Lucene.CouchLuceneTest.ShouldHandleTrivialQuery
Tests run: 1, Failures: 0, Not run: 0, Time: 5.757 seconds
…and if you do too, congratulations! :)
Over and out, Goran