Roads Less Taken

21 May 10

Setting up a Squeak+Seaside+CouchDB dev environment

This is a little log from my attempts to set up a fresh development environment for the eBlankett.org project. eBlankett is a web system that presents wizard-like web forms to the user, generated from a declarative, high-level definition of the wizard in JSON.

Trip to Pharo land

First I picked up Pharo-1.0 from the Pharo site and installed Seaside-3.0 using Metacello. That takes… quite a bit of time to run. :) But hey, Pharo is meant to be the development platform for Seaside so it seems a reasonable choice.

The next component needed for eBlankett is a library to access CouchDB, since eBlankett uses CouchDB to store the form definitions. In the Squeak world there are currently two options for that:

  1. The CouchDB project from Danie Roux that uses the Curl plugin.
  2. The SCouchDB project from Igor Stasenko that works directly on top of SocketStream, without even an HTTP layer in between.

Since eBlankett so far uses only the most trivial CouchDB operations, it doesn’t really matter which one we use. At the start of the project we used the former, but lately we switched to SCouchDB to get rid of the Curl dependency (sorry Danie, Curl rocks but…). Igor is also actively working on SCouchDB, and Igor is my friend, so that made it a nice choice too :)

SCouchDB has an installation snippet in a separate class in a separate MC package that looks like this:

        Installer mantis ensureFix: '7446: [BUG][FIX] SocketStream>>peek'.
        (Installer repository: 'http://www.squeaksource.com/SCouchDB')
                install: 'JSON';
                install: 'SCouchDB-Core';
                install: 'SCouchDB-Tests'.

Note though that Pharo does not have Installer, so the bug fix line will not work. After scrutinizing the bug I realized that it is indeed NOT fixed in Pharo 1.0, but it was a trivial fix to make by hand.

Ok, a speed bump, but let’s push forward. Pharooners use Gofer instead of Installer - it is very similar, but only operates on Monticello repositories (and probably knows how to do lots of other cool stuff of course). Since the above script is trivial we can convert it to Gofer:

        Gofer new
                squeaksource: 'SCouchDB';
                package: 'JSON';
                package: 'SCouchDB-Core';
                package: 'SCouchDB-Tests';
                load

Aha! We have discovered a bug in Gofer: it doesn’t like MC snapshots with more than two periods in their names, so it loaded the wrong snapshots. Since Igor uses a developer initial like ‘Igor.Stasenko’, Gofer gets confused! So ok, open up the repo and load the latest snapshots manually.
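To see how a version name with extra periods can trip up a parser, here is a small Python sketch. It is not Gofer's actual code; it assumes Monticello file names of the form <package>-<author>.<version>.mcz and contrasts a naive split on periods with parsing from the right end. All function names are hypothetical.

```python
def naive_version(filename):
    # Naive: assumes exactly two periods, "<package>-<author>.<version>.mcz"
    return filename.split(".")[1]


def parse_mc_filename(filename):
    """Split '<package>-<author>.<version>.mcz' working from the right end,
    so periods inside the author initials do no harm."""
    if not filename.endswith(".mcz"):
        raise ValueError(f"not an .mcz file name: {filename}")
    stem = filename[: -len(".mcz")]
    name_and_author, _, version = stem.rpartition(".")   # last period: version
    package, _, author = name_and_author.rpartition("-")  # last dash: author
    return package, author, int(version)


def latest(filenames, package):
    """Pick the entry with the highest version number for one package."""
    candidates = [parse_mc_filename(f) for f in filenames]
    return max((c for c in candidates if c[0] == package), key=lambda c: c[2])


files = ["SCouchDB-Core-Igor.Stasenko.9.mcz", "SCouchDB-Core-Igor.Stasenko.42.mcz"]
print(naive_version(files[1]))         # Stasenko, not a version number
print(latest(files, "SCouchDB-Core"))  # ('SCouchDB-Core', 'Igor.Stasenko', 42)
```

Splitting from the right works because the version number and the ".mcz" suffix are always last, while the author part is the only piece that may contain extra periods.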

Finally we want to install the eBlankett code, which is hosted on a password-protected FTP repository, but Gofer of course deals with that nicely:

        (Gofer new url: 'ftp://krampe.se' username: 'secret' password: 'secret')
                package: 'Blankett';
                load

Time to fire it up and see if it works. We also make sure there is a CouchDB running on localhost:5984, but I am on Ubuntu, which already has that.

        WAKom startOn: 8080
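A quick way to confirm that CouchDB really is answering on localhost:5984 is to fetch its root URL, which returns a small JSON welcome document containing a "couchdb": "Welcome" field. A minimal Python sketch using only the standard library (the helper names are made up for this example):

```python
import json
import urllib.request


def is_couchdb_welcome(body):
    """True if body looks like CouchDB's root document, e.g. {"couchdb": "Welcome", ...}."""
    try:
        info = json.loads(body)
    except ValueError:  # also covers invalid JSON and undecodable bytes
        return False
    return isinstance(info, dict) and info.get("couchdb") == "Welcome"


def couchdb_is_running(url="http://localhost:5984/", timeout=2.0):
    """Fetch the root URL; any connection error simply counts as 'not running'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_couchdb_welcome(response.read())
    except OSError:  # URLError and socket timeouts are both OSError subclasses
        return False
```

Calling couchdb_is_running() before starting Seaside saves some head scratching if the database is not up.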

…ok, so a while later I realize that Pharo 1.0 is also missing Date class>>readFrom:pattern: (actually a contribution of mine) … and at this point I decide to run back home to Squeak trunk and try all this again! From the start. Sorry Pharo, perhaps some other time. It would have been easy to add #readFrom:pattern: but I just ran out of gasoline.

Running home to trunk

After fumbling around a while due to missing instructions and misleading filenames (this should be on www.squeak.org, darnit!) I came up with this procedure to get an image running that tracks trunk:

  1. Download Squeak-4.1.zip.
  2. Open Preferences and search for Monticello. Set the default update URL to "/trunk". This was not easy to find out, although trivial once found. The other place that explains it is Help->Extending the system, but that was NOT obvious for me to find either.
  3. Load Updates. This gives us a "bleeding edge" image instead of 4.1. Probably not needed, but hey… we like the edge.

Then we execute the following:

        (Installer ss project: 'MetacelloRepository')
                install: 'ConfigurationOfSeaside30'.
        (Smalltalk at: #ConfigurationOfSeaside30) load.

        (Installer repository: 'http://www.squeaksource.com/SCouchDB')
                install: 'JSON';
                install: 'SCouchDB-Core';
                install: 'SCouchDB-Tests'.

        (Gofer new url: 'ftp://krampe.se' username: 'blankett' password: 'ett')
                package: 'Blankett';
                load

…and finally open->Seaside Control Panel, add a Comanche adaptor and start it. Surf to localhost:8080/eBlankett and tada! We are up. Ehm, ok, so the encoding ended up as iso-8859-1, but no problem: the menu in the Seaside Control Panel switches it to utf8 easily.

Ok, so trunk it will be for now. Pharo is cool and Pharo is good, no doubt. But I will try to stay in trunk for this project.

/Goran

06 Sep 09

DeltaStreams boost in Brest

Here at ESUG in Brest, DeltaStreams has gotten a real "boost". Igor Stasenko has joined the effort and is busy whipping up a user interface using the ToolBuilder API, so that it will work in most Squeak flavours (and perhaps other Smalltalks too), and I have been busy getting the rest of the code into better shape.

The presentation I gave was very well received (I think), although it collided with the Seaside tutorial, which meant that a lot of the people I would have liked to see it were busy on the other track. But interest is high, and not only from Squeakers but also from developers using Smalltalk/X and VisualWorks!

The immediate results from Brest are:

  • Tirade has been fully hooked into Deltas, which means that Deltas now have a file format and can be serialized and deserialized using it.
  • DeltaStreams now load very easily into latest Squeak "trunk" using an Installer based script, instructions below.
  • Igor is building a ToolBuilder-based user interface similar to the change sorter tools. After just a few hours he had something "up and running", and I suspect it will gain features rather quickly over the next weeks. NOTE: Matthew Fulmer made the first UI for Deltas, but Igor and I wanted to build something different.
  • We have been talking to Pharo people about which APIs will be available in Pharo that we can depend on for DeltaStreams, and AFAICT ToolBuilder and SystemEditor are meant to be available in Pharo.
  • Lots of tests have been cleaned up; we are almost fully green with over 300 tests.
  • Handling of DoIts has been added. A tricky problem.

Below are instructions for getting started HACKING on the DeltaStreams package for Squeak. If you are into meta programming, advanced source code management and Squeak, they might be interesting. They are not meant for users (it is not really usable yet anyway), nor for Squeak beginners.

Step 1

Grab an image. 3.10.2-7179 should be fine, but we are now working in trunk. It might be nice if you picked some other important target for DeltaStreams, like say Croquet, Cuis or Pharo. :)

Step 2

Add the repo for DeltaStreams on SqueakSource:

        MCHttpRepository
        location: 'http://www.squeaksource.com/DeltaStreams'
        user: ''
        password: ''

Step 3

Open the repository and load the latest version of the package called DeltaStreams-Installer. Then run the class-side install script by evaluating DSInstaller install. If everything loaded you will get a nice greeting!

NOTE: Several packages in the repository are not loaded by this installer script. I have not pursued some lines of development further (primarily ICS, the first browser UI and a few more things), but that does not mean that Matthew or someone else can’t port that code forward and make it work again :)

Step 4

Run some tests. Currently there are some tests failing. We do not like this "state" so I am in the process of cleaning up, fixing what I can fix and throwing out red tests I can’t understand. We should always be green, otherwise we don’t know if we broke something. These are the results in my image today:

Prerequisites:

        SystemEditor-Tests: 11 failures (have not looked at it, was the same in 3.10.2-7179)
        Tirade-Tests: green
        SystemChangeNotifications-Tests: green

DeltaStreams:

        DSDeltaApplyTest: green
        DSDeltaCreationTest: green
        DSDeltaAntiTest: green
        DSDeltaCopyTest: green
        DSDeltaTiradeFileOutTest: green
        DSDeltaLoggingTest: green
        DSDeltaRevertTest: 6 failures (related to dangling method categories, these we should fix!)
        DSDeltaTiradeTest: green
        DSDeltaValidationTest: red, red, red!!! (I need to read up on validation, Matthew made it)
        DSDeltaValidationTestSystemEditor: red, red, red!!! (same here)

Step 5

Read some class comments and code. Start with DSDelta and the DSChange hierarchy.

Step 6

Grab a big cup of coffee, save your image from time to time (although the code generally does not make it crash…). Ask us whatever you like.

regards, Goran & Igor

01 Jul 09

"Hacking On DeltaStreams"-Guide For Suicidal Meta Squeakers

These instructions are for getting started HACKING on the DeltaStreams package for Squeak. If you are into meta programming, advanced source code management and Squeak, they might be interesting. They are not meant for users (it is not usable anyway), nor for Squeak beginners.

Step 1

Grab an image. I use 3.10.2-7179. It might be nice if you picked some other important target for DS, like say Croquet or Pharo. :)

Step 2

Install the base image fixes for DeltaStreams (maintained as a changeset for now), picking the one suitable for your image:

        http://map.squeak.org/packagebyname/deltastreamfixes

UPDATE: It seems I actually used the 3.9 version. Hmmm, which is borked on SM…

Step 3

Add repos for DeltaStreams/Tirade/SystemEditor on SqueakSource and open them:

        MCHttpRepository
        location: 'http://www.squeaksource.com/DeltaStreams'
        user: ''
        password: ''

Step 4

Load the latest of ONLY the following packages from those repos:

        SystemEditor-Core
        SystemEditor-Squeak
        SystemEditor-Traits
        SystemEditor-Tests
        SystemEditorBrowser
        Tirade
        InterleavedChangeSet
        DeltaStreams-Model
        DeltaStreams-Logging
        DeltaStreams-Storing
        DeltaStreams-Tirade
        DeltaStreams-Tests
        DeltaStreams-Deprecated
        DeltaStreams-UI

NOTE: The DeltaStreams-Logging package is from Matthew; I am not sure of its status. DeltaStreams-Storing is meant to contain the parts needed for DS to use InterleavedChangeSet, which is the funky changeset-compatible format Matthew invented. I am not pursuing that format, but I see no harm in keeping it. It might be a good idea too, I am not sure. DeltaStreams-Tirade is the reader/writer support for the Tirade format, which is the preferred format for Deltas from now on. All tests are in DeltaStreams-Tests. There is a UI too, but Matthew wrote it and I am unsure of its operational status. DeltaStreams-Deprecated is a big pile of stuff that should in the end just be dumped. Consider it to be "candidates for death". :)

Eventually we would typically only need (no ICS, no tests, no SE browser, no deprecated):

        SystemEditor-Core
        SystemEditor-Squeak
        SystemEditor-Traits
        Tirade
        DeltaStreams-Model
        DeltaStreams-Logging
        DeltaStreams-Tirade
        DeltaStreams-UI

Step 5

Run some tests. Currently there are tons of tests failing. I do not like this "state" so I am in the process of cleaning up, fixing what I can fix and possibly even throwing out red tests I can’t understand. We should always be all green, otherwise we don’t know if we broke something. These are the results in my image:

        SystemChangeNotifications-Tests: 1 error (it is a trivial fix I missed, will go away)
        Tirade-Tests: green
        SystemEditor-Tests: 11 failures of 181  (not sure why...)
        DSDeltaApplyTest: green
        DSDeltaLoggingTest: green
        DSDeltaCreationTest: green
        DSDeltaClassifyTest: green
        DSDeltaAntiTest: green
        DSDeltaCopyTest: green
        DSDeltaFileOutTest: 2 errors (Matthew has commented them, I am not pursuing ICS)
        DSDeltaRevertTest: 19 passes, 6 failures (these we should be able to fix!)
        DSDeltaTiradeTest: green

        DSDeltaChangeSetTest: red, red, red!!! (this is testing the ChangeSet "lookalike" ICS aspect, I am not pursuing ICS so will not fix now)
        DSDeltaTiradeFileOutTest: red, red, red!!! (these I will fix!)

        DSDeltaValidationTest: red, red, red!!! (I need to read up on validation first)
        DSDeltaValidationTestSystemEditor: red, red, red!!! (same here)

NOTE: I am unsure about the Validation tests. I have not looked at their design, and I do not know whether I have broken them or whether they were already broken. We should take an image, install the DS release from SM and see the test status there.

Step 6

Read some class comments and code. Start with DSDelta and the DSChange hierarchy.

Step 7

Pray. Ask me.

regards, Göran

20 Apr 09

Tirade, first trivial use

Last night I started hooking Tirade into Deltas. Quick background: Deltas is "Changesets for the 21st century", or in other words an intelligent patch system under development for Squeak. Tirade is a Smalltalk/Squeak centric "JSON"-kinda-thingy. I made Tirade in order to get a nice file format for Deltas.

Just wanted to share what the first trivial code looks like, and thereby illustrate simple use of Tirade.

I have a DSDelta (a Delta being almost like a ChangeSet). It consists of some metadata (a UUID, a Dictionary of properties and a TimeStamp) and a DSChangeSequence (which holds the actual DSChange instances). As a first shot I only implemented the metadata bit. So step by step:

  1. Write a unit test. First let’s set up our readers and writers on a common stream:
         setUp
             | stream |
             stream := RWBinaryOrTextStream on: String new.
             reader := DSTiradeReader on: stream.
             writer := DSTiradeWriter on: stream
    

…then a trivial write, read and compare test - note that they both look at the same stream:

        testEmptyDelta

            | delta same |
            delta := DSDelta new.
            writer nextPut: delta.
            reader reset.
            same := reader next.
            self assert: same = delta.
            self assert: delta timeStamp = same timeStamp.
            self assert: delta properties = same properties.
            self assert: delta uuid = same uuid
  2. Create DSTiradeWriter. It turns out that at this point DSTiradeWriter is just an empty subclass of TiradeRecorder! Eventually we might need to add behavior, but for now there is no need. TiradeRecorder uses DNU (#doesNotUnderstand:) to intercept messages and encode them as Tirade.
  3. Implement #tiradeOn: in our domain object DSDelta. The writer will use it, and it looks like this:
         tiradeOn: recorder
    
             recorder
                 delta: uuid asString36
                 stamp: timeStamp printString
                 properties: properties
    

…here we convert the UUID to a String (base 36), and the timeStamp too. The properties Dictionary just holds "simple" data that Tirade can represent, so there is no need to convert it. The rule is that we make up a message (in this case #delta:stamp:properties:) that will be used in the Tirade stream, and we make sure our arguments are "Tirade proper", which basically means Booleans, Strings, Symbols, Arrays, Numbers, Associations and Dictionaries thereof. Note that the recorder, being a DSTiradeWriter, inherits the implementation of #doesNotUnderstand: from TiradeRecorder, which writes this Tirade message onto the stream, typically looking like this:

        delta: 'd71oknvt1bwswhno6iwgund07' stamp: '20 April 2009 11:20:50 am' properties: nil.
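Python has no #doesNotUnderstand:, but __getattr__ plays a similar role, so the recorder idea can be sketched there. This is a hypothetical analogue, not Tirade's implementation: it supports only nil, Booleans, numbers and Strings as arguments, maps a Python name like delta_stamp_properties to the keyword selector delta:stamp:properties:, and writes one message per line, producing the line shown above.

```python
import io


def tirade_literal(value):
    """Encode a simple value as a Smalltalk-style literal (a small subset only)."""
    if value is None:
        return "nil"
    if value is True:
        return "true"
    if value is False:
        return "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Smalltalk strings are single-quoted; an embedded quote is doubled.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"not supported in this sketch: {value!r}")


class TiradeRecorderSketch:
    """Records any unknown message as a keyword message on a text stream."""

    def __init__(self, stream):
        self._stream = stream

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, much like #doesNotUnderstand:.
        if name.startswith("_"):
            raise AttributeError(name)
        keywords = name.split("_")

        def record(*args):
            if len(args) != len(keywords):
                raise TypeError(f"{name} takes {len(keywords)} arguments")
            parts = [f"{kw}: {tirade_literal(arg)}" for kw, arg in zip(keywords, args)]
            self._stream.write(" ".join(parts) + ".\n")

        return record


out = io.StringIO()
TiradeRecorderSketch(out).delta_stamp_properties(
    "d71oknvt1bwswhno6iwgund07", "20 April 2009 11:20:50 am", None)
print(out.getvalue(), end="")
# delta: 'd71oknvt1bwswhno6iwgund07' stamp: '20 April 2009 11:20:50 am' properties: nil.
```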

And then the final step, our reader:

  4. Create a DSTiradeReader. We simply implement the above Tirade message #delta:stamp:properties: and put it in the method category "tirade" so that the default security mechanism is happy:
         delta: uuidString36 stamp: timeStampString properties: properties
    
             result := DSDelta new.
             result uuid: (UUID fromString36: uuidString36); properties: properties; timeStamp: (TimeStamp fromString: timeStampString)
    

…this class inherits an instvar called ‘result’, which is fine to reuse. As you see, the properties need no conversion; the others are converted from Strings.

And tada - the unit test is green! So we implemented reading and writing in more or less two lines of code. Kinda neat! :)
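The reader side can be sketched in the same hypothetical Python style: a tiny parser for one keyword message made of simple literals, plus a dispatcher that only invokes selectors on an explicit allow-list, standing in for the "tirade" method category check. The allow-list and all names here are assumptions for illustration, not Tirade's actual reader.

```python
import re

# One "keyword: literal" pair; literals are quoted strings ('' escapes a quote),
# nil, true, false or a plain integer/decimal number.
_PART = re.compile(r"(\w+):\s*('(?:[^']|'')*'|nil|true|false|-?\d+(?:\.\d+)?)\s*")


def parse_literal(token):
    if token == "nil":
        return None
    if token in ("true", "false"):
        return token == "true"
    if token.startswith("'"):
        return token[1:-1].replace("''", "'")
    return float(token) if "." in token else int(token)


def parse_message(line):
    """Parse "kw1: lit kw2: lit." into (selector, [args])."""
    body = line.strip()
    if not body.endswith("."):
        raise ValueError("a message must end with a period")
    body = body[:-1]
    keywords, args, pos = [], [], 0
    while pos < len(body):
        match = _PART.match(body, pos)
        if match is None:
            raise ValueError(f"cannot parse: {body[pos:]!r}")
        keywords.append(match.group(1))
        args.append(parse_literal(match.group(2)))
        pos = match.end()
    return "".join(kw + ":" for kw in keywords), args


class TiradeReaderSketch:
    # Only these selectors may be invoked (stand-in for the "tirade" category).
    ALLOWED = {"delta:stamp:properties:": "delta_stamp_properties"}

    def __init__(self):
        self.result = None

    def dispatch(self, line):
        selector, args = parse_message(line)
        method_name = self.ALLOWED.get(selector)
        if method_name is None:
            raise PermissionError(f"selector not allowed: {selector}")
        getattr(self, method_name)(*args)
        return self.result

    def delta_stamp_properties(self, uuid36, stamp, properties):
        # DSTiradeReader converts these Strings back into a UUID and a TimeStamp;
        # this sketch just keeps them as plain values in a dict.
        self.result = {"uuid": uuid36, "stamp": stamp, "properties": properties}


reader = TiradeReaderSketch()
print(reader.dispatch(
    "delta: 'd71oknvt1bwswhno6iwgund07' stamp: '20 April 2009 11:20:50 am' properties: nil."))
```

Refusing anything outside the allow-list is what keeps a hostile Tirade file from calling arbitrary methods on the reader.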

regards, Goran

28 Mar 09

Tokyo Tyrant meets the Mighty Mouse!

I have gotten into researching the new breeds of databases popping up all over the place. After skimming a whole bunch, I started looking closer at CouchDB, and I even implemented a "view server" for it (so that you can write map/reduce functions in Squeak instead of JavaScript - why? Because "you can" :)). And yeah, CouchDB is really cool, no doubt about that. They are just about to get 0.9 out the door, btw.

But I started looking around again, and this time I looked longer at the SourceForge homepage of Tokyo Cabinet than the 3 seconds I spent staring at Japanese symbols the first time. Mikio Hirabayashi, the author of the Tokyo "products", seems to be an extremely talented and productive developer. So what is all this stuff?

  • Tokyo Cabinet. A new db library that seemingly beats the crap out of all the others in most respects. You know, Berkeley DB and all those. Floored!
  • Tokyo Tyrant. The remote server part on top, a thread-pool modeled implementation using epoll/kqueue mechanisms and offering memcache/HTTP/socket protocols and Lua scripting inside. It seems to scale. A lot.
  • Tokyo Dystopia. An advanced full-text search engine, a so-called "inverted index". It has not yet reached 1.0, but since Mikio wrote Hyper Estraier I expect a solid offering there too.

And it is all under the LGPL, which is fine with me (being primarily an MIT/BSD guy).

I got dazzled by all the buzzwords and functionality in Tokyo Cabinet and when I realized that the binary Socket protocol for Tokyo Tyrant actually is quite small and well documented - what the hell! Let’s hack up a Squeak port of this, and so I did. :)

TokyoTyrantProtocol

The binary Socket protocol actually supports all the different database types that Cabinet offers:

  • On memory hash and tree database
  • On disk hash, B+ tree, fixed-length and table database

…and it also offers a way to call Lua extension code inside the Tyrant server. Yum, yum!
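To give a feeling for how small the binary protocol is, here is a Python sketch that frames a "put" and a "get" request and decodes a "get" reply. The constants (magic byte 0xC8, command bytes 0x10 for put and 0x30 for get, big-endian 32-bit lengths, a one-byte status at the start of replies) are assumptions based on the Tokyo Tyrant protocol documentation and should be checked against the spec; this is not the Squeak TokyoTyrantProtocol code.

```python
import struct

MAGIC = 0xC8    # assumed first byte of every request
CMD_PUT = 0x10  # assumed command byte for "put"
CMD_GET = 0x30  # assumed command byte for "get"


def encode_put(key: bytes, value: bytes) -> bytes:
    # magic, command, key size, value size (big-endian int32), then key and value
    return struct.pack(">BBii", MAGIC, CMD_PUT, len(key), len(value)) + key + value


def encode_get(key: bytes) -> bytes:
    # magic, command, key size (big-endian int32), then the key
    return struct.pack(">BBi", MAGIC, CMD_GET, len(key)) + key


def decode_get_response(data: bytes):
    """Reply layout: status byte (0 = success), value size (int32), value."""
    if data[0] != 0:
        return None
    (size,) = struct.unpack(">i", data[1:5])
    return data[5:5 + size]


print(encode_put(b"hello", b"world").hex())  # c810000000050000000568656c6c6f776f726c64
```

Sending these bytes over a plain TCP socket to port 1978 and reading the reply is essentially all a client has to do for the basic commands.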

So I wrote an API for Squeak. How to use it? Well, first you need Tokyo Cabinet and Tokyo Tyrant. Just suck down the source and do the dance (presuming you are on a real OS of course):

        wget http://switch.dl.sourceforge.net/sourceforge/tokyocabinet/tokyocabinet-1.4.11.tar.gz
        tar xzf tokyocabinet-1.4.11.tar.gz
        cd tokyocabinet-1.4.11
        ./configure; make; make install

        wget http://garr.dl.sourceforge.net/sourceforge/tokyocabinet/tokyotyrant-1.1.18.tar.gz
        tar xzf tokyotyrant-1.1.18.tar.gz
        cd tokyotyrant-1.1.18
        ./configure --enable-lua; make; make install

Well, ok, you might stumble on some dependencies of course. On Ubuntu (where I already have lots of dev libraries installed) I only needed to add these, IIRC; your mileage may vary:

        apt-get install libbz2-dev liblua5.1-0-dev zlib1g-dev

Having installed TC and TT, let’s do the trivial thing and just fire up a server on a disk based hash database (tch = Tokyo Cabinet Hash):

        ttserver mydb.tch

If you run "ttserver" without a filename it will create an on-memory hash database, and then some of my unit tests fail. :) The server is now up and listening on the default port 1978.

Let’s fire up Squeak. I use Squeak 3.10.2, so no promises for anything else. Open the SqueakMap Package Loader, find "TokyoTyrant" and install it. Then open the Test Runner, select the "TokyoTyrant" category and select all test classes except TokyoTyrantTableTest; you should get all green.

Okidoki let’s do "hello world":

        | db |
        db := TokyoTyrantDB new.
        db at: 'hello' put: 'world'.
        Transcript show: 'Hello is: ', (db stringAt: 'hello'); cr.
        db close

…and you should get some output in the Transcript. Using #new we get a TokyoTyrantDB instance set up for "localhost:1978"; otherwise use #host:port: to instantiate it. Then we send #at:put: with a String key and a String value. This triggers the db to lazily open a SocketStream to ttserver and then send the key and value to be stored on disk. Finally we read the value back; note that we use #stringAt: instead of just #at:. If we used #at:, TokyoTyrantDB would not know the type of the data being returned and we would get an instance of ByteArray.

Ok, so a TokyoTyrantDB instance behaves more or less Dictionary-like. Let’s toy around:

        | db |
        db := TokyoTyrantDB new.
        db removeAll.
        db
                at: 'a string' put: 'abc';
                at: 'an integer' put: 1;
                at: 'a float' put: 12.3;
                at: 'a large integer' put: 100000000000000;
                at: #(255 0 0 255) put: #(1 2 3 4).
        Transcript show: 'Number of records: ', db size asString; cr.
        Transcript show: 'Size in bytes: ', db byteSize asString; cr.
        Transcript show: 'Status: ', db status; cr.

        db at: 'a string' putCat: 'def'.
        db at: 'an integer' add: 2.
        db at: 'a float' add: 0.3.

        Transcript show: 'String: ', (db stringAt: 'a string') ;cr.
        Transcript show: 'Integer: ', (db integerAt: 'an integer') asString;cr.
        Transcript show: 'Float: ', (db floatAt: 'a float') asString;cr.
        Transcript show: 'Large integer: ', (db integerAt: 'a large integer') asString;cr.
        Transcript show: 'Bytearray: ', (db at: #(255 0 0 255)) printString;cr.

We open it, send #removeAll (the "vanish" command), then put in some key/value pairs. At this level of the API there are some simple conventions for type conversion:

  • Integers are stored as 32 bit signed integers if they fit, otherwise as a ByteArray (byte contents of LargeNegative(Positive)Integer) with first byte signifying sign.
  • Floats are stored as 64 bit doubles. Sometimes they are sent as fractions, 64 + 64 bit style.
  • All other objects are sent #asByteArray and we use the result of that. This means that an Array of 0-255 integers will be turned into a ByteArray.
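The integer convention can be sketched in Python. The 32-bit branch is straightforward; for the large-integer branch, the sign byte values (0 for positive, 1 for negative) are a hypothetical choice, the magnitude is written least significant byte first (the order in which Squeak's large integers store their bytes), and little-endian for the 32-bit case is also an assumption. The function names are made up for this example.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def encode_integer(n, byteorder="little"):
    """4-byte signed int if it fits, else a sign byte followed by the magnitude."""
    if INT32_MIN <= n <= INT32_MAX:
        return n.to_bytes(4, byteorder, signed=True)
    sign = b"\x01" if n < 0 else b"\x00"  # hypothetical sign marker
    magnitude = abs(n)
    return sign + magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "little")


def decode_integer(data, byteorder="little"):
    if len(data) == 4:
        return int.from_bytes(data, byteorder, signed=True)
    magnitude = int.from_bytes(data[1:], "little")
    return -magnitude if data[0] == 1 else magnitude


print(encode_integer(1))                  # b'\x01\x00\x00\x00'
print(len(encode_integer(100000000000000)))  # 7: sign byte plus 6 magnitude bytes
```

Anything outside the 32-bit range needs at least 4 magnitude bytes, so the two cases never collide on length; a 4-byte value is still indistinguishable from a 4-character String, though.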

If we send a Float into #at:add:, we send it according to the protocol, which specifies that it should be sent as a fraction of two 64-bit signed integers. This format was probably selected for its simplicity. The result is still stored by TT as a double (8 bytes) in native endianness, so in order to do the Right Thing we need to keep track of native endianness. Anyway, those are details… :)
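One plausible reading of the "two 64-bit integers" encoding, sketched in Python: the integral part and the fractional part scaled by 10^12, each as a big-endian signed 64-bit integer, with the fractional part truncated. The scaling factor and byte order are assumptions modelled on the Tokyo Tyrant C client and should be verified against the protocol spec; the helper names are hypothetical.

```python
import math
import struct

SCALE = 10**12  # assumed scaling factor for the fractional part


def pack_double_pair(x: float) -> bytes:
    """Split x into integral and scaled fractional parts, as two big-endian int64."""
    fract, integ = math.modf(x)  # both parts keep the sign of x
    return struct.pack(">qq", int(integ), int(fract * SCALE))  # int() truncates


def unpack_double_pair(data: bytes) -> float:
    integ, fract = struct.unpack(">qq", data)
    return integ + fract / SCALE


encoded = pack_double_pair(12.3)
print(struct.unpack(">qq", encoded))  # (12, 300000000000)
print(unpack_double_pair(encoded))
```

Sending integers sidesteps any disagreement about floating-point byte order on the wire; only the stored result is a native double.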

Classes

I started out with a single plain class resembling the current TokyoTyrantDB. Recently I refactored it into three classes. TokyoTyrantProtocol contains only enough to "talk" to TT, using ByteArrays, and is not very comfortable to use. Its subclass TokyoTyrantBinaryDB adds a Dictionary-like protocol: nicer, but still no conversions.

Then there is TokyoTyrantDB, which has at least some smarts for converting Smalltalk objects into ByteArrays, plus some convenience methods for converting TT ByteArray keys and values back into Smalltalk objects.

Finally there is also TokyoTyrantTableDB which is a special subclass of TokyoTyrantDB that has extra methods to work with the "table" database type. This database type is not fully compatible with the rest of the protocol, so I might need to "plug" some methods or change the inheritance, not sure yet.

Following the design Mikio used in his Ruby package, there is also a TokyoTyrantTableQuery class representing a query. These bits are brand new and have just cleared the smoke test.
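The layering can be sketched in Python with an in-memory dict standing in for the network connection. The class names are hypothetical stand-ins for TokyoTyrantProtocol, TokyoTyrantBinaryDB and TokyoTyrantDB, and the conversions are deliberately minimal (UTF-8 strings, 32-bit little-endian integers, anything else through bytes()).

```python
class FakeProtocol:
    """Bottom layer (like TokyoTyrantProtocol): bytes in, bytes out.
    A dict replaces the socket so the sketch runs without a server."""

    def __init__(self):
        self._store = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._store[key] = value

    def get(self, key: bytes):
        return self._store.get(key)


class BinaryDB(FakeProtocol):
    """Middle layer (like TokyoTyrantBinaryDB): dict-like, still only bytes."""

    def __setitem__(self, key: bytes, value: bytes) -> None:
        self.put(key, value)

    def __getitem__(self, key: bytes) -> bytes:
        value = self.get(key)
        if value is None:
            raise KeyError(key)
        return value


class ConvertingDB(BinaryDB):
    """Top layer (like TokyoTyrantDB): converts objects to bytes on the way in,
    and offers typed readers on the way out."""

    @staticmethod
    def to_bytes(obj) -> bytes:
        if isinstance(obj, str):
            return obj.encode("utf-8")
        if isinstance(obj, int):
            return obj.to_bytes(4, "little", signed=True)  # OverflowError if it doesn't fit
        return bytes(obj)  # e.g. a list of 0-255 ints, somewhat like #asByteArray

    def __setitem__(self, key, value) -> None:
        super().__setitem__(self.to_bytes(key), self.to_bytes(value))

    def string_at(self, key) -> str:
        return self[self.to_bytes(key)].decode("utf-8")

    def integer_at(self, key) -> int:
        return int.from_bytes(self[self.to_bytes(key)], "little", signed=True)


db = ConvertingDB()
db["hello"] = "world"
db["an integer"] = 1
print(db.string_at("hello"), db.integer_at("an integer"))  # world 1
```

Each layer only adds convenience on top of the one below, so the raw byte-level API stays available when the conversions get in the way.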

The Typing Problem

Note however that since TT does not use "types", we simply get a ByteArray back from TT. And since TT has special functions for 32-bit signed ints and 64-bit doubles, we do want to use those "native formats". This leaves us in a slight predicament: if we get 4 bytes back, is it a String of 4 chars or a 32-bit signed int?

So far I have "punted" on this problem, but the next step is probably to have some kind of configuration of the db instance so that it knows what to expect back, perhaps given certain kinds of keys. Open for all suggestions. :)
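One possible shape for such a configuration, sketched in Python: an ordered list of key patterns, each mapped to the type its values should be decoded as, with raw bytes as the fallback. Everything here (the glob patterns, the type names, little-endian as the native byte order) is a hypothetical illustration of the idea, not an existing API.

```python
import fnmatch
import struct


class TypedDecoder:
    """Decide how to decode a raw value from the key it was stored under."""

    def __init__(self, rules, default="bytes"):
        self.rules = list(rules)  # (glob pattern, type name) pairs, first match wins
        self.default = default

    def type_for(self, key: str) -> str:
        for pattern, type_name in self.rules:
            if fnmatch.fnmatchcase(key, pattern):
                return type_name
        return self.default

    def decode(self, key: str, raw: bytes):
        kind = self.type_for(key)
        if kind == "int32":
            return int.from_bytes(raw, "little", signed=True)
        if kind == "double":
            return struct.unpack("<d", raw)[0]
        if kind == "string":
            return raw.decode("utf-8")
        return raw  # unknown: hand back the bytes untouched


decoder = TypedDecoder([("count:*", "int32"), ("price:*", "double"), ("name:*", "string")])
raw = b"abcd"  # the same 4 bytes, read three different ways
print(decoder.decode("name:1", raw))   # abcd
print(decoder.decode("count:1", raw))  # 1684234849
print(decoder.decode("blob:1", raw))   # b'abcd'
```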

Can you say FAST?

To wrap up, let’s take this Ferrari for a short spin. The code:

        | payload num size db time keys |
        "The runs below used num = 100000 with size = 1024, and num = 1000000 with size = 100"
        size := 1024.
        num := 100000.
        payload := ByteArray new: size.
        db := TokyoTyrantDB new.
        time := [1 to: num do: [:i | db at: i putNoResponse: payload]] timeToRun.
        db size = num ifFalse: [self error: 'Bulk failed!'].
        Transcript show: 'Type: ', db type, ', ', num asString, ' * ',size asString,' bulk: ', time asString, ' ms, (', (num*1000/time) asInteger asString, '/sec)';cr.
        keys := (1 to: num) collect: [:i | num atRandom].
        time := [keys do: [:i | db at: i]] timeToRun.
        Transcript show: 'Type: ', db type, ', ',num asString,' random lookups: ', time asString, ' ms, (', (num*1000/time) asInteger asString, '/sec)';cr.
        time := [keys do: [:i | db includesKey: i]] timeToRun.
        Transcript show: 'Type: ', db type, ', ',num asString,' random includes: ', time asString, ' ms, (', (num*1000/time) asInteger asString, '/sec)';cr.
        time := [db do: [:each | ]] timeToRun.
        Transcript show: 'Type: ', db type, ', full iteration: ', time asString, ' ms, (', (num*1000/time) asInteger asString, ' values/sec)';cr.
        db close

…and some results on different database types:

        Type: hash, 100000 * 1024 bulk: 23555 ms, (4245/sec)
        Type: hash, 100000 random lookups: 30358 ms, (3294/sec)
        Type: hash, 100000 random includes: 32547 ms, (3072/sec)
        Type: hash, full iteration: 55899 ms, (1789 values/sec)

        Type: on-memory hash, 100000 * 1024 bulk: 5669 ms, (17639/sec)
        Type: on-memory hash, 100000 random lookups: 41812 ms, (2391/sec)
        Type: on-memory hash, 100000 random includes: 54319 ms, (1841/sec)
        Type: on-memory hash, full iteration: 74022 ms, (1351 values/sec)

        Type: B+ tree, 100000 * 1024 bulk: 5931 ms, (16860/sec)
        Type: B+ tree, 100000 random lookups: 38659 ms, (2586/sec)
        Type: B+ tree, 100000 random includes: 37315 ms, (2679/sec)
        Type: B+ tree, full iteration: 76465 ms, (1307 values/sec)

        Type: on-memory tree, 100000 * 1024 bulk: 5066 ms, (19739/sec)
        Type: on-memory tree, 100000 random lookups: 31364 ms, (3188/sec)
        Type: on-memory tree, 100000 random includes: 29093 ms, (3437/sec)
        Type: on-memory tree, full iteration: 44394 ms, (2252 values/sec)

        Type: B+ tree, 1000000 * 100 bulk: 48651 ms, (20554/sec)
        Type: B+ tree, 1000000 random lookups: 378111 ms, (2644/sec)
        Type: B+ tree, 1000000 random includes: 377373 ms, (2649/sec)
        Type: B+ tree, full iteration: 633555 ms, (1578 values/sec)

        Type: hash, 100000 * 1024 bulk: 17857 ms, (5600/sec)
        Type: hash, 100000 random lookups: 32060 ms, (3119/sec)
        Type: hash, 100000 random includes: 33954 ms, (2945/sec)
        Type: hash, full iteration: 47621 ms, (2099 values/sec)

Note that I am using ByteArray values here, in order to focus entirely on the TokyoTyrant protocol code and TT itself, and not measure serialization or conversion speed. A few interesting things to note:

  • The "includes" test is as fast as the "lookup" test for all these payloads. Since includes does not transmit the value back to Squeak, transmitting the value obviously does not cost much!
  • Bulk load on the on-disk hash is relatively slow, ONLY about 4 MB (4000 * 1 KB) per second. :)
  • On-memory hash is slower than on-disk hash for lookups, includes and iteration (though not for bulk load)?! Funky.
  • Hash on disk is generally slightly faster than B+ tree, but not much.
  • On-memory trees beat on-memory hash.
  • The last on-disk hash run used TCP_NODELAY, which might explain its faster bulk load and iteration; I am not sure.
  • All dbs seem to occupy more or less the calculated theoretical space, clearly very little overhead.
  • Being able to store 16860 1 KB values per second means (if sustained) a bulk load capacity of about 17 MB/sec, or 62 GB in 1 hour. That is on my mini laptop, running both the Squeak client and ttserver.
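The throughput figures follow directly from the result lines: count times value size divided by elapsed time. A small Python helper (hypothetical, parsing the bulk lines in the format printed above) makes the arithmetic explicit, using decimal MB and GB:

```python
import re

BULK_LINE = re.compile(
    r"Type: (?P<type>.+?), (?P<count>\d+) \* (?P<size>\d+) bulk: (?P<ms>\d+) ms")


def bulk_throughput(line):
    """Return (db type, MB per second, GB per hour) for one bulk result line."""
    match = BULK_LINE.search(line)
    if match is None:
        raise ValueError(f"not a bulk result line: {line!r}")
    count, size, ms = int(match["count"]), int(match["size"]), int(match["ms"])
    bytes_per_sec = count * size * 1000 / ms
    return match["type"], bytes_per_sec / 1e6, bytes_per_sec * 3600 / 1e9


db_type, mb_per_sec, gb_per_hour = bulk_throughput(
    "Type: B+ tree, 100000 * 1024 bulk: 5931 ms, (16860/sec)")
print(f"{db_type}: {mb_per_sec:.1f} MB/sec, {gb_per_hour:.0f} GB/hour")  # B+ tree: 17.3 MB/sec, 62 GB/hour
```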

One issue though: when trying a 5 GB db test, something broke when I hit a file size of 2 GB. Ehum??? Time for an email to Mikio. …Thinking a little bit more, perhaps I need a 64-bit OS to use larger db sizes. Mmm.

Conclusion

I really, really like TC/TT. Simple, fast, robust. The only slightly odd thing I have seen was when I ran out of disk space: "db size" returned 1000, the expected count, although some of my bulk inserts had obviously not gone through. Doing "db keys size" revealed the missing keys though, so clearly there is some inconsistency there. Will report that to Mikio.

I can see tons of ways to build on top of this layer, especially using the table database type, which I didn’t have time to cover in this first article; it seems extremely interesting.

Over and out, Goran

Powered by RubLog