Tokyo Tyrant meets the Mighty Mouse!
But I started looking around again and this time I looked longer at the Sourceforge homepage of Tokyo Cabinet than the 3 seconds staring at japanese symbols I did the first time. Mikio Hirabayashi, the author of the Tokyo "products" seems to be an extremely talented and productive developer. So what is all this stuff?
- Tokyo Cabinet. A new db library seemingly beating the crap out of all others on most aspects. You know, Berkeley DB and all those. Floored!
- Tokyo Tyrant. The remote server part on top, a thread-pool modeled implementation using epoll/kqueue mechanisms and offering memcache/HTTP/socket protocols and Lua scripting inside. It seems to scale. A lot.
- Tokyo Dystopia. An advanced text search engine or a so called "inverted index". Not yet reached 1.0 but since Mikio wrote Hyper Estraier I expect a solid offering there too.
And it is all under the LGPL, which is fine with me (being primarily an MIT/BSD guy).
I got dazzled by all the buzzwords and functionality in Tokyo Cabinet and when I realized that the binary Socket protocol for Tokyo Tyrant actually is quite small and well documented - what the hell! Let’s hack up a Squeak port of this, and so I did. :)
TokyoTyrantProtocol
The binary Socket protocol actually supports all the different database types that Cabinet offers:
- On memory hash and tree database
- On disk hash, B+ tree, fixed-length and table database
…and it also offers a way to call Lua extension code inside the Tyrant server. Yum, yum!
So I wrote an API for Squeak. How to use it? Well, first you need Tokyo Cabinet and Tokyo Tyrant. Just suck down the source and do the dance (presuming you are on a real OS of course):
wget http://switch.dl.sourceforge.net/sourceforge/tokyocabinet/tokyocabinet-1.4.11.tar.gz
tar xzf tokyocabinet-1.4.11.tar.gz
cd tokyocabinet-1.4.11
./configure; make; make install
wget http://garr.dl.sourceforge.net/sourceforge/tokyocabinet/tokyotyrant-1.1.18.tar.gz
tar xzf tokyotyrant-1.1.18.tar.gz
cd tokyotyrant-1.1.18
./configure --enable-lua; make; make install
Well, ok, you might stumble on some dependencies of course, on Ubuntu for me (but I have lots of dev libraries already installed) I only needed to add these IIRC, your miles may wary:
apt-get install libbz2-dev liblua5.1-0-dev zlib1g-dev
Having installed TC and TT, let’s do the trivial thing and just fire up a server on a disk based hash database (tch = Tokyo Cabinet Hash):
ttserver mydb.tch
If you run "ttserver" without a filename it will create an on-memory hash database and then some of my unit tests fail. :) This server is now up and listening on default port 1978.
Let’s fire up Squeak, I use Squeak 3.10.2 so no promises for anything else. Open SqueakMap Package Loader, find "TokyoTyrant" and install. Then open up Test Runner, select "TokyoTyrant" category and select all test classes except the TokyoTyrantTableTest - you should get all green.
Okidoki let’s do "hello world":
| db |
db := TokyoTyrantDB new.
db at: 'hello' put: 'world'.
Transcript show: 'Hello is: ', db stringAt: 'world';cr.
db close
…and you should get some output in Transcript. Using #new we get a TokyoTyrantDB instance set for "localhost:1978", otherwise use #host:port: to instantiate it. Then we send #at:put:, using a String key and a String value. This will trigger the db to lazily open a SocketStream to ttserver and then send key and value to be stored onto disk. Finally we do a read by asking for it again, note that we use #stringAt: instead of just #at:. If we use #at: then TokyoTyrantDB does not know the type of the data being returned and we would get an instance of ByteArray.
Ok, so a TokyoTyrantDB instance behaves more or less Dictionary-like. Let’s toy around:
| db |
db := TokyoTyrantDB new.
db removeAll.
db
at: 'a string' put: 'abc';
at: 'an integer' put: 1;
at: 'a float' put: 12.3;
at: 'a large integer' put: 100000000000000;
at: #(255 0 0 255) put: #(1 2 3 4).
Transcript show: 'Number of records: ', db size asString; cr.
Transcript show: 'Size in bytes: ', db byteSize asString; cr.
Transcript show: 'Status: ', db status; cr.
db at: 'a string' putCat: 'def'.
db at: 'an integer' add: 2.
db at: 'a float' add: 0.3.
Transcript show: 'String: ', (db stringAt: 'a string') ;cr.
Transcript show: 'Integer: ', (db integerAt: 'an integer') asString;cr.
Transcript show: 'Float: ', (db floatAt: 'a float') asString;cr.
Transcript show: 'Large integer: ', (db integerAt: 'a large integer') asString;cr.
Transcript show: 'Bytearray: ', (db at: #(255 0 0 255)) printString;cr.
We open it up, removeAll (="vanish" command), then put in some key/value pairs. At this level of the API there are some simple conventions for type conversion:
- Integers are stored as 32 bit signed integers if they fit, otherwise as a ByteArray (byte contents of LargeNegative(Positive)Integer) with first byte signifying sign.
- Floats are stored as 64 bit doubles. Sometimes they are sent as fractions, 64 + 64 bit style.
- All other objects are sent #asByteArray and we use the result of that. This means that an Array of 0-255 integers will be turned into a ByteArray.
If we send a Float into #at:add: we send it according to the protocol which specifies that we should send it as a fraction of two 64 bit signed integers. This format was probably selected for its simplicity. Result is still stored by TT as a double (8 bytes) in native endianness, so in order to do the Right Thing we need to keep track of native endianness. Anyway, those are details… :)
Classes
I started out with a plain single class resembling current TokyoTyrantDB. Then I recently refactored it into three classes, TokyoTyrantProtocol which only contains enough to "talk" to TT, and using ByteArrays. Not very comfortable to use. Then a subclass called TokyoTyrantBinaryDB which adds a Dictionary-like protocol, nicer but still no conversions.
Then TokyoTyrantDB which has at least some smarts on converting Smalltalk objects into ByteArrays, and some convenience methods for doing conversions from TT ByteArray keys and values to Smalltalk objects.
Finally there is also TokyoTyrantTableDB which is a special subclass of TokyoTyrantDB that has extra methods to work with the "table" database type. This database type is not fully compatible with the rest of the protocol, so I might need to "plug" some methods or change the inheritance, not sure yet.
Following the design that Mikio used in his Ruby package there is also a TokyoTyrantTableQuery class representing a query. These bits are brand new and just cleared the smoke test.
The Typing Problem
Note however that since TT does not use "types" we simply get a ByteArray back from TT. And since TT has special functions for 32 signed ints and 64 bit doubles, we do want to use those "native formats". This leaves us in a slight predicament, if we get 4 bytes back - is it a String of 4 chars or a 32 bit signed int?
So far I have "punted" on this problem, but the next step is probably to have some kind of configuration of the db instance so that it knows what to expect back, perhaps given certain kinds of keys. Open for all suggestions. :)
Can you say FAST?
To wrap up, let’s take this Ferrari for a short spin, code:
| payload num size db time keys |
size := 512.
num := 10000000.
payload := ByteArray new: size.
db := TokyoTyrantDB new.
time := [1 to: num do: [:i | db at: i putNoResponse: payload]] timeToRun.
db size = num ifFalse: [self error: 'Bulk failed!'].
Transcript show: 'Type: ', db type, ', ', num asString, ' * ',size asString,' bulk: ', time asString, ' ms, (', (num*1000/time) asInteger asString, '/sec)';cr.
keys := (1 to: num) collect: [:i | num atRandom].
time := [keys do: [:i | db at: i]] timeToRun.
Transcript show: 'Type: ', db type, ', ',num asString,' random lookups: ', time asString, ' ms, (', (num*1000/time) asInteger asString, '/sec)';cr.
time := [keys do: [:i | db includesKey: i]] timeToRun.
Transcript show: 'Type: ', db type, ', ',num asString,' random includes: ', time asString, ' ms, (', (num*1000/time) asInteger asString, '/sec)';cr.
time := [db do: [:each | ]] timeToRun.
Transcript show: 'Type: ', db type, ', full iteration: ', time asString, ' ms, (', (num*1000/time) asInteger asString, ' values/sec)';cr.
db close
…and some results on different database types:
Type: hash, 100000 * 1024 bulk: 23555 ms, (4245/sec)
Type: hash, 100000 random lookups: 30358 ms, (3294/sec)
Type: hash, 100000 random includes: 32547 ms, (3072/sec)
Type: hash, full iteration: 55899 ms, (1789 values/sec)
Type: on-memory hash, 100000 * 1024 bulk: 5669 ms, (17639/sec)
Type: on-memory hash, 100000 random lookups: 41812 ms, (2391/sec)
Type: on-memory hash, 100000 random includes: 54319 ms, (1841/sec)
Type: on-memory hash, full iteration: 74022 ms, (1351 values/sec)
Type: B+ tree, 100000 * 1024 bulk: 5931 ms, (16860/sec)
Type: B+ tree, 100000 random lookups: 38659 ms, (2586/sec)
Type: B+ tree, 100000 random includes: 37315 ms, (2679/sec)
Type: B+ tree, full iteration: 76465 ms, (1307 values/sec)
Type: on-memory tree, 100000 * 1024 bulk: 5066 ms, (19739/sec)
Type: on-memory tree, 100000 random lookups: 31364 ms, (3188/sec)
Type: on-memory tree, 100000 random includes: 29093 ms, (3437/sec)
Type: on-memory tree, full iteration: 44394 ms, (2252 values/sec)
Type: B+ tree, 1000000 * 100 bulk: 48651 ms, (20554/sec)
Type: B+ tree, 1000000 random lookups: 378111 ms, (2644/sec)
Type: B+ tree, 1000000 random includes: 377373 ms, (2649/sec)
Type: B+ tree, full iteration: 633555 ms, (1578 values/sec)
Type: hash, 100000 * 1024 bulk: 17857 ms, (5600/sec)
Type: hash, 100000 random lookups: 32060 ms, (3119/sec)
Type: hash, 100000 random includes: 33954 ms, (2945/sec)
Type: hash, full iteration: 47621 ms, (2099 values/sec)
Note that I am using ByteArray values here, in order to focus entirely on TokyoTyrant protocol code and TT itsel, and not measure serialization speed or conversion speeds. A few interesting things to note:
- The "includes" test is as fast as the "lookup" test for all these payloads. Includes does not transmit the value back to Squeak, so obviously that does not cost much!
- Bulk load on hash disk is relatively slow, ONLY 4Mb (4000 * 1Kb) per second. :)
- On memory hash is slower than on disk hash?! Funky.
- Hash on disk is generally slightly faster than B+ tree, but not much.
- On memory trees beat on memory hash.
- The last hash on disk used TCP_NODELAY, might have caused iteration and bulk speedup, not sure.
- All dbs seem to occupy more or less the calculated theoretical space, clearly very little overhead.
- Being able to store 16860 1kB values per second means (if sustained) a bulk load capacity of about 17Mb/sec = 62Gb in 1 hour. On my mini laptop running both client Squeak and ttserver.
One issue though: When trying to do a 5Gb db test something broke when I hit the 2Gb filesize. Ehum??? Time for an email to Mikio. …thinking a little bit more - perhaps I need a 64-bit OS to use larger db sizes. Mmm.
Conclusion
I really, really like TC/TT. Simple, fast, robust. The only thing I have seen that was slightly odd was when I ran out of disk space and the "db size" returned 1000 but obviously some of my bulk inserts had not gone through - but the count in the db was ok. Doing "db keys size" noted missing keys though, so clearly some inconsistency there. Will report that to Mikio.
I can see tons of ways to build on top of this layer - especially using the table database type which I didn’t have time to cover in this first article - but it seems extremely interesting.
Over and out, Goran