Roads Less Taken

06 Sep 09

DeltaStreams boost in Brest

Here at ESUG in Brest DeltaStreams has gotten a real "boost". Igor Stasenko has joined the effort and is busy whipping up a user interface using the Toolbuilder API so that it will work in most Squeak flavours (and perhaps other Smalltalks too), and I have been busy getting the rest of the code into better shape.

The presentation I gave was very well received (I think), although it collided with the Seaside tutorial, which meant that a lot of the people I would have liked to see it were busy on the other track. But interest is high, and not only from Squeakers but also from developers using Smalltalk/X and VisualWorks!

The immediate results from Brest are:

  • Tirade has been fully hooked into Deltas which means that Deltas now have a file format and can be serialized/deserialized using that.
  • DeltaStreams now load very easily into latest Squeak "trunk" using an Installer based script, instructions below.
  • Igor is building a Toolbuilder based user interface similar to the changesorter tools. After just a few hours he has something "up and running" and I suspect it will gain features over the next weeks rather quickly. NOTE: Matthew Fulmer made the first UI for Deltas but Igor and I wanted to build something different.
  • We have been talking to Pharo people about what kind of APIs will be available in Pharo that we can depend on for DeltaStreams, and AFAICT Toolbuilder and SystemEditor are meant to be available in Pharo.
  • Lots of tests cleaned up, we are almost in the full green with over 300 tests.
  • Handling of DoIts has been added. A tricky problem.

Below are instructions for getting started HACKING on the DeltaStreams package for Squeak. If you are into meta programming, advanced source code management and Squeak, it might be interesting. These are not instructions for users (it is not really usable yet anyway) nor for Squeak beginners.

Step 1

Grab an image, 3.10.2-7179 should be fine but we are now working in trunk. It might be nice if you picked some other important target for DeltaStreams, like say Croquet, Cuis or Pharo. :)

Step 2

Add the repo for DeltaStreams on SqueakSource:

        MCHttpRepository
        location: 'http://www.squeaksource.com/DeltaStreams'
        user: ''
        password: ''

Step 3

Open the repository and load the latest version of the package called DeltaStreams-Installer. Then execute the class side script by executing DSInstaller install. If all was loaded you will get a nice greeting!

NOTE: Several packages in the repository are not loaded by this installer script. Some lines of development have not been pursued further by me (primarily ICS, the first browser UI and some more things) but that does not mean that Matthew or someone else cannot port that code forward and make it work again :)

Step 4

Run some tests. Currently there are some tests failing. We do not like this "state" so I am in the process of cleaning up, fixing what I can fix and throwing out red tests I can’t understand. We should always be green, otherwise we don’t know if we broke something. These are the results in my image today:

Prerequisites:

        SystemEditor-Tests: 11 failures (have not looked at it, was the same in 3.10.2-7179)
        Tirade-Tests: green
        SystemChangeNotifications-Tests: green

DeltaStreams:

        DSDeltaApplyTest: green
        DSDeltaCreationTest: green
        DSDeltaAntiTest: green
        DSDeltaCopyTest: green
        DSDeltaTiradeFileOutTest: green
        DSDeltaLoggingTest: green
        DSDeltaRevertTest: 6 failures (related to dangling method categories, these we should fix!)
        DSDeltaTiradeTest: green
        DSDeltaValidationTest: red, red, red!!! (I need to read up on validation, Matthew made it)
        DSDeltaValidationTestSystemEditor: red, red, red!!! (same here)

Step 5

Read some class comments and code. Start with DSDelta and the DSChange hierarchy.

Step 6

Grab a big cup of coffee, save your image from time to time (although the code generally does not make it crash…). Ask us whatever you like.

regards, Goran & Igor

01 Jul 09

"Hacking On DeltaStreams"-Guide For Suicidal Meta Squeakers

This instruction is for getting started HACKING on the DeltaStreams package for Squeak. If you are into meta programming, advanced source code management and Squeak, it might be interesting. Thus it is not an instruction for users (it is not usable anyway) nor for Squeak beginners.

Step 1

Grab an image, I use 3.10.2-7179. It might be nice if you picked some other important target for DS, like say Croquet or Pharo. :)

Step 2

Install base image fixes for DeltaStreams (maintained as a CS for now), pick the one suitable:

        http://map.squeak.org/packagebyname/deltastreamfixes

UPDATE: It seems I actually used the 3.9 version. Hmmm, which is borked on SM…

Step 3

Add repos for DeltaStreams/Tirade/SystemEditor on SqueakSource and open them:

        MCHttpRepository
        location: 'http://www.squeaksource.com/DeltaStreams'
        user: ''
        password: ''

Step 4

Load the latest of ONLY the following packages from those repos:

        SystemEditor-Core
        SystemEditor-Squeak
        SystemEditor-Traits
        SystemEditor-Tests
        SystemEditorBrowser
        Tirade
        InterleavedChangeSet
        DeltaStreams-Model
        DeltaStreams-Logging
        DeltaStreams-Storing
        DeltaStreams-Tirade
        DeltaStreams-Tests
        DeltaStreams-Deprecated
        DeltaStreams-UI

NOTE: The DeltaStreams-Logging package is from Matthew, not sure of its status. DeltaStreams-Storing is meant to contain the parts needed for DS to use InterleavedChangeSet - which is the funky changeset-compatible format Matthew invented. I am not pursuing that format but I see no harm in keeping it. It might be a good idea too - I am not sure. DeltaStreams-Tirade is the reader/writer support for the Tirade format which is the preferred format for Deltas from now on. All tests are in DeltaStreams-Tests, there is a UI too but Matthew wrote it and I am unsure of its operational status. The DeltaStreams-Deprecated is a big pile of stuff that should in the end just be dumped. Consider it to be "candidates for death". :)

Eventually we would typically only really need (no ICS, no tests, no SE browser, no deprecated):

        SystemEditor-Core
        SystemEditor-Squeak
        SystemEditor-Traits
        Tirade
        DeltaStreams-Model
        DeltaStreams-Logging
        DeltaStreams-Tirade
        DeltaStreams-UI

Step 5

Run some tests. Currently there are tons of tests failing. I do not like this "state" so I am in the process of cleaning up, fixing what I can fix and possibly even throwing out red tests I can’t understand. We should always be all green, otherwise we don’t know if we broke something. These are the results in my image:

        SystemChangeNotifications-Tests: 1 error (it is a trivial fix I missed, will go away)
        Tirade-Tests: green
        SystemEditor-Tests: 11 failures of 181  (not sure why...)
        DSDeltaApplyTest: green
        DSDeltaLoggingTest: green
        DSDeltaCreationTest: green
        DSDeltaClassifyTest: green
        DSDeltaAntiTest: green
        DSDeltaCopyTest: green
        DSDeltaFileOutTest: 2 errors (Matthew has commented them, I am not pursuing ICS)
        DSDeltaRevertTest: 19 passes, 6 failures (these we should be able to fix!)
        DSDeltaTiradeTest: green

        DSDeltaChangeSetTest: red, red, red!!! (this is testing the ChangeSet "lookalike" ICS aspect, I am not pursuing ICS so will not fix now)
        DSDeltaTiradeFileOutTest: red, red, red!!! (these I will fix!)

        DSDeltaValidationTest: red, red, red!!! (I need to read up on validation first)
        DSDeltaValidationTestSystemEditor: red, red, red!!! (same here)

NOTE: The Validation tests… I am unsure of, I have not looked at the design and I am unsure if I have broken them or if they were indeed broken. We should take an image and install the DS release from SM and see the test status there.

Step 6

Read some class comments and code. Start with DSDelta and the DSChange hierarchy.

Step 7

Pray. Ask me.

regards, Göran

20 Apr 09

Tirade, first trivial use

Last night I started hooking Tirade into Deltas. Quick background: Deltas is "Changesets for the 21st century", or in other words an intelligent patch system under development for Squeak. Tirade is a Smalltalk/Squeak centric "JSON"-kinda-thingy. I made Tirade in order to get a nice file format for Deltas.

Just wanted to share how the first trivial code looks, and thus illustrate simple use of Tirade.

I have a DSDelta (a Delta being almost like a ChangeSet). It consists of some metadata (a UUID, a Dictionary of properties and a TimeStamp) and a DSChangeSequence (which holds the actual DSChange instances). As a first shot I only implemented the metadata bit. So step by step:

  1. Write a unit test, first let’s set up our readers and writers on a common stream:
         setUp
             | stream |
             stream := RWBinaryOrTextStream on: String new.
             reader := DSTiradeReader on: stream.
             writer := DSTiradeWriter on: stream
    

…then a trivial write, read and compare test - note that they both look at the same stream:

        testEmptyDelta

            | delta same |
            delta := DSDelta new.
            writer nextPut: delta.
            reader reset.
            same := reader next.
            self assert: same = delta.
            self assert: delta timeStamp = same timeStamp.
            self assert: delta properties = same properties.
            self assert: delta uuid = same uuid
  2. Create DSTiradeWriter. It turns out that DSTiradeWriter at this point is just an empty subclass of TiradeRecorder! Eventually we might need to add behaviors but at this point there is no need. The TiradeRecorder uses DNU to intercept messages and encode them as Tirade.
  3. Implement #tiradeOn: in our domain object DSDelta. This will be used by the writer and looks like this:
         tiradeOn: recorder
    
             recorder
                 delta: uuid asString36
                 stamp: timeStamp printString
                 properties: properties
    

…here we convert the UUID to a String (base 36) and the timeStamp too. The properties Dictionary just holds "simple" data that Tirade can represent, so no need to convert it. The rule is that we make up a message (in this case #delta:stamp:properties:) which will be used in the Tirade stream, and we make sure our arguments are "Tirade proper" which basically means Booleans, Strings, Symbols, Arrays, Numbers, Associations and Dictionaries thereof. Note that the recorder being a DSTiradeWriter inherits the implementation of #doesNotUnderstand: from TiradeRecorder that will write this Tirade message onto the stream typically looking like this:

        delta: 'd71oknvt1bwswhno6iwgund07' stamp: '20 April 2009 11:20:50 am' properties: nil.

And then the final step, our reader:

  4. Create a DSTiradeReader. We simply create an implementation of the above Tirade message #delta:stamp:properties: and put it in the method category "tirade" so that the default security mechanism is happy:
         delta: uuidString36 stamp: timeStampString properties: properties
    
             result := DSDelta new.
             result uuid: (UUID fromString36: uuidString36); properties: properties; timeStamp: (TimeStamp fromString: timeStampString)
    

…this class inherits an instvar called ‘result’, which is fine to reuse. As you see the properties needs no conversion, the others are converted from Strings.

And tada - the unit test is green! So we implemented reading and writing in more or less two lines of code. Kinda neat! :)
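The #doesNotUnderstand: trick is not specific to Smalltalk. As a rough illustration (this is not the actual TiradeRecorder code), here is a minimal Python sketch in which `__getattr__` plays the role of #doesNotUnderstand:. The names `Recorder` and `tirade_literal` are made up for this example, and only Strings, Numbers, Booleans and nil are handled:

```python
def tirade_literal(value):
    """Render a Python value as a Smalltalk-style literal (simplified)."""
    if value is None:
        return "nil"
    if value is True:
        return "true"
    if value is False:
        return "false"
    if isinstance(value, str):
        # Smalltalk strings only escape the single quote, by doubling it.
        return "'" + value.replace("'", "''") + "'"
    if isinstance(value, (int, float)):
        return repr(value)
    raise TypeError("not representable in this sketch: %r" % (value,))


class Recorder:
    """Turns any unknown method call into one line of Tirade-like text."""

    def __init__(self):
        self.lines = []

    def __getattr__(self, first_keyword):
        # Only called when normal attribute lookup fails, like DNU.
        if first_keyword.startswith("_"):
            raise AttributeError(first_keyword)

        def record(first_arg, **rest):
            parts = ["%s: %s" % (first_keyword, tirade_literal(first_arg))]
            parts += ["%s: %s" % (k, tirade_literal(v)) for k, v in rest.items()]
            self.lines.append(" ".join(parts) + ".")
            return self

        return record
```

Calling `Recorder().delta('abc', stamp='x', properties=None)` appends one period-terminated keyword message to `lines`, which mirrors what the writer does with #delta:stamp:properties: above.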

regards, Goran

28 Mar 09

Tokyo Tyrant meets the Mighty Mouse!

I have gotten into researching these new breeds of databases popping up all over the place. I started looking closer at CouchDB after skimming a whole bunch, I even implemented a "view server" for it (so that you can write map/reduce functions in Squeak instead of JavaScript - why? Because "you can" :)). And yeah, CouchDB is really cool, no doubt about that. They are just about to get 0.9 out the door btw.

But I started looking around again and this time I looked longer at the Sourceforge homepage of Tokyo Cabinet than the 3 seconds staring at Japanese symbols I did the first time. Mikio Hirabayashi, the author of the Tokyo "products", seems to be an extremely talented and productive developer. So what is all this stuff?

  • Tokyo Cabinet. A new db library seemingly beating the crap out of all others on most aspects. You know, Berkeley DB and all those. Floored!
  • Tokyo Tyrant. The remote server part on top, a thread-pool modeled implementation using epoll/kqueue mechanisms and offering memcache/HTTP/socket protocols and Lua scripting inside. It seems to scale. A lot.
  • Tokyo Dystopia. An advanced text search engine or a so called "inverted index". Not yet reached 1.0 but since Mikio wrote Hyper Estraier I expect a solid offering there too.

And it is all under the LGPL, which is fine with me (being primarily an MIT/BSD guy).

I got dazzled by all the buzzwords and functionality in Tokyo Cabinet and when I realized that the binary Socket protocol for Tokyo Tyrant actually is quite small and well documented - what the hell! Let’s hack up a Squeak port of this, and so I did. :)

TokyoTyrantProtocol

The binary Socket protocol actually supports all the different database types that Cabinet offers:

  • On memory hash and tree database
  • On disk hash, B+ tree, fixed-length and table database

…and it also offers a way to call Lua extension code inside the Tyrant server. Yum, yum!

So I wrote an API for Squeak. How to use it? Well, first you need Tokyo Cabinet and Tokyo Tyrant. Just suck down the source and do the dance (presuming you are on a real OS of course):

        wget http://switch.dl.sourceforge.net/sourceforge/tokyocabinet/tokyocabinet-1.4.11.tar.gz
        tar xzf tokyocabinet-1.4.11.tar.gz
        cd tokyocabinet-1.4.11
        ./configure; make; make install

        wget http://garr.dl.sourceforge.net/sourceforge/tokyocabinet/tokyotyrant-1.1.18.tar.gz
        tar xzf tokyotyrant-1.1.18.tar.gz
        cd tokyotyrant-1.1.18
        ./configure --enable-lua; make; make install

Well, ok, you might stumble on some dependencies of course. On Ubuntu (where I have lots of dev libraries already installed) I only needed to add these IIRC, your mileage may vary:

        apt-get install libbz2-dev liblua5.1-0-dev zlib1g-dev

Having installed TC and TT, let’s do the trivial thing and just fire up a server on a disk based hash database (tch = Tokyo Cabinet Hash):

        ttserver mydb.tch

If you run "ttserver" without a filename it will create an on-memory hash database and then some of my unit tests fail. :) This server is now up and listening on default port 1978.

Let’s fire up Squeak, I use Squeak 3.10.2 so no promises for anything else. Open SqueakMap Package Loader, find "TokyoTyrant" and install. Then open up Test Runner, select "TokyoTyrant" category and select all test classes except the TokyoTyrantTableTest - you should get all green.

Okidoki let’s do "hello world":

        | db |
        db := TokyoTyrantDB new.
        db at: 'hello' put: 'world'.
        Transcript show: 'Hello is: ', (db stringAt: 'hello'); cr.
        db close

…and you should get some output in Transcript. Using #new we get a TokyoTyrantDB instance set for "localhost:1978", otherwise use #host:port: to instantiate it. Then we send #at:put:, using a String key and a String value. This will trigger the db to lazily open a SocketStream to ttserver and then send key and value to be stored onto disk. Finally we do a read by asking for it again, note that we use #stringAt: instead of just #at:. If we use #at: then TokyoTyrantDB does not know the type of the data being returned and we would get an instance of ByteArray.
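To give a feeling for how small the binary protocol is, here is a sketch in Python of how a "put" and a "get" request could be framed before being written to the socket. The magic byte 0xC8, the command bytes 0x10 (put) and 0x30 (get) and the big-endian 32 bit length fields are quoted from memory of the Tokyo Tyrant protocol documentation, so check them against the spec before relying on them. The function names are hypothetical and no network traffic is involved:

```python
import struct

MAGIC = 0xC8      # assumed: first byte of every request
CMD_PUT = 0x10    # assumed: command byte for "put"
CMD_GET = 0x30    # assumed: command byte for "get"


def build_put_request(key: bytes, value: bytes) -> bytes:
    """Frame: magic, command, key size, value size, key bytes, value bytes."""
    header = struct.pack(">BBII", MAGIC, CMD_PUT, len(key), len(value))
    return header + key + value


def build_get_request(key: bytes) -> bytes:
    """Frame: magic, command, key size, key bytes."""
    return struct.pack(">BBI", MAGIC, CMD_GET, len(key)) + key
```

Since keys and values travel as raw bytes with only a length prefix, nothing in the frame says what type the value has, which is why #at: can only answer a ByteArray.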

Ok, so a TokyoTyrantDB instance behaves more or less Dictionary-like. Let’s toy around:

        | db |
        db := TokyoTyrantDB new.
        db removeAll.
        db
                at: 'a string' put: 'abc';
                at: 'an integer' put: 1;
                at: 'a float' put: 12.3;
                at: 'a large integer' put: 100000000000000;
                at: #(255 0 0 255) put: #(1 2 3 4).
        Transcript show: 'Number of records: ', db size asString; cr.
        Transcript show: 'Size in bytes: ', db byteSize asString; cr.
        Transcript show: 'Status: ', db status; cr.

        db at: 'a string' putCat: 'def'.
        db at: 'an integer' add: 2.
        db at: 'a float' add: 0.3.

        Transcript show: 'String: ', (db stringAt: 'a string') ;cr.
        Transcript show: 'Integer: ', (db integerAt: 'an integer') asString;cr.
        Transcript show: 'Float: ', (db floatAt: 'a float') asString;cr.
        Transcript show: 'Large integer: ', (db integerAt: 'a large integer') asString;cr.
        Transcript show: 'Bytearray: ', (db at: #(255 0 0 255)) printString;cr.

We open it up, removeAll (="vanish" command), then put in some key/value pairs. At this level of the API there are some simple conventions for type conversion:

  • Integers are stored as 32 bit signed integers if they fit, otherwise as a ByteArray (byte contents of LargeNegative(Positive)Integer) with first byte signifying sign.
  • Floats are stored as 64 bit doubles. Sometimes they are sent as fractions, 64 + 64 bit style.
  • All other objects are sent #asByteArray and we use the result of that. This means that an Array of 0-255 integers will be turned into a ByteArray.

If we send a Float into #at:add: we send it according to the protocol which specifies that we should send it as a fraction of two 64 bit signed integers. This format was probably selected for its simplicity. Result is still stored by TT as a double (8 bytes) in native endianness, so in order to do the Right Thing we need to keep track of native endianness. Anyway, those are details… :)
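The conventions above can be sketched in Python roughly like this. It is an illustration under assumptions, not a port of the Squeak code: the sign byte values (0 for positive, 1 for negative), the byte order of the large integer magnitude, and the 10^12 scale factor for the fractional part are my guesses or recollections and are not stated in the text above.

```python
import math
import struct

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
FRACTION_SCALE = 10**12  # assumed scale for the fractional part


def encode_integer(n: int) -> bytes:
    """32 bit signed if it fits, else a sign byte followed by the magnitude."""
    if INT32_MIN <= n <= INT32_MAX:
        return struct.pack("=i", n)          # native byte order
    sign = b"\x01" if n < 0 else b"\x00"     # assumed sign convention
    magnitude = abs(n)
    size = (magnitude.bit_length() + 7) // 8
    return sign + magnitude.to_bytes(size, "big")


def encode_float(x: float) -> bytes:
    """64 bit double in native byte order."""
    return struct.pack("=d", x)


def encode_float_as_fraction(x: float) -> bytes:
    """Two signed 64 bit integers: integral part and scaled fractional part."""
    integral = math.trunc(x)
    fractional = round((x - integral) * FRACTION_SCALE)
    return struct.pack(">qq", integral, fractional)


def encode_other(obj) -> bytes:
    """Stand-in for #asByteArray: text becomes bytes, 0-255 lists become bytes."""
    if isinstance(obj, str):
        return obj.encode("latin-1")
    return bytes(obj)
```

Note that a small integer and a 4 character string both come out as exactly 4 bytes, which leads straight to the typing problem described further down.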

Classes

I started out with a plain single class resembling current TokyoTyrantDB. Then I recently refactored it into three classes, TokyoTyrantProtocol which only contains enough to "talk" to TT, and using ByteArrays. Not very comfortable to use. Then a subclass called TokyoTyrantBinaryDB which adds a Dictionary-like protocol, nicer but still no conversions.

Then TokyoTyrantDB which has at least some smarts on converting Smalltalk objects into ByteArrays, and some convenience methods for doing conversions from TT ByteArray keys and values to Smalltalk objects.

Finally there is also TokyoTyrantTableDB which is a special subclass of TokyoTyrantDB that has extra methods to work with the "table" database type. This database type is not fully compatible with the rest of the protocol, so I might need to "plug" some methods or change the inheritance, not sure yet.

Following the design that Mikio used in his Ruby package there is also a TokyoTyrantTableQuery class representing a query. These bits are brand new and just cleared the smoke test.

The Typing Problem

Note however that since TT does not use "types" we simply get a ByteArray back from TT. And since TT has special functions for 32 bit signed ints and 64 bit doubles, we do want to use those "native formats". This leaves us in a slight predicament: if we get 4 bytes back, is it a String of 4 chars or a 32 bit signed int?

So far I have "punted" on this problem, but the next step is probably to have some kind of configuration of the db instance so that it knows what to expect back, perhaps given certain kinds of keys. Open for all suggestions. :)
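To make the ambiguity concrete, and to show one possible shape for such a configuration, here is a small Python sketch. The idea of choosing a decoder by key prefix, and every name in it (`DECODERS_BY_PREFIX`, `decode_for_key`, the "count:" and "name:" prefixes), is hypothetical and only meant as a starting point for discussion:

```python
import struct


def decode_text(raw: bytes) -> str:
    return raw.decode("latin-1")


def decode_int32(raw: bytes) -> int:
    return struct.unpack("=i", raw)[0]


# The same 4 bytes are valid input for both decoders, so the bytes
# alone cannot tell us which interpretation was intended.
raw = b"abcd"
as_text = decode_text(raw)
as_int = decode_int32(raw)

# Hypothetical configuration: the key prefix decides how to decode.
DECODERS_BY_PREFIX = {
    "count:": decode_int32,
    "name:": decode_text,
}


def decode_for_key(key: str, raw: bytes):
    """Pick a decoder from the key; fall back to the raw bytes."""
    for prefix, decoder in DECODERS_BY_PREFIX.items():
        if key.startswith(prefix):
            return decoder(raw)
    return raw
```

Falling back to the raw bytes keeps today's behaviour of #at: for keys that have no configured type.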

Can you say FAST?

To wrap up, let’s take this Ferrari for a short spin, code:

        | payload num size db time keys |
        size := 512.
        num := 10000000.
        payload := ByteArray new: size.
        db := TokyoTyrantDB new.
        time := [1 to: num do: [:i | db at: i putNoResponse: payload]] timeToRun.
        db size = num ifFalse: [self error: 'Bulk failed!'].
        Transcript show: 'Type: ', db type, ', ', num asString, ' * ',size asString,' bulk: ', time asString, ' ms, (', (num*1000/time) asInteger asString, '/sec)';cr.
        keys := (1 to: num) collect: [:i | num atRandom].
        time := [keys do: [:i | db at: i]] timeToRun.
        Transcript show: 'Type: ', db type, ', ',num asString,' random lookups: ', time asString, ' ms, (', (num*1000/time) asInteger asString, '/sec)';cr.
        time := [keys do: [:i | db includesKey: i]] timeToRun.
        Transcript show: 'Type: ', db type, ', ',num asString,' random includes: ', time asString, ' ms, (', (num*1000/time) asInteger asString, '/sec)';cr.
        time := [db do: [:each | ]] timeToRun.
        Transcript show: 'Type: ', db type, ', full iteration: ', time asString, ' ms, (', (num*1000/time) asInteger asString, ' values/sec)';cr.
        db close

…and some results on different database types:

        Type: hash, 100000 * 1024 bulk: 23555 ms, (4245/sec)
        Type: hash, 100000 random lookups: 30358 ms, (3294/sec)
        Type: hash, 100000 random includes: 32547 ms, (3072/sec)
        Type: hash, full iteration: 55899 ms, (1789 values/sec)

        Type: on-memory hash, 100000 * 1024 bulk: 5669 ms, (17639/sec)
        Type: on-memory hash, 100000 random lookups: 41812 ms, (2391/sec)
        Type: on-memory hash, 100000 random includes: 54319 ms, (1841/sec)
        Type: on-memory hash, full iteration: 74022 ms, (1351 values/sec)

        Type: B+ tree, 100000 * 1024 bulk: 5931 ms, (16860/sec)
        Type: B+ tree, 100000 random lookups: 38659 ms, (2586/sec)
        Type: B+ tree, 100000 random includes: 37315 ms, (2679/sec)
        Type: B+ tree, full iteration: 76465 ms, (1307 values/sec)

        Type: on-memory tree, 100000 * 1024 bulk: 5066 ms, (19739/sec)
        Type: on-memory tree, 100000 random lookups: 31364 ms, (3188/sec)
        Type: on-memory tree, 100000 random includes: 29093 ms, (3437/sec)
        Type: on-memory tree, full iteration: 44394 ms, (2252 values/sec)

        Type: B+ tree, 1000000 * 100 bulk: 48651 ms, (20554/sec)
        Type: B+ tree, 1000000 random lookups: 378111 ms, (2644/sec)
        Type: B+ tree, 1000000 random includes: 377373 ms, (2649/sec)
        Type: B+ tree, full iteration: 633555 ms, (1578 values/sec)

        Type: hash, 100000 * 1024 bulk: 17857 ms, (5600/sec)
        Type: hash, 100000 random lookups: 32060 ms, (3119/sec)
        Type: hash, 100000 random includes: 33954 ms, (2945/sec)
        Type: hash, full iteration: 47621 ms, (2099 values/sec)

Note that I am using ByteArray values here, in order to focus entirely on TokyoTyrant protocol code and TT itself, and not measure serialization speed or conversion speeds. A few interesting things to note:

  • The "includes" test is as fast as the "lookup" test for all these payloads. Includes does not transmit the value back to Squeak, so obviously that does not cost much!
  • Bulk load on hash disk is relatively slow, ONLY about 4 MB (4000 * 1 kB) per second. :)
  • On memory hash is slower than on disk hash?! Funky.
  • Hash on disk is generally slightly faster than B+ tree, but not much.
  • On memory trees beat on memory hash.
  • The last hash on disk used TCP_NODELAY, might have caused iteration and bulk speedup, not sure.
  • All dbs seem to occupy more or less the calculated theoretical space, clearly very little overhead.
  • Being able to store 16860 1 kB values per second means (if sustained) a bulk load capacity of about 17 MB/sec = 62 GB in 1 hour. On my mini laptop running both client Squeak and ttserver.
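The arithmetic behind these figures is easy to check. The following Python sketch recomputes the per-second rates the same way the benchmark does (count * 1000 / milliseconds, truncated) and converts the B+ tree bulk rate into MB per second and GB per hour, taking 1 kB as 1024 bytes and using decimal MB and GB. The function names are just for this example:

```python
def rate_per_second(count: int, elapsed_ms: int) -> int:
    """Same formula as the benchmark: (num * 1000 / time) asInteger."""
    return count * 1000 // elapsed_ms


def throughput(values_per_second: int, value_bytes: int):
    """Return (MB per second, GB per hour) for a sustained rate."""
    bytes_per_second = values_per_second * value_bytes
    return bytes_per_second / 1e6, bytes_per_second * 3600 / 1e9


hash_bulk_rate = rate_per_second(100000, 23555)    # disk hash bulk load
btree_bulk_rate = rate_per_second(100000, 5931)    # B+ tree bulk load
mb_per_sec, gb_per_hour = throughput(btree_bulk_rate, 1024)
```

With the numbers from the listing this gives a little over 17 MB per second and a little over 62 GB per hour for the B+ tree, and a bit more than 4 MB per second for the first disk hash run.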

One issue though: When trying to do a 5 GB db test something broke when I hit the 2 GB file size. Ehum??? Time for an email to Mikio. …thinking a little bit more - perhaps I need a 64-bit OS to use larger db sizes. Mmm.

Conclusion

I really, really like TC/TT. Simple, fast, robust. The only thing I have seen that was slightly odd was when I ran out of disk space and the "db size" returned 1000 but obviously some of my bulk inserts had not gone through - but the count in the db was ok. Doing "db keys size" noted missing keys though, so clearly some inconsistency there. Will report that to Mikio.

I can see tons of ways to build on top of this layer - especially using the table database type which I didn’t have time to cover in this first article - but it seems extremely interesting.

Over and out, Goran

20 Mar 09

Tirade, part 2

In an article recently I described Tirade - a new generic "file format" for Smalltalk/Squeak, or actually a sub language! Since that article I have refined Tirade a bit.

Tirade consists today of 4 classes (parser, reader, writer, recorder) totalling about 500 lines of code, excluding tests. Tests are green in 3.10.2, pharo-10231, 3.9, 3.8 and 3.7. It does turn red in 3.6 due to old initialize behavior, some missing methods etc, probably easily fixed if anyone cares. There are no dependencies on other packages. Compared to using the old Compiler>>evaluate: it is about 5-7 times faster.

Tirade is a very small "language" similar to JSON (see below) and probably fits similar use cases as JSON fits.

Numbers

In my first Tirade description I opted out and only supported plain integers, no frills at all. Then after subsequent discussion I came to the conclusion that syntactically there is no problem to let TiradeParser>>parseInteger become TiradeParser>>parseNumber and just let it handle all kinds of Number literals that Squeak supports by either using SqNumberParser if present (in Squeak 3.9+) or by falling back on regular old Number class>>readFrom: which Scanner still uses in 3.10.2.

So now Tirade deals perfectly fine with:

  • 23.45 (Floats)
  • 16rFE (radix)
  • 1.0034e-5 (scientific notation)
  • 243s2 (scaled decimals)
  • "NaN", "Infinity" and "-Infinity"

…and whatever else should be there.

The performance penalty if we use SqNumberParser (Squeak 3.9+) is not that bad, about 20% on my little trivial benchmark. Using Number class>>readFrom: hurts more, increasing time for benchmark around 50%.

Security…

First I played with having the builder object (that is typically fed the Tirade messages from the Tirade reader) implement isSelectorAllowed: etc. I finally ended up encoding a simple security scheme in the default TiradeReader that relies on finding the implementations of the Tirade messages in the builder in a method category beginning with "tirade". It seems simple enough for most uses.

I also added a global "whitelist" of Tirade messages that can be registered in the reader before starting to parse. If selectors are found in this whitelist they are considered "ok". This can be useful in some situations.

If the builder relies on catching Tirade messages using doesNotUnderstand: then it is on its own for security, but that seems fine.

Finally you can turn off all selector checks by using #unsafe:.
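The selector checks described above can be sketched in Python as follows. Python has no method categories, so this example fakes them with a decorator that tags each method. Everything here (`category`, `SelectorGuard`, `allows`) is a made-up name for illustration, not the TiradeReader API:

```python
def category(name):
    """Tag a method with a category name, imitating Squeak method categories."""
    def mark(method):
        method.category = name
        return method
    return mark


class SelectorGuard:
    def __init__(self, whitelist=(), unsafe=False):
        self.whitelist = set(whitelist)
        self.unsafe = unsafe          # True turns all checks off

    def allows(self, builder, selector):
        if self.unsafe or selector in self.whitelist:
            return True
        method = getattr(type(builder), selector, None)
        # Allowed only if implemented in a category beginning with "tirade".
        return getattr(method, "category", "").startswith("tirade")


class ExampleBuilder:
    @category("tirade-reading")
    def delta(self, uuid):
        return self

    @category("private")
    def wipe(self):
        return self
```

With this, `SelectorGuard().allows(ExampleBuilder(), "delta")` is true while "wipe" is rejected unless it is whitelisted or the guard is created with `unsafe=True`.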

Receiver juggling

Tirade is meant to separate "concerns" between Tirade "code", parser, reader and the builder object supplied by you. The Tirade "code" has no control over the receiver of the messages, Tirade "code" is just a sequential flow of messages separated with periods. The TiradeParser also doesn’t care, it just parses and then does "self processMessage", if you are using TiradeParser directly it has a default implementation of #processMessage that prints them out in Transcript and collects them in an OrderedCollection.

So yes, you can use TiradeParser to just gobble up some Tirade input and then muck about with the OrderedCollection afterwards - similar to how you work with JSON or an XML DOM. But the better approach is of course to subclass TiradeParser and implement #processMessage to actually do something - in a streaming SAX-ish fashion.

Then we have the reader. There is a default TiradeReader that implements the security described above and also implements logic for deciding the "next receiver" of the Tirade messages. The logic goes like this:

  • If the builder supplied implements Tirade messages by always returning self, it will always be the receiver. Simple.
  • If the builder returns another object X, X will be used as the "next receiver".
  • As long as X returns self it stays as the "next receiver".
  • If object X returns another object Y, X will be put on a "stack of old receivers" and Y will be used as the "next receiver".
  • If Y returns nil, X will be popped and be used as the "next receiver".
  • If X returns nil we are back to the original builder, and if it returns nil nothing changes.

So if the above is "enough" for controlling the receivers, then the builder object handles it by simply returning the "right" objects. These objects can of course be "sub builders" or domain objects themselves or whatever.
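The "next receiver" rules listed above boil down to a few lines of stack handling. Here is a minimal Python sketch of them, where None stands in for nil. It ignores control messages and security, and the class and method names (`ReceiverTracker`, `Builder`, `Section`) are invented for the example:

```python
class ReceiverTracker:
    """Decides the next receiver from what each message returns."""

    def __init__(self, builder):
        self.receiver = builder   # the original builder is the base receiver
        self.stack = []           # "stack of old receivers"

    def send(self, selector, *args):
        result = getattr(self.receiver, selector)(*args)
        if result is None:
            # nil pops the previous receiver; at the builder nothing changes.
            if self.stack:
                self.receiver = self.stack.pop()
        elif result is not self.receiver:
            # Another object was returned: remember the old receiver.
            self.stack.append(self.receiver)
            self.receiver = result
        return result


class Section:
    def __init__(self):
        self.items = []

    def item(self, text):
        self.items.append(text)
        return self               # stay as the receiver

    def end(self):
        return None               # pop back to whoever came before


class Builder:
    def __init__(self):
        self.sections = []

    def section(self):
        self.sections.append(Section())
        return self.sections[-1]  # hand over to a sub builder

    def done(self):
        return None               # nil from the builder changes nothing
```

Sending `section`, then `item`, then `end` through a `ReceiverTracker` routes the middle message to the new `Section` and ends with the builder as receiver again.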

If the above is not enough you can register "control messages" in TiradeReader. A control message can be any selector and will result in TiradeReader pushing the current receiver on the stack and setting the original builder object as the "next receiver". There is also a small twist, if the control message returns self the reader will consider that to be equivalent to "nil" and thus pop the previous receiver back. This is because the common use is to make sure all control messages are sent to the original builder without disrupting the current stack of receivers. But… why? This enables the builder to explicitly control the reader during the parse, perhaps manipulating the current stack, even though it is not the "next receiver" receiving the regular Tirade stream of messages.

One very good reason to use this is when the current receiver is a domain object that does not "know" when to return nil to pop itself.

I am not perfectly happy with the current mechanisms, but it will do for now and I will revisit this when I see how it works out in practice. The important bits are in place though - Tirade input has no control over receivers and the builder object can control it if needed.

Compared to JSON again

The differences compared to JSON that I see right now:

  • Smalltalk syntax and parsing rules for Strings. This means no escapes except for double-single quotes. JSON has 8 other escape codes. The immediate advantage for me is being able to store readable code in Tirade, including newlines.
  • Smalltalk syntax for Numbers. This means more capabilities for parsing numbers than JSON has (radix, NaN, Infinity, scaled decimals).
  • Symbols. JSON only has Strings.
  • Associations. JSON has an "object" which is a Dictionary restricted to String keys. JSON does not have a free standing Association. In Tirade any of the allowed objects can of course be keys or values and Associations can be "standalone". So there is a little more flexibility here.
  • Comments. Hmmm, JSON has no comments. Tirade allows Smalltalk comments, but ONLY between messages.
  • Messages. This is the big difference, Tirade consists of a sequence of unary or keyword messages, with the "data" described above as arguments.

The addition of messages adds an important extra level of "classification", "control" or "typing" or call it what you want. It also lends Tirade to easy streaming and concatenation. JSON consists at its top level of either a Dictionary or an Array. A parser could of course parse that in a "streaming fashion" one element or pair at a time, but they normally don’t do that I think.

Having messages of course makes it much more natural to map these messages onto a builder or multiple builders and also to use messages to control the message flow. I think this makes Tirade much more expressive in itself.

In summary, Tirade is similar to JSON but extended with messages and comments, has more advanced Numbers, deals with text more easily (no escaping of CRs etc), has a little more flexibility in its data model (Associations) and uses Smalltalk syntax for it all.
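The difference in string handling is easy to see side by side. This Python sketch quotes the same text the Smalltalk way (only the single quote is doubled) and the JSON way (using the standard `json` module). The helper name `smalltalk_string` is made up for the example:

```python
import json


def smalltalk_string(text: str) -> str:
    """Quote text as a Smalltalk String literal: only ' is escaped, as ''."""
    return "'" + text.replace("'", "''") + "'"


source = "it's a method\nspanning two lines"

as_smalltalk = smalltalk_string(source)   # keeps the real newline
as_json = json.dumps(source)              # newline becomes backslash + n
```

The Smalltalk-style literal still contains the actual line break, so stored source code stays readable, while the JSON form is a single line with an escape sequence in it.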

Potential uses for Tirade

I started out with a focus on replacing "chunk format" with something simpler and secure for Deltas (Deltastreams project), eliminating the use of Compiler to parse it. Afterwards one can find several interesting places where Tirade could be used for example:

  • DSLs
  • RPC-ish communication
  • Transaction logs

…and a few more things :)

But hey, one thing at a time.
