Roads Less Taken

29 Oct 08

OOPSLA 2008 - Compilation replay, String optimization and code-copying VMs

My blog entries are coming in "late-ish" but anyway… what did we do on Thursday, the last day of OOPSLA? The schedule was not that full; in the morning there were only two presentations to choose from. Roger took the keynote and I went to a session of three technical research papers, the first of which was about performance benchmarking of Java programs.

The problem presented was the irregularity in performance tests caused by execution variations outside of the actual benchmarked code. These come from nondeterministic behavior in the VM and the levels below it: timing, scheduling, interrupts, and even the sampling itself affect the results.

The variation was claimed to be around 5-10% in the regular Java benchmark suite. This was noted as "quite a lot", ehm… no? Some people may consider 5-10% a lot, but I sure don’t. Hmmm, I guess there might be some circumstances where this would matter but I am not sure you would use Java in those cases. Ok, so let’s disregard that fact. :)

The idea of the paper was to use multiple compilation plans, run the benchmarks multiple times and then do matched-pair analysis on the results. Well, something like that anyway; I definitely know too little about "compilation plans" in Java etc.

They tested this approach using the Jikes RVM and the SPECjvm98 + DaCapo benchmarks, on Athlon and Intel running Linux. Then they did statistical analysis on the results.

The recommendation at the end is that replay compilation is good, but you need to use multiple plans. Also, use matched-pair comparison for tighter confidence intervals and, when in trouble, increase the plan count before increasing the measurement count. Or something like that. :)
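To make the matched-pair point a bit more concrete, here is a minimal Java sketch. It assumes each compilation plan is run under both VM configurations and that the timings are compared plan by plan. The class name, the timings and the hardcoded t-value (2.776, the two-sided 95% value for 4 degrees of freedom) are my own illustration and not taken from the paper.

```java
class MatchedPairSketch {
    // Arithmetic mean of the samples.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1).
    static double sampleSd(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / (xs.length - 1));
    }

    // Per-plan differences a[i] - b[i]; index i is one compilation plan.
    static double[] pairedDiffs(double[] a, double[] b) {
        double[] d = new double[a.length];
        for (int i = 0; i < a.length; i++) d[i] = a[i] - b[i];
        return d;
    }

    // Half-width of the confidence interval for the mean difference.
    static double halfWidth(double[] diffs, double tCritical) {
        return tCritical * sampleSd(diffs) / Math.sqrt(diffs.length);
    }

    public static void main(String[] args) {
        // Hypothetical run times in seconds, one entry per compilation plan.
        double[] baseline = {10.0, 12.0, 11.0, 13.0, 9.0};
        double[] variant = {9.5, 11.4, 10.6, 12.3, 8.7};
        double[] d = pairedDiffs(baseline, variant);
        System.out.printf("mean diff %.3f +/- %.3f%n", mean(d), halfWidth(d, 2.776));
        System.out.printf("plan-to-plan sd %.3f, paired sd %.3f%n",
                sampleSd(baseline), sampleSd(d));
    }
}
```

With these made-up numbers the spread between plans is roughly ten times larger than the spread of the paired differences, which is the effect that makes pairing worthwhile.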

My conclusion is that benchmarking is hard (no surprise there) and modern dynamically jitting VMs are even harder to benchmark. I think it was John Maloney (implementor of Morphic) who said that he preferred the slower but predictable performance of the Squeak VM over the faster but quite unpredictable behavior of the Self VM. This was for developing Morphic, the graphical user interface of Self/Squeak, where of course responsiveness etc is crucial to the user experience.

But I am still curious about the case where 5-10% would matter. Roger, who went with me to OOPSLA, has been working on performance issues in a very large critical Java system and he also thought 5-10% is "nothing at all". :)

The second paper was about Java String memory optimization techniques and String inefficiencies. I found this paper more interesting and pragmatic. The basic problem is wasted heap space due to lots of String instances in a typical large Java program. Three different kinds of waste were identified:

Memory waste A: Unused areas in the internal character array. I had no idea a String was an internal array with an offset and a count variable! String manipulation tends to leave more such unused areas behind.
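Here is a toy model of that layout, as I understood it. It is not the real java.lang.String, just a hypothetical class (SharedArrayString, with my own method names) where a substring shares the backing array and only adjusts offset and count, so the rest of the array stays alive but unused.

```java
class SharedArrayString {
    private final char[] value;  // backing array, possibly shared
    private final int offset;    // index of the first character in use
    private final int count;     // number of characters in use

    SharedArrayString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedArrayString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Shares the backing array instead of copying the characters.
    SharedArrayString substring(int begin, int end) {
        return new SharedArrayString(value, offset + begin, end - begin);
    }

    // Characters kept alive by this instance but not part of its value.
    int wastedChars() {
        return value.length - count;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedArrayString line = new SharedArrayString("hello, world");
        SharedArrayString word = line.substring(7, 12);
        System.out.println(word + " wastes " + word.wastedChars() + " chars");
    }
}
```

As far as I know, newer JDKs (from Java 7 update 6) copy the characters in substring instead of sharing the array, so this particular kind of waste mostly applies to the VMs of that era.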

Memory waste B: A lot of Strings with the same value. In large apps even more so, easily 500-1000 of the same - like "name" etc.

Memory waste C: Lots of unused literal Strings. For example error messages, these are instantiated on first use which unfortunately typically is done in class initialization.

The presenter claims that over 50% of all "String related objects" are unnecessary. Looking at a specific case it seems that B and C are dominant.

Trick 1: Unify same-value strings when they are long lived. Implemented on J9. Seemed kinda straightforward, they called it StringGC. The results looked good, no comments there. I did a quick check in my Squeak VM for Gjallar and sure, given my 220000 ByteString instances I could theoretically (ignoring the fact that code may rely on identity) squeeze out 3 MB.
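The paper did this inside the garbage collector, which I can't reproduce here, but the idea of unification can be sketched at application level with a canonicalizing map. StringUnifier and its methods are hypothetical names of mine, and this is only an analogue of StringGC, not their implementation.

```java
import java.util.HashMap;
import java.util.Map;

class StringUnifier {
    // Maps each distinct value to the single instance chosen to represent it.
    private final Map<String, String> canonical = new HashMap<>();

    // Returns the canonical instance for s, registering s if it is the first.
    String unify(String s) {
        String existing = canonical.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int distinctValues() {
        return canonical.size();
    }

    public static void main(String[] args) {
        StringUnifier unifier = new StringUnifier();
        String a = new String("name");  // two separate objects, same value
        String b = new String("name");
        System.out.println("before: same object? " + (a == b));
        a = unifier.unify(a);
        b = unifier.unify(b);
        System.out.println("after: same object? " + (a == b));
    }
}
```

Once every reference goes through unify, the duplicates become garbage. The sketch also shows the catch: code that compared a and b with == sees a different answer before and after.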

Trick 2: Convert a class to instantiate only actually used String literals. Instead of instantiating a Java Array in a class initializer you can just use a case-switch and instantiate and return the String you want on demand. Since most of these Strings are in java.util.ListResourceBundle you can fix this by redefining a method in your ListResourceBundle subclasses: handleGetObject(key). They whipped up something called BundleConverter that automates this.
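A hand-written version of what such a converted bundle might look like is sketched below. I have not seen the output of BundleConverter, so the class name, keys and messages are made up. Note that handleGetObject is declared final in ListResourceBundle in the JDK, so this sketch extends ResourceBundle directly, and it uses a switch on Strings, which needs Java 7 or later.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.ResourceBundle;

class LazyMessages extends ResourceBundle {
    // Each literal is only touched when its case is actually executed,
    // instead of all of them being put into an Object[][] up front.
    @Override
    protected Object handleGetObject(String key) {
        switch (key) {
            case "error.notFound":
                return "File not found";
            case "error.denied":
                return "Permission denied";
            case "error.timeout":
                return "Operation timed out";
            default:
                return null;  // lets ResourceBundle consult the parent bundle
        }
    }

    @Override
    public Enumeration<String> getKeys() {
        return Collections.enumeration(
                Arrays.asList("error.notFound", "error.denied", "error.timeout"));
    }

    public static void main(String[] args) {
        ResourceBundle bundle = new LazyMessages();
        System.out.println(bundle.getString("error.notFound"));
    }
}
```

Callers keep using getString as before, so the change stays local to the bundle class.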

Trick 3: More bad guys are around, like DateFormatZoneData for example with a lot of timezone names. But now it gets harder so they hacked up "Lazy Body Creation" in the VM. The idea is to let the String offset point into the Constant pool of the class before actual use - and then lazily create the character array when needed.
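The real thing lives inside the VM, but the principle can be imitated in plain Java. In this sketch a byte array stands in for the constant pool, and LazyBodyString (a hypothetical class of mine) only decodes its characters on first use. It simplifies a lot: the real constant pool uses a modified UTF-8 encoding, and the sketch is not thread safe.

```java
import java.nio.charset.StandardCharsets;

class LazyBodyString {
    private final byte[] pool;  // stand-in for the class constant pool
    private final int offset;   // where this string's bytes start in the pool
    private final int length;   // number of bytes belonging to this string
    private char[] body;        // stays null until someone needs the characters

    LazyBodyString(byte[] pool, int offset, int length) {
        this.pool = pool;
        this.offset = offset;
        this.length = length;
    }

    boolean isMaterialized() {
        return body != null;
    }

    // Creates the character array on first access and reuses it afterwards.
    char[] chars() {
        if (body == null) {
            body = new String(pool, offset, length, StandardCharsets.UTF_8).toCharArray();
        }
        return body;
    }

    @Override
    public String toString() {
        return new String(chars());
    }

    public static void main(String[] args) {
        byte[] pool = "Europe/StockholmAsia/Tokyo".getBytes(StandardCharsets.UTF_8);
        LazyBodyString zone = new LazyBodyString(pool, 0, 16);
        System.out.println("materialized before use: " + zone.isMaterialized());
        System.out.println(zone);
        System.out.println("materialized after use: " + zone.isMaterialized());
    }
}
```

Zone names that are never looked at never get a character array at all, which is where the saving would come from.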

The evaluation on real benchmarks showed an 8-13% smaller heap (ehm, ok, the last slide says 18%, not sure) and the db benchmark actually got 30% faster due to the String unification, which evidently sped up String.equals. :)

The discussion afterwards was about the unification of Strings; the trick of course does not conform to the Java spec and would break code relying on String object identity. One idea was to only unify the bodies.

I can’t help but reflect on the fact that waste C is not a problem in Smalltalk since a class in Smalltalk is an object and each CompiledMethod holds its own literals. So in Smalltalk the Strings are instantiated at compile time - and we don’t have any constant pool in some other place - meaning we only have one String in memory.

The other thing is that a Smalltalk String is in fact a "variableByteSubclass" so we don’t suffer from waste A either.

But what about waste B? Since we have Symbol (unified Strings) we should simply use that more than we do, I guess. All in all, I think we are in pretty good shape in Smalltalk compared to Java on this one. :)

Final paper was about code copying to speed up VMs. I honestly don’t really know the details on this one but it was still interesting.

The technique was applied to SableVM, OCaml and YARV (Ruby) and they tried it on Intel, AMD and PPC. The end result was more or less that OCaml got a great boost, probably due to its simple bytecode set, Java got a smaller improvement and YARV actually got worse in many cases - and only slightly faster in some benchmarks. Exactly why was a bit foggy.

Three interesting papers and quite aptly presented I think.

24 Oct 08

OOPSLA 2008 - More Movies

Things are posted a bit out of order here but anyway, I also taped stuff at the Seaside BOF (Birds of a Feather session) and the Squeak BOF. The Seaside BOF movie is mainly a demo of Webvelocity from Cincom and the efforts by Gemstone and Cincom in the Seaside arena.

The Squeak BOF this year turned out smaller and shorter than usual, only three presenters and we covered "only" two hours. I was first out with a presentation about Blackfoot - a small, simple, hopefully fast SimpleCGI implementation. I didn’t tape myself though! :)

Second was Dave Ungar presenting Squeak running on the Tile64 CPU from Tilera, major coolness factor, although there is some work left to do there.

Third was Jecel Assumpcao giving us a tour through the history of CPUs and in particular the history of Smalltalk CPUs. He also told us about his latest project, SiliconSqueak.

I hope you enjoy them and next time I produce videos I will be using h.264 (x264 through mencoder using the h264enc script) since some tests last night showed a clear improvement both in size and quality compared to xvid.

23 Oct 08

OOPSLA 2008 - panel debate on DSLs

I must say I found this panel debate quite boring, sorry… First of all, the panelists gave way too long introductions, I think. Secondly, there was no debate!

Apart from that little fact my musings on the subject - modulo that I know very little of the tools and products in this arena - are the following:

Two things pushed by the panel were increased productivity (Juha-Pekka from MetaCase emphasized this IIRC) and, as emphasized by Kathleen Fisher, the ability to set and verify constraints and prove certain aspects of DSL code, since a DSL is much smaller and more constrained than a generic language.

I felt frustration that the panel seemed to be entirely focused on code generation approaches. As so many have noted a much better approach in practice (in most cases) is to extend an existing generic language. And it gets much easier if that language has a flexible, minimal semantic model and moldable syntax like for example Lisp or Smalltalk. :) Such a DSL is normally referred to as an internal DSL.

Another attendee noted the same (thank god) and wanted to stress the fact that a DSL is above all a means of communication.

Kathleen countered that the DSL is hard to verify etc if the "whole host language" is available to the developer - too much freedom. But I can’t see why you couldn’t restrict the DSL to be a subset anyway.

Also, Juha-Pekka noted that internal DSLs tend to be private to one developer, the others seem to easily fall back into using lower levels. Not sure I agree and he didn’t really explain why that would be a general rule. :)

Jamie Douglas (a Smalltalker from Boeing) asked what the panel’s experience was with the continuum of approaches defined by the two extreme end points:

  • Either the programmer makes a new language and simply makes the domain experts use it.
  • …or the programmer asks them first about their existing "language" and then tries to encode that as a DSL.

So is the language designer or the domain expert in charge? The panel didn’t really give any good insight into that question AFAICT. Generally the panel fell back to the standard answer on too many occasions:

"It is a difficult question, it depends on xxx."

I don’t think the "debate" gave me anything, but I did get a silly idea about augmenting the Smalltalk debugger to "know" about where a DSL ends and Smalltalk begins - thus allowing "shallow debugging" where the debugger can stay in the upper DSL level.

No idea if that idea is worth anything… hmmm, in fact… now that I am writing about it I realize that this may be generalized - what if you could mark classes and/or methods in "levels"? That way the debugger could avoid going into "lower levels" and you can "turn on" which level of detail you want the debugger to run in. Nice. I want that! :)

22 Oct 08

OOPSLA 2008 - Smalltalk Superpowers

So OOPSLA is upon us again, as usual a great little gathering, and a conference with a special place in the hearts of us Smalltalkers.

The highlight of Monday was the Smalltalk Superpowers workshop run by Travis Griggs and Martin McClure. It was great fun and I suspect the format might turn into a "regular thing". We sat in a U-shape around a projector so we could easily move the cord around to our laptops. Then we took turns presenting 15-minute-ish "cool tricks" in Smalltalk. Now, the definition of coolness for this workshop was based on trickery - not necessarily goodness. :)

For the apparently evil tricks we also dimmed the light - to get the right mood. After each presentation we did a quick thumbs-up-or-down-vote along the Good or Bad-axis.

This workshop has already been blogged about in much more detail, but I think it is worth noting that it was great fun and some of us also learned interesting techniques. I hope it will be repeated in the future and that the "format" catches on!

And oh, forgot to say it - I got 3 hours of tape split up in 5 pieces and inspired by Travis Griggs I just say: super super super super super …but this doesn’t cover the whole workshop - I ran out of tapes - but still, enjoy!

15 Nov 07

OOPSLA 2007 - The Lively BOF

Sorry for being late with this - the torrent was actually up on the 4th of November, and announced on the squeak-dev mailing list, but I forgot to add an entry here on the blog.

One BOF I could not miss was of course the Lively BOF with Dan Ingalls.

I am not sure I see the actual "use" of this stuff - a Morphic based development environment in JavaScript running entirely inside the browser. But hey, this is from Sun research, so who cares, it has definitely got a high "neat factor"!

It was a nice fun BOF, and here it is:

A 578Mb XviD torrent.

Enjoy!
