Erlang and the new databases
Lots of similar new databases are appearing, especially from companies running very large websites like LinkedIn, Facebook, Digg etc. Some of these databases are open source, but most of them haven’t yet been able to gather strong developer communities. CouchDB seems to be the exception.
This is probably because it is an Apache project, but also because it actually comes from a private person and not from a large company - just my guess.
What makes these databases special? It seems to me that they share a few traits:
- Simplicity
- Aimed at the web
- Extreme scalability
CouchDB, for example, uses a REST API, so any little language that can do HTTP can talk to it trivially. It also means you can run queries using curl or right in your web browser. As its document format it uses JSON, the serialization format "du jour" in these web-2.0 days. JSON is a trivial, readable syntax for describing data structures, kinda like a very lightweight XML (shudder). Finally, by default it uses an embedded JavaScript engine (SpiderMonkey from Mozilla) for server-side data manipulation.
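To make the "anything that speaks HTTP" point concrete, here is a minimal Python sketch that builds (but does not send) the kind of request that stores a JSON document. The host and port, the database name "blog" and the document id "post-1" are all assumptions picked for illustration, and build_put_request is just a hypothetical helper, not part of any CouchDB client.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_put_request(base_url, database, doc_id, document):
    """Build an HTTP PUT request that would store `document` as JSON.

    Hypothetical helper: it only constructs the request object and
    never touches the network.
    """
    url = f"{base_url}/{quote(database, safe='')}/{quote(doc_id, safe='')}"
    body = json.dumps(document).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed values: a local server, a database called "blog", a made-up document.
request = build_put_request(
    "http://localhost:5984",
    "blog",
    "post-1",
    {"title": "Erlang and the new databases", "tags": ["erlang", "couchdb"]},
)
```

With a server actually running at that address, passing the request to urllib.request.urlopen would send it. Nothing here is specific to Python, which is really the whole point.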
In order to scale, all these databases use replication in various ways, and that in turn often requires some kind of underlying revision mechanism in order to deal with consistency. And no, we are not talking about scaling over a few servers - we are talking about scaling over hundreds or thousands of servers.
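One common shape for such a revision mechanism is to reject a write unless the writer names the revision it last saw. The toy below is only a sketch of that idea under my own assumptions (integer revisions, a single in-memory dict). TinyStore and RevisionConflict are made-up names, and this is not how any particular database implements it.

```python
class RevisionConflict(Exception):
    """Raised when a write is based on a stale revision."""


class TinyStore:
    """Hypothetical in-memory store that tracks one revision per document."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision number, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_rev=0):
        # A document that does not exist yet counts as revision 0.
        current_rev, _ = self._docs.get(doc_id, (0, None))
        if expected_rev != current_rev:
            raise RevisionConflict(
                f"{doc_id}: expected revision {expected_rev}, store has {current_rev}"
            )
        new_rev = current_rev + 1
        self._docs[doc_id] = (new_rev, body)
        return new_rev


store = TinyStore()
first_rev = store.put("post-1", {"title": "draft"})
second_rev = store.put("post-1", {"title": "final"}, expected_rev=first_rev)
```

A writer still holding first_rev would now get a RevisionConflict instead of silently overwriting the newer version, which is the kind of signal a replicated system needs in order to notice diverging copies.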
Some of these databases are simply very efficient key-value stores, often inspired by Dynamo, a store of that kind written and used internally at Amazon. Amazon has not released Dynamo, but they have described in detail how it works, and there are several noteworthy open source attempts at implementing it - perhaps Dynomite comes closest. CouchDB is a bit different since it also implements a model around the map/reduce technique pioneered by Google. In short, map/reduce is a slightly formalized way of partitioning work over multiple nodes (map) and then aggregating it all back together into an answer (reduce).
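A small single-process Python sketch of the map/reduce idea, with plain lists standing in for the nodes. The documents, the "tags" field and the function names are my own assumptions for illustration. It is not CouchDB's actual view API, just the general technique.

```python
from collections import defaultdict


def map_doc(doc):
    """Map step: emit one (key, value) pair per tag in a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_values(key, values):
    """Reduce step: collapse all values emitted for one key into one answer."""
    return sum(values)


def map_reduce(partitions, map_fn, reduce_fn):
    """Run map_fn over every document in every partition, then reduce per key.

    Each partition stands in for the documents held by one node. Here they
    are processed one after another, but no map call depends on any other.
    """
    grouped = defaultdict(list)
    for partition in partitions:
        for doc in partition:
            for key, value in map_fn(doc):
                grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


# Made-up documents, split over two pretend nodes.
partitions = [
    [{"_id": "a", "tags": ["erlang", "couchdb"]}],
    [{"_id": "b", "tags": ["erlang"]}, {"_id": "c", "tags": []}],
]
tag_counts = map_reduce(partitions, map_doc, reduce_values)
```

Since the map step looks at one document at a time, each node can run it on its own share of the data without talking to anyone else. Only the grouped results need to travel to wherever the reduce happens.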
I will post more fun stuff about CouchDB later - especially some information on my implementation of a "view server" in Squeak, so that you can use Squeak Smalltalk as the data manipulation language instead of JavaScript.
Erlang to the rescue
A very large share of these new extremely scalable systems are written in Erlang - I would guess about half of them! And yesterday Computer Sweden (a large biweekly IT paper) noted this on its front page too. Erlang is back. Well, at least until a "cool and hip" language can steal its thunder, but that can actually take a while, because Erlang is one of the few languages that have been all about extreme scalability and robustness from the very start, and it has been around long enough to mature. You can read the details about Erlang in other places, but it is a functional language with built-in support for "shared nothing" asynchronous message passing.
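For readers who haven't seen the style, here is a rough Python analogy of "shared nothing" message passing, using a thread with a queue as its mailbox. It is only an analogy under loud assumptions: Erlang processes are far lighter than OS threads and the language enforces the isolation, while here it is mere convention. The message shapes ("add", "get", "stop") are invented for this example.

```python
import queue
import threading


def counter(mailbox):
    """A worker that owns its state and changes it only in response to messages."""
    total = 0  # private to this worker, never touched from outside
    while True:
        message = mailbox.get()
        if message[0] == "add":
            total += message[1]
        elif message[0] == "get":
            # The sender passes along a reply queue to receive the answer on.
            message[1].put(total)
        elif message[0] == "stop":
            break


def run_demo():
    mailbox = queue.Queue()
    worker = threading.Thread(target=counter, args=(mailbox,))
    worker.start()

    # Sends are asynchronous: put() returns without waiting for the worker.
    for n in (1, 2, 3):
        mailbox.put(("add", n))

    reply_box = queue.Queue()
    mailbox.put(("get", reply_box))
    result = reply_box.get(timeout=5)

    mailbox.put(("stop",))
    worker.join()
    return result


total = run_demo()
```

The caller never reads the counter's state directly. It asks for it with a message and waits for a reply, and that discipline is what lets this style spread over many cores or machines.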
Erlang is a compelling choice when implementing extremely scalable and robust software. At the same time Erlang demands a bit from the developer - functional languages aren’t easy fits for many developer brains and Erlang is definitely "non mainstream" in other ways too, which just makes me love it. :)
Erlang runs on top of a VM written in C called BEAM. Current releases of BEAM have HiPE integrated, a native-code compiler (often loosely called a "JIT") that can be selectively used to compile "hot" parts of the code. Since 2006 BEAM has been able to utilize SMP, or in other words multiple cores.
One interesting part of all this is that BEAM/HiPE can be targeted by other languages, for example by producing "Core Erlang" (a stripped-down intermediate form of the language), BEAM bytecodes directly, or so-called "Erlang forms". Reia is an example of such a language.
Perhaps it is time to look hard at Erlang? :)