In learning Nim I decided to implement a trivial socket server as a small example. It's not a useful HTTP server (it just returns a hard-coded HTTP response so we can benchmark it with HTTP tools), and it's not async; there are async examples in the Nim examples directory and in its stdlib. Instead, I wanted to write a more classical threaded socket server to see how easy that is, especially with the new APIs in Nim "bigbreak", and see how it performs.
The new "bigbreak" branch, which will soon-ish become Nim 0.10.0, has a bunch of new stuff in the networking area. It replaces the single `sockets` module with a low-level `rawsockets` module and a higher-level `net` module. There is also a new `selectors` module that abstracts over different modern IO polling mechanisms: a single API will use epoll on Linux, kqueue on BSD/OSX, plain old select on the other Unices, and IO Completion Ports on Windows. At the moment epoll, select on "other Unices", and IO Completion Ports work; kqueue is on the todo list.
So without further ado…
…here is the code with lots of comments sprinkled all over it, so that hopefully even a non-Nim programmer can understand how it works:
[The full 114-line code listing is not reproduced here; the line numbers in the remarks below refer to it.]
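Since the listing itself is missing, here is a condensed sketch of the same structure, based on the 0.10-era `net`, `selectors`, and `threadpool` APIs as described in the remarks below. The proc names (`handleClient`, `serve`), the port, and the response body are my own; the exact "bigbreak" signatures may differ from what is shown, and this does not match the original's line numbering:

```nim
# Condensed sketch of the threaded server (not the original listing).
# Compile with: nim c --threads:on server.nim
import net, selectors, threadpool

const response = "HTTP/1.1 200 OK\r\LContent-Length: 5\r\L\r\LHello"

# Handle one client on a pool thread: read the request line,
# send the canned response, close the socket.
proc handleClient(client: Socket) =
  var data = TaintedString""
  client.readLine(data)
  client.send(response)
  client.close()

proc serve() =
  let server = newSocket()
  server.setSockOpt(OptReuseAddr, true)
  server.bindAddr(Port(8080))
  server.listen()
  # Register the listening socket with the selector; we only care
  # about readability (a pending connection to accept).
  var selector = newSelector()
  selector.register(server.getFd, {EvRead}, nil)
  while true:
    # A non-empty result means our single registered socket is ready.
    if selector.select(1000).len > 0:
      var client: Socket
      new(client)
      server.accept(client)      # fills in the freshly allocated Socket
      spawn handleClient(client) # deepCopies the Socket to a pool thread

serve()
```

Removing the `spawn` keyword turns this into the synchronous variant discussed at the end of the post.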
Remarks on code
Here are some things to note:
- On line 46 we see `TaintedString""`, which may look odd. It's equivalent to `TaintedString(r"")`, which is actually a so-called type conversion of a raw string literal. A `TaintedString` is (with `--taintMode:on`) a distinct type of string. So it "works" the same as string, but is another type according to the type system, and that way we can track the use of this string more closely. More on TaintedString. NOTE: compilation currently fails with `--taintMode:on`; something doesn't handle it properly.
- On lines 64-71 we see trivial use of the new `selectors` module. The user object we pass in as `nil` would typically be some object with a reference to the socket, so that when we call `select` (line 71) and get back a sequence of the user objects that had the event (we listen for `EvRead` in this case), we don't need to do a lookup based on the socket itself. In this code, however, we only listen on a single listener socket, so we just want to know whether the sequence is non-empty (`len > 0`), meaning we did get an event for our listening socket.
- On lines 74-78 we create a new `Socket` (it was called `PSocket` earlier; in "bigbreak" it has been renamed to `Socket`), and since `Socket` is a "ref object" type, the `new` proc is implicitly called, allocating it on the heap. Then on line 78 we call `accept`, which "fills in the details" of this new Socket object. The client parameter is a "var parameter", so in theory the accept procedure could assign to it, but it turns out that is a leftover from earlier code, because it doesn't. Personally I am still slightly uneasy that I can't tell from the call site that the client var could be modified to reference another Socket, but I also understand that a call-site annotation would make the code less readable.
- On line 82 we call `spawn`, which causes the argument (the Socket) to be deepCopied and handed over to another thread, making sure the threads are isolated from each other.
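As an aside, the distinct-type mechanism behind `TaintedString` can be shown with a tiny standalone example. `MyTainted` and `sanitize` are made-up names for illustration, not the stdlib type:

```nim
# A distinct string is a separate type to the compiler, even though
# its runtime representation is a plain string.
type MyTainted = distinct string

proc sanitize(s: MyTainted): string =
  # The explicit conversion back to string is the single point where
  # we declare the input trusted.
  result = string(s)

let userInput = MyTainted("hello")
# echo userInput         # compile error: MyTainted has no `$` of its own
echo sanitize(userInput) # prints: hello
```

This is the point of taint mode: untrusted input can only flow into string-expecting code through deliberate conversions, which the type checker forces you to write out.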
Some notes before discussing the numbers:
- This code doesn't do any keep-alive; it's all just lots of connect/recv-and-send/close.
- I only ran ab against localhost from localhost, so the numbers may be less realistic.
- This code doesn't really do anything in the handler. We basically measure spawn and socket accept/close overhead per request, plus shoveling the data.
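For reference, the ab (ApacheBench) invocations behind these runs would look roughly like the following; the total request count and port are my own placeholders, not taken from the original runs:

```shell
# 100 concurrent clients, 50000 requests total, against the local server
ab -n 50000 -c 100 http://127.0.0.1:8080/

# same run with concurrency raised to 1000
ab -n 50000 -c 1000 http://127.0.0.1:8080/
```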
And the verdict:
- For a trivial payload of 100 bytes and 100 concurrent clients we get 17k req/sec, each request taking 5-12 ms; note that we run 100 in parallel. If we increase concurrency to 1000 we still do about 15k req/sec, each taking 20-28 ms. Personally I think these are good numbers, but I admit I need to compare with, say, Nginx or Node.js.
- For a slightly bigger payload of 100000 bytes and 100 concurrent clients we get 11k req/sec, each taking 9-17 ms, serving in total about 1Gb/sec.
- For a fat payload of 2Mb and 100 concurrent clients we get around 8k req/sec, but a whopping 1.5Gb/sec.
I also verified that Nim sets up a thread pool (it seemed to use about 40 threads on my machine), with most of the utilization focused on about 5 threads, presumably matching my 4 cores. It was quite satisfying to watch the system monitor and see all 4 cores happily working :)
If you remove the `spawn` keyword, this turns into a synchronous server handling just one request at a time, sequentially. You can then test with ab using a single client. It actually does a bit more requests per second that way, about 22k I think I got. This is most probably because we get rid of the "spawn overhead", so the listener loop runs a tad faster even though it does the read-send-close inline.
The code is short and clean, though perhaps not fully idiomatic Nim; I use "self" as the name for "this", and I'm not sure what OO Nimmers tend to use. It was fun to write, and performance seems very good to me. It's also quite stable, and I can't see any memory leaking. :)
The thread pool mechanism is very promising (great work there, Andreas!), and it's also very neat that we can get epoll-based polling with just a few lines of code, in a way that's meant to work cross-platform. Way to go Dominik! :)