Roads Less Taken

A blend of programming, boats and life.

Guts of Ni


This article describes some core parts of the current implementation of the Ni (now known as Spry) language. It’s not a tutorial, introduction or manual. It’s in fact kinda incoherent - but so is Ni :)

The parser

The Ni parser is a straightforward handmade recursive descent parser. Hmmm, now that I look closer I realize it’s not recursive. It’s just a very simple while loop that goes through the source character by character and maintains some state. It should be easy to read and understand, and given that the grammar is so simple, it hasn’t changed much lately. It could probably be made faster by using more procs and fewer methods, but it’s probably plenty fast already.
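To make the "while loop plus some state" idea concrete, here is a minimal sketch. It is written in Python rather than Nim, it is not the actual Ni parser, and it assumes a toy grammar of whitespace-separated tokens nested inside [], () and {} (no string literals, no comments). The function name parse and the tuple representation are made up for this illustration.

```python
def parse(source):
    """Parse source into nested lists using one loop and an explicit stack."""
    closers = {"[": "]", "(": ")", "{": "}"}
    # The only state: a stack of (opening delimiter, children) pairs,
    # the token being collected, and the current position.
    stack = [("root", [])]
    token = ""
    pos = 0
    while pos <= len(source):
        # A sentinel space after the last character flushes the final token.
        ch = source[pos] if pos < len(source) else " "
        if ch.isspace() or ch in "[](){}":
            if token:
                stack[-1][1].append(token)
                token = ""
            if ch in closers:
                stack.append((ch, []))
            elif ch in closers.values():
                opener, children = stack.pop()
                if opener == "root" or closers[opener] != ch:
                    raise SyntaxError(f"unexpected {ch!r} at position {pos}")
                stack[-1][1].append((opener, children))
        else:
            token += ch
        pos += 1
    if len(stack) != 1:
        raise SyntaxError("unclosed composite")
    return stack[0][1]


tree = parse("a [b (c 1)] d")
```

Here tree comes out as a list holding "a", a ("[", ...) tuple with the nested children, and "d". Nesting is handled by pushing and popping the stack, so no recursion is needed.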

The parser constructs an AST consisting of instances of subclasses of Node. A Node is a regular ref object, which is Nim terminology for a GC-controlled object allocated on the heap. It’s worth noting that the parser can be used alone without the interpreter, but the ni module (the interpreter) obviously depends on the parser module.

Nodes, Nodes…

In Ni everything is a Node, but they come in different families:

  • Values - fundamental “things” typically represented internally by the corresponding Nim type
  • Words - the bread and butter of Ni, kinda like Smalltalk Symbols, but with different prefixes that make them behave differently
  • Composites - the three base collections of Ni: Blocks, Contexts (Context not implemented yet) and Parens.
  • Derived composites - these are created from a Composite, like a Func or an Object (Object not implemented yet).

Unlike in Smalltalk, Values in Ni are not Ni objects (although they are indeed Nim objects). They can instead be viewed as fundamental datatypes, but the Ni interpreter is designed so that you can add more of these in separate extension modules.

Words are pervasive in the language, just like messages are pervasive in Smalltalk. They are kinda like variables in other languages, but a word behaves differently depending on its prefix.

Composites are fundamental collections and they are supported by specific literal syntax in the parser, [], {}, ().

Finally, derived Composites are things one can create using a Composite as input. A Func is a function and it’s created from a Block. Objects are yet to be implemented.
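The four families can be pictured as a small class hierarchy. The sketch below is in Python, not Nim, and only mirrors the shape described above. The class names follow the article, while the field names (value, name, nodes, body) are assumptions made for this illustration.

```python
from dataclasses import dataclass, field


class Node:
    """Common base: everything the parser produces is a Node."""


@dataclass
class IntVal(Node):
    """A Value: wraps the corresponding host-language value."""
    value: int


@dataclass
class Word(Node):
    """A Word: a name, wrapped in a node."""
    name: str


@dataclass
class Composite(Node):
    """A Composite: an ordered collection of child nodes."""
    nodes: list = field(default_factory=list)


class Block(Composite):
    """Written with [] in source."""


class Paren(Composite):
    """Written with () in source."""


@dataclass
class Func(Node):
    """A derived composite: built from a Block rather than parsed directly."""
    body: Block


# The block [x 1], then a Func created from it.
func = Func(Block([Word("x"), IntVal(1)]))
```

Whether the real Func shares a base class with Block is not something this sketch tries to settle. It only shows that a Func takes a Block as its input.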

Values

All atomic values are wrapped using one subclass of Node for each kind, and currently we have: int, float, string, bool and nil. The values true, false and nil are represented with three singleton Values that are held by the interpreter. Those three singletons are bound to the words true, false and nil in the root context, which is the global namespace.
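A minimal sketch of the singleton arrangement, in Python rather than Nim. The attribute names and the new_bool helper are hypothetical. The point is only that the interpreter creates the three Values once and that the root context binds the words to those same instances.

```python
class Value:
    """Wraps one host-language value."""

    def __init__(self, value):
        self.value = value


class BoolVal(Value):
    pass


class NilVal(Value):
    def __init__(self):
        super().__init__(None)


class Interpreter:
    def __init__(self):
        # Created once and held by the interpreter for its whole lifetime.
        self.true_val = BoolVal(True)
        self.false_val = BoolVal(False)
        self.nil_val = NilVal()
        # The root context doubles as the global namespace.
        self.root = {
            "true": self.true_val,
            "false": self.false_val,
            "nil": self.nil_val,
        }

    def new_bool(self, flag):
        """Hypothetical helper: return a shared singleton instead of allocating."""
        return self.true_val if flag else self.false_val


interp = Interpreter()
```

With this layout a primitive that produces a boolean hands back the very object that the word true or false is bound to, so identity comparison is enough.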

New kinds of values can relatively easily be added to Ni since I intentionally made the parser pluggable for this. However, I do not like the abundance of builtin types that Rebol/Red has; it feels like a mistake to hardwire so much into the language itself. But this way we are free to experiment with such extension modules.

Words

Ni uses the term “word” much like Rebol does. There are however quite a few differences in what kinds of words Ni supports. A word is just a string, but it’s wrapped in a Word node, and there are a number of different subclasses of Word which behave differently and use different prefixes.

  • EvalWord: The most common kind of word, no prefix. An eval word delegates evaluation to whatever it is bound to. In this way eval words act much like variables do in most other languages.
  • LitWord: A literal word. Uses the '-prefix and is used when one wants to refer to the word as a value itself. Evaluates to itself.
  • GetWord: Uses the ^-prefix. A get word does not delegate evaluation; instead it skips evaluation and “evaluates” to whatever it is bound to.
  • EvalArgWord: Used to “pull in” arguments to Funcs, uses :-prefix. Evaluates the argument in the callsite context and pulls in the result, this is like “normal” argument passing in most languages.
  • GetArgWord: Uses the :^-prefix and evaluates to the argument itself, the AST node. No evaluation of the node is made. This enables Ni functions to manipulate the AST.
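The difference between the first three kinds can be sketched as follows. This is Python rather than Nim, and a simplification: the context is a plain dict, Thunk is a hypothetical stand-in for any node with non-trivial evaluation (a Paren or a Func, say), and the two arg words are left out because they need access to the call site.

```python
class Node:
    def eval(self, ctx):
        # Default: a node evaluates to itself.
        return self


class IntVal(Node):
    def __init__(self, value):
        self.value = value


class Thunk(Node):
    """Hypothetical stand-in for any node with non-trivial evaluation."""

    def __init__(self, compute):
        self.compute = compute

    def eval(self, ctx):
        return self.compute(ctx)


class Word(Node):
    def __init__(self, name):
        self.name = name


class EvalWord(Word):
    def eval(self, ctx):
        # Delegate: evaluate whatever the word is bound to.
        return ctx[self.name].eval(ctx)


class GetWord(Word):
    def eval(self, ctx):
        # Skip evaluation: hand back the bound node as-is.
        return ctx[self.name]


class LitWord(Word):
    """Inherits Node.eval, so it evaluates to itself."""


ctx = {"answer": Thunk(lambda ctx: IntVal(42))}
evaluated = EvalWord("answer").eval(ctx)  # runs the thunk
fetched = GetWord("answer").eval(ctx)     # the thunk itself, not run
literal = LitWord("answer")
```

So evaluated is the IntVal produced by running the bound node, fetched is the bound node untouched, and literal.eval(ctx) is just literal again.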

All words (well, currently only EvalWords and GetWords, but I should make it work for ArgWords and LitWords too) can also, just before the word itself, be additionally prefixed with a single dot (.) or a double dot (..). This affects lookup and is similar to how one refers to the current or parent directory in Unix. The rules I have in mind currently go like this:

  • ..x: assignment and lookup happen in the nearest outer lexical environment only.
  • .x: assignment and lookup happen in the local context only.
  • x: assignment and lookup happen in the local context, but lookup continues outward to the surrounding lexical environment and beyond.
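The three rules can be written down as a small sketch. It is Python rather than Nim, and the names Context, split_prefix, lookup and assign are made up for this illustration. It follows the rules exactly as listed, which are themselves still a proposal, and returns None for a missing binding as a simplification.

```python
class Context:
    def __init__(self, parent=None):
        self.bindings = {}
        self.parent = parent  # nearest outer lexical environment, if any


def split_prefix(word):
    """Return (scope, bare name) for a word with an optional . or .. prefix."""
    if word.startswith(".."):
        return "outer", word[2:]
    if word.startswith("."):
        return "local", word[1:]
    return "chain", word


def lookup(ctx, word):
    scope, name = split_prefix(word)
    if scope == "local":
        return ctx.bindings.get(name)
    if scope == "outer":
        return ctx.parent.bindings.get(name) if ctx.parent else None
    # No prefix: start locally and keep walking outward.
    while ctx is not None:
        if name in ctx.bindings:
            return ctx.bindings[name]
        ctx = ctx.parent
    return None


def assign(ctx, word, value):
    scope, name = split_prefix(word)
    # Only ..x assigns outside the local context.
    target = ctx.parent if scope == "outer" else ctx
    if target is None:
        raise LookupError(f"no outer environment for {word!r}")
    target.bindings[name] = value


root = Context()
inner = Context(parent=root)
assign(root, "x", 1)
seen_from_inner = lookup(inner, "x")   # found by walking outward
local_only = lookup(inner, ".x")       # nothing bound locally yet
assign(inner, "x", 2)                  # shadows the outer x
assign(inner, "..x", 3)                # rebinds the outer x
```

After these steps lookup(inner, "x") gives 2 and lookup(root, "x") gives 3, which is the shadowing behavior the rules are meant to produce.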

I have been mulling these rules over, but I think this logic would make it fairly reasonable. Any feedback appreciated!

Comparing with Rebol. As far as I know the . and .. prefixes do not have a counterpart in Rebol. Rebol has get words and eval words, but uses the :-prefix instead of the ^-prefix for get words. The reason Ni deviates is to use : for arg words instead, mimicking Smalltalk block syntax. Rebol does not have arg words at all, but Rebol does have set words, which Ni does not have (again to support keywords). Ni instead implements assignment as a primitive infix function.

I also experimented earlier with bindings etc, but currently Ni doesn’t work like that. Exactly how different environments can be used dynamically is still an open design question. Scoping is however mainly lexical, but I can imagine mechanisms to control this, and also reification mechanisms in order to reflect on and manipulate the activation record spaghetti stack.

Composites

There are three different composites in Ni, each formed by literal syntax [], () and {} respectively. These are called Blocks, Parens and Contexts.

Block and Paren are each a wrapped seq of Nodes, seq being the dynamic array type of Nim. Currently they differ only in behavior: a Paren evaluates by evaluating its child nodes one by one and returning the last result, while a Block just evaluates to itself. To evaluate a Block as code you need to explicitly do so using the do word.
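A minimal sketch of that difference, in Python rather than Nim. The helper eval_sequence and the function do are stand-ins invented for this illustration. In particular, do here simply runs the children in order, which is an assumption about what the real do word amounts to.

```python
class Node:
    def eval(self):
        return self  # default: evaluate to yourself


class IntVal(Node):
    def __init__(self, value):
        self.value = value


def eval_sequence(nodes):
    """Evaluate nodes one by one and return the last result."""
    result = None
    for node in nodes:
        result = node.eval()
    return result


class Composite(Node):
    def __init__(self, nodes):
        self.nodes = list(nodes)  # plays the role of Nim's seq


class Block(Composite):
    """Inherits Node.eval: a Block is just data until someone runs it."""


class Paren(Composite):
    def eval(self):
        return eval_sequence(self.nodes)


def do(block):
    """Stand-in for the do word: run a Block's children as code."""
    return eval_sequence(block.nodes)


block = Block([IntVal(1), IntVal(2)])
paren = Paren([IntVal(1), IntVal(2)])
```

Evaluating block returns the block itself, evaluating paren returns its last child, and do(block) gives the same result as the Paren.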

Contexts are being implemented. A Context has a Nim Table internally, making it the foundation of both objects and “Dictionaries”, much like in JavaScript.

Objects are being implemented and are fundamentally like Contexts, but with special evaluation behavior.

Relying on Nim

The Ni interpreter tries to be very simple and reuses as much of Nim’s machinery as possible:

  • A call in Ni results in a call in Nim. This means Ni maintains no call stack of its own.
  • Ni has no memory model of its own; we use Nim’s very capable garbage collected memory model.
  • Ni is a plain interpreter, not a JIT. The emphasis is instead on hackability and portability, and very much on mixing with Nim.
  • Ni primitives are Nim procs. It’s very easy to bind words to primitive procs in Nim.
  • Dynamic dispatch in Ni (for Values) relies on Nim’s own dynamic dispatch, i.e. methods. Multiple dispatch is also used for number coercions and similar.
  • Nim has very strong macro mechanisms which I hope to eventually use to generate glue code to expose Nim libraries in Ni.
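Two of the points above, primitives as host procs and dispatch on argument types, can be sketched like this. It is Python rather than Nim, so the multiple dispatch that Nim methods provide is imitated with a table keyed on both argument types. Interpreter, bind_primitive, call and ADD_IMPLS are hypothetical names for this illustration, not Ni’s API.

```python
class Interpreter:
    def __init__(self):
        self.root = {}  # word -> bound host function

    def bind_primitive(self, word, fn):
        self.root[word] = fn

    def call(self, word, *args):
        # A guest-language call is a plain host call, so the host call
        # stack is reused and no separate stack is maintained.
        return self.root[word](*args)


# Imitation of multiple dispatch: pick an implementation from the types
# of both arguments, coercing int to float where they are mixed.
ADD_IMPLS = {
    (int, int): lambda a, b: a + b,
    (int, float): lambda a, b: float(a) + b,
    (float, int): lambda a, b: a + float(b),
    (float, float): lambda a, b: a + b,
}


def add(a, b):
    return ADD_IMPLS[(type(a), type(b))](a, b)


interp = Interpreter()
interp.bind_primitive("+", add)


def factorial(n):
    # Recursion through the interpreter nests ordinary host calls.
    return 1 if n <= 1 else n * interp.call("factorial", n - 1)


interp.bind_primitive("factorial", factorial)
```

Binding a word to a primitive is a single table entry, and a recursive call such as interp.call("factorial", 5) simply deepens the host stack. The flip side of that design is that deep guest recursion is limited by the host’s stack depth.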

The things described in this article tend to evolve from day to day, but hopefully this explains a bit better how Ni is constructed!
