Inside naru programming language

« List of documents

This is a draft implementation note for naru programming language, a general-purpose programming language designed by Kang Seonghoon. This article serves as a design document for both naru standard library and runtime for naru implementation.

This document is available as both a standalone document and a wiki page. The recent changes are available in Mercurial repository.

Prelude

naru in a nutshell document describes that naru core module is implicitly imported before every naru source code is run. This is only partially true; it is among first modules to be run, but it is definitely not the first one to be run. The main reason is that the naru runtime is very complex, and typically divided into lots of modules. (By the way, the internal pieces of runtime are due to be located at _narurt package.)

The prelude (_narurt prelude module) is a part of the naru runtime that bootstraps basic types and initializes built-in operators. Due to its duty, it is automatically imported by import naru core * statement and also the very first module to be run. Since no value is available before the prelude runs, the prelude constructs the very first values during its lifetime; it is possible using the vendor-specific pragma or the external module. These values are truly ethereal (i.e. exempted or invisible from the garbage collector) and cannot be destroyed.

Bootstrapping Types

The prelude constructs the following types:

Top
TODO

Built-in Operators

It is very important to realize that naru does not have operators like +, * and so on by default. In fact, the core language only supports the function call, and all operators are desugared into sequences of more primitive function calls. For example, the following code:

quadraticRootOf(a, b, c) := fun
    det = b**2 - 4*a*c
    if det < 0 then
        return {/}    -- no real solutions
    elseif det > 0 then
        rootdet = det**(1/2)
        return {(-b+rootdet)/(2*a), (-b-rootdet)/(2*a)}
    else
        return {-b/(2*a)}
    end
end

…is really desugared into the following form:

quadraticRootOf##FSSS_S(a:*, b:*, c:*)->* := fun
    det:* = #-#(#**#(b, 2), #*#(#*#(4, a), c))
    given #<#(det, 0)
    case True
        set0##V:Set{*} = Set##M _empty()
        return set0##V
    case False
        given #>#(det, 0)
        case True
            rootdet:* = #**#(det, #/#(1, 2))
            set1##V:Set{*} = Set##TS_ _singleton(#/#(#+#(-#(b), rootdet), #*#(2, a)))
            set1##V _add(#/#(#-#(-#(b), rootdet), #*#(2, a)))
            return set1##V
        case False
            set2##V:Set{*} = Set##TS_ _singleton(#/#(-#(b), #*#(2, a)))
            return set2##V
        end
    end
end

Memory Management

While hidden to the typical naru code, the memory management in naru is fully implemented in naru itself.

Garbage Collection

Numbers Internal

Integers

TODO

Collections Internal

String

_narurt string implements the String class and its companion, Symbol class. It is by far the most complex data structure in naru standard library (compared to its conceptual simplicity).

Internally the string and symbol share the same internal data structure, _StringTree; only the public interface differs between them. As the name suggests it is a heavyweight string (rope), implemented as a tree of short strings. Each node stores the length of that portion of string along with its children, so the indexing is O(log n) time (compared to the constant time for normal string). The same portion of the tree is shared, even when the explicit copy is made; when the change is made to the copy new tree is constructed so that as much portions of the original tree as possible is shared (i.e. copy-on-write scheme). TODO pointers to the next or previous leaves can be used if we split the leaf node and actual storage, at the expense of a copy of the entire tree (without expansion of shared parts!); i doubt its value however.

A rope is used because most disadvantages of the rope, namely slower indexing, are not a concern for most use cases. The direct indexing on a Unicode string is not useful since a Unicode code point is not same to a character; even when the code point is needed, most use cases requires the iteration of the whole string which is just as efficient as a normal string. It also has an advantage that UTF-8-encoded string can be stored as is; the verification is still required, but it is much faster than the full decoding. Finally, various algorithms on Unicode strings can be optimized using a rope.

Nodes. There are four kinds of leaves in the _StringTree:

_FlatString1, an array of one-byte code points. Can only represent U+0000..U+00FF, and directly maps to bytes encoded in ISO 8859-1. The empty string is a special case of this leaf.
_FlatString2, an array of two-byte code points. Can only represent U+0000..U+FFFF, and directly maps to bytes encoded in UCS-2 with a native byte order.
_FlatString4, an array of four-byte code points. Directly maps to bytes encoded in UTF-32 with a native byte order.
_FlatUTF8, a UTF-8-encoded bytes. The bytes should be a valid and complete UTF-8 string.

There are two kinds of non-leaf nodes in the _StringTree:

_ConcatString is a concatenation of two strings. Repetition is represented as several concatenations, with a depth proportional to a logarithm of the repeating number.
_Substring is a substring of the other string.

Additional node informations. Each node contains the following informations. Some informations can be uninitialized on the construction of the string, and will be recalculated in-place during the traversal.

The length of given string. The recalculation of this attribute only takes place in _ConcatString nodes.
The normalization statuses of given string. There are 4 such statuses (NFC, NFD, NKFC, NFKD), and each status can be one of UNKNOWN, YES, NO and MAYBE. The first indicates an absence of testing, and the last indicates a quick check (see UAX #15 for details) did not answer the status definitely.
The flags for lower and uppercase. (Titlecase is not used since it is not preserved over the concatenation.) Each flag can be one of UNKNOWN, YES and NO.
Copy-on-write flag. When this flag is set, any change on that node and its subtrees should copy that node (and subsequent subnodes affected); the flag in the original node is reset then.

Inside naru programming language

Prelude

Bootstrapping Types

Built-in Operators

Memory Management

Garbage Collection

Numbers Internal

Integers

Collections Internal

String

I/O Internal

Outside `naru core`

Inside naru programming language

Prelude

Bootstrapping Types

Built-in Operators

Memory Management

Garbage Collection

Numbers Internal

Integers

Collections Internal

String

I/O Internal

Outside naru core

Outside `naru core`