This is a draft implementation note for naru programming language, a general-purpose programming language designed by Kang Seonghoon. This article serves as a design document for both naru standard library and runtime for naru implementation.
This document is available as both a standalone document and a wiki page. The recent changes are available in Mercurial repository.
naru in a nutshell document describes that naru core
module is implicitly imported before every naru source code is run. This is only partially true; it is among first modules to be run, but it is definitely not the first one to be run. The main reason is that the naru runtime is very complex, and typically divided into lots of modules. (By the way, the internal pieces of runtime are due to be located at _narurt
package.)
The prelude (_narurt prelude
module) is a part of the naru runtime that bootstraps basic types and initializes built-in operators. Due to its duty, it is automatically imported by import naru core *
statement and also the very first module to be run. Since no value is available before the prelude runs, the prelude constructs the very first values during its lifetime; it is possible using the vendor-specific pragma or the external module. These values are truly ethereal (i.e. exempted or invisible from the garbage collector) and cannot be destroyed.
The prelude constructs the following types:
Top
It is very important to realize that naru does not have operators like +
, *
and so on by default. In fact, the core language only supports the function call, and all operators are desugared into sequences of more primitive function calls. For example, the following code:
quadraticRootOf(a, b, c) := fun
det = b**2 - 4*a*c
if det < 0 then
return {/} -- no real solutions
elseif det > 0 then
rootdet = det**(1/2)
return {(-b+rootdet)/(2*a), (-b-rootdet)/(2*a)}
else
return {-b/(2*a)}
end
end
…is really desugared into the following form:
quadraticRootOf##FSSS_S(a:*, b:*, c:*)->* := fun
det:* = #-#(#**#(b, 2), #*#(#*#(4, a), c))
given #<#(det, 0)
case True
set0##V:Set{*} = Set##M _empty()
return set0##V
case False
given #>#(det, 0)
case True
rootdet:* = #**#(det, #/#(1, 2))
set1##V:Set{*} = Set##TS_ _singleton(#/#(#+#(-#(b), rootdet), #*#(2, a)))
set1##V _add(#/#(#-#(-#(b), rootdet), #*#(2, a)))
return set1##V
case False
set2##V:Set{*} = Set##TS_ _singleton(#/#(-#(b), #*#(2, a)))
return set2##V
end
end
end
While hidden to the typical naru code, the memory management in naru is fully implemented in naru itself.
TODO
_narurt string
implements the String
class and its companion, Symbol
class. It is by far the most complex data structure in naru standard library (compared to its conceptual simplicity).
Internally the string and symbol share the same internal data structure, _StringTree
; only the public interface differs between them. As the name suggests it is a heavyweight string (rope), implemented as a tree of short strings. Each node stores the length of that portion of string along with its children, so the indexing is O(log n) time (compared to the constant time for normal string). The same portion of the tree is shared, even when the explicit copy is made; when the change is made to the copy new tree is constructed so that as much portions of the original tree as possible is shared (i.e. copy-on-write scheme). TODO pointers to the next or previous leaves can be used if we split the leaf node and actual storage, at the expense of a copy of the entire tree (without expansion of shared parts!); i doubt its value however.
A rope is used because most disadvantages of the rope, namely slower indexing, are not a concern for most use cases. The direct indexing on a Unicode string is not useful since a Unicode code point is not same to a character; even when the code point is needed, most use cases requires the iteration of the whole string which is just as efficient as a normal string. It also has an advantage that UTF-8-encoded string can be stored as is; the verification is still required, but it is much faster than the full decoding. Finally, various algorithms on Unicode strings can be optimized using a rope.
Nodes. There are four kinds of leaves in the _StringTree
:
_FlatString1
, an array of one-byte code points. Can only represent U+0000..U+00FF, and directly maps to bytes encoded in ISO 8859-1. The empty string is a special case of this leaf._FlatString2
, an array of two-byte code points. Can only represent U+0000..U+FFFF, and directly maps to bytes encoded in UCS-2 with a native byte order._FlatString4
, an array of four-byte code points. Directly maps to bytes encoded in UTF-32 with a native byte order._FlatUTF8
, a UTF-8-encoded bytes. The bytes should be a valid and complete UTF-8 string.There are two kinds of non-leaf nodes in the _StringTree
:
_ConcatString
is a concatenation of two strings. Repetition is represented as several concatenations, with a depth proportional to a logarithm of the repeating number._Substring
is a substring of the other string.Additional node informations. Each node contains the following informations. Some informations can be uninitialized on the construction of the string, and will be recalculated in-place during the traversal.
_ConcatString
nodes.naru core