naru programming language, in a nutshell (supposedly)

« List of documents

This is a draft tutorial and design document for naru programming language, a general-purpose programming language designed by Kang Seonghoon. This article assumes the development version of naru, revision 4, and thus is subject to change.

This document is annotated in the separate rationale document, and a link to the justification or rationale will appear like this. (Note: these links won’t display properly in the wiki version.)

This document is available as both a standalone document and a wiki page. The recent changes are available in Mercurial repository. 이 문서의 한국어판은 여기서 볼 수 있습니다.

The Basics

Let’s start with a simple example.

use naru
"hello world!" println()

use naru is a (optional) marker to signal a naru code. "hello world!" is a string literal, and println() method prints given value. Normally newline ends the statement, but you may use ; to separate them manually.

naru is an imperative programming language; you can assign and use variables much like most other languages.

nirvana = ()                   -- "null" value
sunny = true                   -- boolean
radius = 42                    -- integer
almostpi = 3.1415926535897932  -- real number
something = 4 + 5j             -- complex number (with a trailing 'j')
myname = 'Kang Seonghoon'      -- string (well, quotes does not matter here)
haiku = """
 古池や  ふるいけや
 蛙飛込む かわずとびこむ
 水の音  みずのおと
"""                            -- multiline string using triple quotes

Here -- introduces a short comment that ends in front of newline; comments are ignored silently. There is also a long comment delimited by (-- and --). By convention we will illustrate the output of the preceding line after -->.

The last statement shows the example of multiline string, which should be delimited by triple quotes or newlines have to be escaped otherwise. It also shows the use of Unicode in naru: every naru code is read as Unicode text.

There are several operations built in naru; among them are arithmetic operations:

(15 + 21) println()       --> 36
(15 - 21) println()       --> -6
(15 * 21) println()       --> 315
(15 / 21) println()       --> 5/7

Wait, a fraction? Yes, naru has a built-in fraction. Dividing an integer by other integer may yield a fraction. naru also has separate operators for quotient (i.e. floor division) and remainder:

(49 // 21) println()      --> 2
(49 % 21) println()       --> 7

If you really want to see a real number, make either the divisor or dividend a real number. Anything contains a decimal (.) is assumed to be a real number. Also Real from(...) will convert an integer into a real number.

(49.0 / 21) println()            --> 2.333333333333
(Real from(49) / 21) println()   --> 2.333333333333

Integers in naru have arbitrary precision, and are only bounded by the memory limit.

(2 ** 88) println()       --> 309485009821345068724781056

For strings naru provides a concatenation operator, but it has a somewhat strict idea about what can be concatenated.

("Hello, " ~ "world!") println()                --> Hello, world!
("pi is about " ~ 3.14) println()               -- error
("pi is about " ~ String from(3.14)) println()  --> pi is about 3.14

Since it is common to concatenate lots of non-string things naru has a shortcut: an interpolated string.

pi = 3.1415296535897932
#"pi is about #(22/7), and almost equals to #pi." println()
    --> pi is about 22/7, and almost equals to 3.1415296535897932.

Often we have to make a decision, and naru has a conditional construct for exactly this purpose. Comparison can be made as usual. (Just keep in mind that = is an assignment and == is an equality; = in the condition is forbidden.)

age = 15
if age == 0 then
    "Welcome to the cruel world!" println()
elseif age < 13 then
    "You are a child." println() 
elseif age <= 18 then
    "You are a teenager." println()
else
    "You are an adult." println()
end

elseif and else parts are not mandatory, and you may drop them if not needed. So does then keyword, unless it is followed by other statement.

age = 15
if 13 <= age <= 18
    "You are a teenager." println()
end

Here 13 <= age <= 18 has a usual meaning; it returns true when age is between 13 and 18, inclusive. It won’t be parsed as (13 <= age) <= 18 or similar. The following is also equivalent:

age = 15
if 13 <= age <= 18 then "You are a teenager." println() end

There is also a then-else expression which can be used for an inline conditions:

(13 <= age <= 18 then "You are a teenager." else "You are a stranger.") println()

This expression should be enclosed in the parentheses (which include ones in the function call), since otherwise it can be mistaken for if-then statement.

Before talking about loops, let me introduce several kinds of complex values that are constructed from other simpler values. They may look familiar to you, but I remind the characteristics of them one more time:

The good analogy would be the address book in your cellphone: you store each entry as a tuple, which are stored in the map in order to speed the lookup (since you normally find the entry by the name), and entries that are recently called are also stored in the set (if you don’t care about the order) or the vector.

today = (2010, "Nov", 6)             -- tuple
numbers = [0, 1, 2, 2, 3, 3, 3, 3]   -- vector
genders = {"male", "female"}         -- set
countries = {                        -- map
    "KR" => "Republic of Korea",
    "JP" => "Japan",
    "US" => "United of States",      -- look at this stray comma!
}

As in the last statement every newline between a pair of matching parentheses, braces or so is ignored. Also a stray comma will be ignored, so it is handy to write a long vector, for example.

A loop builds from a corresponding complex value. For example, you may want to loop from values of numbers so print them:

for num <- numbers do
    num println()
end

You will get a pair of key and value if you loop from the map. (Also note that do is optional, as like then in the conditional.)

for code, name <- countries
    code println()
    name println()
end

“It is okay that every loop requires a complex value, but what if I want to write numbers from 0 to a billion?” No problem, you have two ways to do it:

-- one way
for num <- 0..1_000_000_000
    num println()
end

-- another way
num = 0
while num <= 1_000_000_000
    num println()
end

An underscore (_) within a long number is ignored. The second way shows a while loop, the most general form of loop, but it is quite cumbersome to write. Use it only when there is no other way. (For paranoids: 0..1_000_000_000 does not store all the numbers between them, so it is safe to use lots of them.)

Structuring

The basic naru above is just fine if you use naru as a calculator, but sometimes we want some kind of structure in the program. A major building block in naru is a function.

double := fun (x)
    return x * 2
end

You can see a new operator here: :=. It is called binding, compared to assignment (=), and it means double won’t change whatsoever. It is not limited to functions; you may use := to declare constants.

PI := 3.1415926535897932

It is not required but recommended that every constant name is written in uppercase. Likewise a function name does not start with a uppercase letter to distinguish itself with constants.

Back to the function, fun ... end makes a new function. In this case it has one argument, x, and it will return x times 2 as expected. Since this function is so short you can use a shortcut:

double := fun (x) -> x * 2

This seems to mean that double itself is not a function, but it contains a function value. It really does, but in most cases we don’t care and call it just a function double. The alternative syntax supports this intuition:

double(x) := fun
    return x * 2
end

Or more shorter:

double(x) := fun -> x * 2

To distinguish this uniform declaration syntax and the function value, fun should be followed by a parenthesized argument list for function values and nothing for declaration. We will look at lots of variations on this syntax later.

Functions can recurse into itself…

fibonacci(i) := fun
    if i == 1 then return 1
    elseif i == 2 then return 1
    else return fibonacci(i-1) + fibonacci(i-2)
    end
end
for i <- 1..10
    #"fib(#i) = #(fib(i))" println()
end

Or can be passed to the other function…

greeting(msg, decoration) := fun
    decoration(msg) println()
end

nope(msg) := fun -> msg
squared(msg) := fun -> "[" ~ msg ~ "]"

greeting("Hello, world!", nope)        --> Hello, world!
greeting("Konnichiwa!", squared)       --> [Konnichiwa!]

Or even can be returned by the other function.

twice(f) := fun
    return fun (x) -> f(f(x))
end
fourtimes := twice(twice)
sixteentimes := fourtimes(fourtimes)
sixteentimes(double)(1) println()      --> 65536

As you can see fun ... end thing is not required to appear right after :=. And while the code fun (x) -> f(f(x)) itself is fixed, the resulting function will be depend on f and will vary even outside of the defining function. It is called lexical scoping in PL jargon. (See also the difference of function declaration and function value?)

You may mark some arguments optional, by providing default values for them. It is a good idea to provide reasonable defaults for rarely used arguments.

splitstring(string, delim = " \t\r\n", preservedelims = false) := fun
    -- ...
end

Like a difference between assignment and binding, = and := have different meanings in default values; the former evaluates the default argument before the actual call (but not in the context of the call site, of course) and the latter evaluates it after the definition. This distinction will be helpful when the mutable value is used as a default value, but for now, just think that := is only for special cases.

If you want to specify some (but not all) default arguments, you may use the keyword arguments in the call site.

splitstring("alpha beta", preservedelims: true) println()

You are not limited to optional arguments however.

splitstring(preservedelims: true, string: "alpha beta") println()

You may also declare a varadic-argument function by appending ... to the last argument. This last argument will receive a tuple of remaining arguments.

printf(fmt, args...) := fun
    -- ...
end

Sometimes you need arguments that can only be specified using keywords; for example you want to add an outfile argument to printf but not by breaking the backward compatibility. In that case you can append them after the varadic argument:

printf(fmt, args..., outfile = Console) := fun
    -- ...
end

For consistency you may omit the name of varadic argument, which means it does not receive varadic arguments at all. (In this view you may think that , ... is implied for every non-varadic functions.)

splistring(string, delim = " \t\r\n", ..., preservedelims = false) := fun
    -- ...
end

In the reverse direction, you may provide multiple arguments to the function by using \ prefix (tuples will be explained later):

args = (99, "bottles of beer")
printf("%d %s in the wall\n", \args)

This “flattening” is possible in the middle of arguments, not only at the end. This feature is useful for redirecting function arguments to others.

Class, Classy, Classic

naru supports some object-oriented programming (OOP). However we should be extra careful about the word object-oriented, since it is often vague and may imply several other features of which naru does not support (and is not willing to support). In fact I have not use the word object at all; instead I prefer the word value as it would not imply OOP, at least.

Anyway let’s move on. In naru you can define your own class:

Person(name, age) := class
    var name
    var birthday = Date today - Duration days(floor(age * 365))

    teenager := _
    teenager get() := fun -> 13 <= age <= 18

    greet(lang='en') := fun
        if lang == 'en'
            #"Hello, #(self name)!" println()
        elseif lang == 'ja'
            #"Konnichiwa #(self name)-san." println()
        else
            "..." println()
        end
    end

    self species := "Homo sapiens sapiens"

    var self population = 0
    self exterminate() := fun
        self population = 0
    end
end

hacker = Person('J. Random Hacker', 93/4)
Person population += 1
hacker species println()      --> Homo sapiens sapiens
floor(hacker age) println()   --> 23
hacker teenager println()     --> false
hacker greet()                --> Hello, J. Random Hacker!
hacker name = "Ichiro"
hacker greet('ja')            --> Konnichiwa Ichiro-san.
Person exterminate()
Person population println()   --> 0

Every class can have the following elements:

Zero or more constructor arguments

They are written after class keyword (or after the name in the uniform declaration syntax); in this case Person has two constructor arguments, name and age. The call to the class will create a new value with given arguments. They are accessible inside and outside the class with the normal attribute syntax (i.e. value attr).

Optional one or more base class

They follows class keyword and constructor arguments (if any). They can be either concrete (i.e. class) or abstract (i.e. trait), and concrete base classes should be followed by corresponding constructor arguments.

Zero or more mutable attributes

They are written using var keyword (this is also valid outside the class, but with slightly different meaning). While constructor arguments are read-only by default, mutable attributes can be modified inside and outside the class with the attribute setter syntax (i.e. value attr = newvalue). If constructor argument and mutable attribute coincides, they are shared.

Zero or more immutable attributes

They are similar to mutable attributes but var does not precede them. They become immutable once defined. TODO is there a difference between x = ... and x := ... here?

Initial values for attributes

They are optional for mutable attributes that are shared with constructor arguments, and mandatory for other cases. They are evaluated when the constructor is called at the first time, when even self is not available yet; this is why you should use age instead of self age in the initial values. (To clarify: By the way, you may use a plain age in the method, but it won’t reflect the changes in the mutable attribute of the same name if any. Think a plain age as arguments as in the function.)

It is possible that initial values are not specified, using a placeholder _ in place of initial values. It means that the subsequent subclasses should fill that default value somehow, and any concrete class that does not supply all required default value is invalid. Placeholders can also filled by submethods.

Zero or more methods

They are mostly same as a normal function definition using a uniform declaration, but self is automatically available to the method body. Non-uniform declarations can be used to define methods as well, but self won’t be added automatically in this case.

You may omit the method body using a placeholder _ just like attributes; it forces any concrete subclasses or submethods supply the actual method body.

Zero or more subattributes and submethods

Subattributes and submethods are additional informations for attributes and methods. They are defined as like other attributes or methods, but the name of a target attribute or method are prepended (e.g. greet doc := ...). There is no inherent difference between subattributes and submethods, except for the definition syntax (subattributes are defined like attributes, submethods are defined like methods).

They can be used freely, but some subattributes and submethods are specifically recognized by the compiler:

  • doc subattribute adds an inline documentation to the target.
  • get submethod gets executed when the placeholder _ is used to define the target and the access to that target is made. Therefore when SomeClass attr is defined with a placeholder, SomeClass attr get() (if any) will be called instead. It is also applicable to the methods; in this case it should return a function(!).
  • set submethod is similar to get, but gets executed when the target is being written with = operator. It should receive one argument (the return value is slightly discarded). Other kinds of assignments, besides :=, are automatically translated to the assignment with =.
  • bind submethod is similar to set, but gets executed when the target is being written with := operator.
Class attributes, methods, subattributes and submethods

These are same as attributes, methods, subattributes and submethods belonging to the instance (i.e. a value constructed from the class), but prefixed by the additional self. They can be also accessed from their instances. For the class methods and submethods, self inside them denotes the class value itself no matter where the method is called. (For the same reason, constructor arguments are not available in the class-bound ones.)

There is no strict concept of visibility in naru; you may prepend _ to the private name (_debug, for example) but it won’t stop users from accessing them. However prepending _ itself is a good idea, as it will note users that the interface may change without a notice.

While the class definition looks similar to the function definition, the class definition cannot have arbitrary statements; if you want to do something complex in the constructor, define initialize method:

NonNegative(value) := class
    initialize() := fun
        if value < 0 then
            #"Warning: #value is less than zero." println()
        end
        return self
    end
end

NonNegative(4) println()       --> <NonNegative @b0d3a0>
NonNegative(-4)                --> Warning: -4 is less than zero.

Note the use of value here; since we are talking about the constructor argument we don’t have to use self value. In fact, the following classes are equivalent:

Date1(year, month, day) := class
    -- redundant initializations are kept for the comparison purpose.
    var year = year
    var month = month
    var day = day
end

Date2(year, month, day) := class
    var year
    var month
    var day

    initialize := fun
        self year = year
        self month = month
        self day = day
    end
end

Since we often want constructor arguments to be mutable attributes automatically, you may merge them like this:

Date(var year, var month, var day) := class
end

Inheritance

Class may be based on the other existing class:

Date(year, month, day) := class
    weekday() := fun
        leapyear(y) := fun -> y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)

        years = self year - 1
        days = 365 * years + years // 4 - years // 100 + years // 400
        if leapyear(self year) then
            days += [0, 3, 3, 6, 1, 4, 6, 2, 5, 0, 3, 5][self month - 1]
        else
            days += [0, 3, 4, 0, 2, 5, 0, 3, 6, 1, 4, 6][self month - 1]
        end
        weekday = (days + self day - 1) % 7
        return ["Mon", "Tue", "Thur", "Wed", "Fri", "Sat", "Sun"][weekday]
    end
end

DateTime(year, month, day, hour, minute, second) := class <- Date(year, month, day)
end

DateTime(2010, 11, 7, 18, 40, 2) weekday()      --> Sun

Here a value DateTime(y,m,d,h,n,s) is equivalent to Date(y,m,d) when the method in Date is used. It is possible that DateTime weekday is declared and overrides the definition of Date weekday; in that case, DateTime weekday can use super weekday() to call the superclass’ weekday method.

There can be two or more base classes; if omitted, as in Date class above, then it is assumed to have Thing as a base class.

TODO

Abstract Class

Sometimes it is convenient to assume a class prototype that is never instantiated. For example, when you have Circle, Square and Rectangle classes the Square is clearly a subclass of Rectangle but Circle and Rectangle should share some kind of base class for, say, area method. Therefore naru supports an abstract class, or traits.

Shape := trait
    circumference := _
    area := _

    -- most shapes are not rotationally symmetric...
    rotational_symmetry := false
end

PI := 3.1415926535

Circle(radius) := class <- Shape
    circumference := 2 * PI * radius
    area := PI * radius**2
    rotational_symmetry := true
end

Rectangle(width, height) := class <- Shape
    circumference := 2 * (width + height)
    area := width * height
end

Square(side) := class <- Rectangle(side, side)
    rotational_symmetry := true
end

Traits are very similar to classes, but they lack constructor arguments. Traits may contain the concrete parts, to the extent that you can make a trait with everything is concrete so a subclass without body is valid. Traits are general choice for abstract classes and complex hierarchy, since they have no difficulty when dealing with the multiple inheritance.

Thing, which is automatically used for classes without a base class specification, is a good example of traits. It nevertheless exposes several useful methods.

Polymorphism

In PL jargon, polymorphism is just an ability to generalize several things into one thing. For example you can generalize classes by using a base class; this is called subtype polymorphism. Another kind of polymorphism is deriving several similar classes from one template with parameters (typically types); this is called parametric polymorphism. naru supports both of them, but this is quite difficult matter to explain all the bit here.

(For paranoids: Yes, there is the third kind of polymorphism, ad-hoc polymorphism, which best example is C++’s overloading. For the purpose of this document, however, it is not considered polymorphism as it does not define one entity which behaves differently depending on the context; it instead defines multiple entities sharing one name. naru supports this notion nevertheless, using the union types.)

Multiple Inheritance

As a part of subtype polymorphism naru supports multiple inheritance. C++ folks should be scared here, but naru is more robust than C++ about multiple inheritance. First look at this example:

A(x, y) := class
end

B(x, y, z) := class <- A(x, y)
end

C(y, x) := class <- A(x, y)
end

D(x, y, z) := class <- B(x, y, z), C(y, x)
end

Then D(x, y, z) is equivalent to B(x, y, z), C(y, x) and A(x, y) when using methods from base classes. Base classes cannot conflict each other; for example, it is error to change the definition of C to C(x, y) := class <- A(x, x) as it will imply x == y in D’s constructor. (You cannot use the arbitrary argument in base classes for the same reason.) Traits are free of this problem as they lack the constructor arguments.

Every conflicting attributes, subatributes or methods are reported:

A := trait
    f() := _
end

B := trait <- A
    f() := fun -> 42
end

C := trait <- A
    f() := fun -> "some string"
end

D() := class <- B, C
    -- error: f() is not defined here, so `B f` and `C f` will conflict
end

This diagnostics are applied equally to the types of attributes, methods, subattributes or submethods; if B f and C f narrow down the type of A f, then the type of D f should be narrowed down so that it is a subtype of types for both B f and C f. (See an intersection type for detailed rules.)

TODO overriding rules; this is too complex and has lots of corner cases. heck.

Static Parameters

naru also supports parametric polymorphism, by using {} before the constructor arguments: (For now, ignore unseen syntaxes here.)

LinkedList{+T}(head: T, tail: LinkedList{T}? = none) := class
    print() := fun
        self head print()
        if next <- self tail then
            ", " print()
            next print()
        end
    end

    global String from(self) := fun
        return "<LinkedList of type #T>"
    end
end

LinkedList(3, LinkedList(4, LinkedList(5))) print()    --> 3, 4, 5

Here {+T} is a type parameter; it would be {Key, Value} for the maps, for instance. But what about +? It means you can use LinkedList{Int} in place of LinkedList{Rational}, for example. Since Int would fit in all cases Rational is needed, it makes sense that LinkedList{Int} and LinkedList{Rational} behaves equally. If you use - it means the inverse; LinkedList{Rational} can be used in place of LinkedList{Int}. If there is no such marker, there are no such relations at all. In PL jargon: + means the type parameter is covariant, - means it is contravariant, and no marker means it is invariant.

In many cases you don’t have to figure out + and - (they are incorrect in many cases), but it does matter when you are writing very general class, like collections. You will certainly want the list of Int to be used in place of the list of Rational, right? However you should be careful, since mutable containers cannot be covariant. (In brief, getting the element requires a covariant type parameter for values and setting the element requires a contravariant one. In order to satisfy both it should be invariant.)

Using classes with type parameters is much easier than defining them. The last line in the code above is equivalent to this:

LinkedList{Int}(3, LinkedList{Int}(4, LinkedList{Int}(5))) print()

But since we already know that head receives Int and thus T has to be Int, we can omit all {Int} here; you still have to give all type parameters if one or more type parameters cannot be inferred automatically, such as an empty linked list. (One alternative is to use type keyword, as we will discuss later.)

Since T is covariant, the following lines are equivalent:

LinkedList(4.0, LinkedList(3/4, LinkedList(5))) print()
LinkedList{Real}(4.0, LinkedList{Rational}(3/4, LinkedList{Int}(5))) print()
-- here LinkedList{Rational}(...) is used in place of LinkedList{Real}

For contravariant T the similar equivalence will hold. In short, the type inference is affected by the variance of type parameters.

Type parameters in the class definition are treated equally to constructor arguments; you can use T (or self T) throughout the class. However you cannot declare it as a mutable attribute (of course!).

TODO type parameters on functions; in this case we may automatically infer + and - but it will be inconsistent with ones on classes…

TODO restrictions on type parameters (maybe {T <- U} or so)

Handling Errors

In most cases we have to deal with lots of possible errors. When we divide some number by the another number we may get the error (division by zero), when we open the file and read some lines we may get other errors (no file, permission denied, EOF reached, and even interrupted), and so on. naru supports exceptions to deal with them.

In naru exceptions are values of Exception class, and can be raised by the raise statement.

raise RuntimeError('alien invaded!')

They can be caught using a special but block:

do
    raise RuntimeError('alien invaded')
but Exception as e
    #"exception caught: #(e message)" println()
end

More generally, every do and fun block can have but, else and finally blocks which are executed when an exception of given type (if the type is given; every exception if not) is caught, when no exception is caught, and by the time when the execution flow escapes the block, no matter if the exception is caught:

do
    -- ...
but ExceptionA as e
    -- executed when ExceptionA or its subexception is caught
    -- a new variable e is available in the scope
but ExceptionB
    -- executed when ExceptionB or its subexception is caught
    -- unless the exception was of ExceptionA class too
but
    -- executed when an exception not of ExceptionA nor ExceptionB class is caught
else
    -- executed when no exception is caught
finally
    -- executed after `but` or `else` block got executed (if any)
    -- runs in most cases, but may not execute when the program got the signal
end

In the but block one may use a bare raise statement to re-throw given exception. (This exception wouldn’t be caught by the subsequent but block in the same block, of course.)

There is a shortcut to the big but block; you may use but expression when you have only one exception to handle. (While you can technically use ((... but ...) but ...) but ... form, it is too verbose and do block is better.)

(4/0 but ZeroDivisionError -> -1) println()                     --> -1
(4/0 but ZeroDivisionError as e -> e length sign) println()     --> 1
(4/0 but 42) println()                                          --> 42

with Blocks

One variation on the but block comes from the common resource management pattern. Let’s assume we have to write some file. The naive code would look like this:

f = File("forty-two") openrw()
"The answer to the life, universe and everything." println(f)

The more subject on the file I/O is available in the later chapter. For now we will use them without explanation. But are you sure that this code is correct? Should it be like this one?

f = File("forty-two") openrw()
"The answer to the life, universe and everything." println(f)
f close()        -- MISSING!

In general we should close the file when we opened it. Of course it often gets automatically closed by the finalizer if the garbage collector is in the heavy duty, but we have to be sure…. Wait. What if we get interrupted while writing to the file?

f = File("forty-two") openrw()
do
    "The answer to the life, universe and everything." println(f)
finally
    f close()
end

This code will always close the file regardless of the exception, but is also very verbose. Since this pattern is common, naru provides a shortcut: scoped with block.

with f <- File("forty-two") openrw()
    "The answer to the life, universe and everything." println(f)
end

This with block automatically creates the placeholder code with do-finally block. For the sake of generality it calls enter() method at the start of the block body and exit() at the end of the block body. (For File values, exit method is a synonym for close.) Furthermore you can use other but, else and finally(!) blocks in it:

with f <- File("forty-two") openrw()
    "The answer to the life, universe and everything." println(f)
but Exception as e
    -- this block will also get executed when File(...) call fails.
    #"exception caught while writing the file: #(e message)" println()
finally
    -- this block executes before the exit method.
    "the file is about to be closed!" println()
end

One final nicety for using with block is that the scoped variable is automatically removed after the block. This reduces the confusion due to the duplicated variables.

f = 42
with f <- File("forty-two") openrw()
    "The answer to the life, universe and everything." println(f)
end
f println()                      -- 42

Special Names and Complication

Someone may wonder that we have seen various built-in values but we don’t know about their class. In fact, they do have their classes; you can look up their classes by class attribute. (It is one of a few keywords permitted in the attribute syntax.)

() class println()                 --> <type ()>
true class println()               --> <type True>
false class println()              --> <type False>
42 class println()                 --> <type SmallInt>
(4/7) class println()              --> <type Ratio>
3.14 class println()               --> <type Decimal>
(3 + 4j) class println()           --> <type CDecimal>
?x class println()                 --> <type Char>
"string" class println()           --> <type String>
[1, 2, 3] class println()          --> <type Vector{Int}>
{1, 2, 3} class println()          --> <type Set{Int}>
{0=>'F', 1=>'T'} class println()   --> <type Map{Int, String}>
(3, false, 'foo') class println()  --> <type (Int, False, String)>
42 println class println()         --> <type ()->()>
abs class println()                --> <type (Int)->Int | (Rational)->Rational | (Real)->Real | (Complex)->Complex>

You can see some unseen names like SmallInt, Ratio, Decimal and CDecimal instead of more familiar Int, Rational, Real and Complex. This is because that’s how they are internally implemented; see the detailed hierarchy for explanation. In the other hands the last example shows several overloaded function types, but we will ignore them for now.

Another question arises here; how about operators like +, * and ~? In fact, they are normal functions and you can declare them with a special name: #+#, #*# and #~#.

Gaussian(re, im) := class <- CDecimal(re, im)
    initialize() := fun
        if re != floor(re) or im != floor(im)
            raise ValueError("Gaussian number requires integer arguments.")
        end
        self
    end

    global #**#(self, other:Int)->Gaussian := fun
        if other < 0 then
            return 1 / self ** (-other)
        elseif other > 0 then
            -- binary exponentiation
            half = self ** (other // 2)
            if other % 2 == 0 then
                return half * half
            else
                return half * half * self
            end
        else
            return 1
        end
    end
end

As well as #**#, you can see several new syntaxes here:

Therefore, the following expression…

Gaussian(3, 4) ** 5

…is equivalent to the following…

#**#(Gaussian(3, 4), 5)

…and it will call #**# declared in the Gaussian class, as expected. Likewise most operators and syntaxes in naru can be rewritten in functional form:

Operator Functional
op b b op#()
a op a #op()
a op b #op#(a, b)
a op= b a = a #op=#(b)
a op1 b op2 a #op1#op2(b)
a op1 b op2 = c a #op1#op2=#(b, c)
a op1 b op2 := c a #op1#op2:=#(b, c)

For example, the function call a(b) is equivalent to a #(#)(b). op= case is a bit important as you have to return self (or the new value, if your class is immutable) or you will get the error. Also you cannot override #{#}=#, #{#}:=#, #(#)=# and #(#):=# due to the syntactic ambiguity.

Type Hierarchy

Since everything in naru (and even a class) is a value, and since every value has class pseudo-attribute, everything in naru is of some type. (naru does distinguish two terms, “type” and “class”, but at least for this section we will use the term “type” to denote a value returned by class attribute.) Types in naru form the full hierarchy as like this:

Value
|- Function
|  +- ...                (includes Function1{-T1,+U}, Function2{-T1,-T2,+U} etc.)
|- Tuple
|  +- ...                (includes... TODO)
+- Thing
   |- Type
   |- Ordered
   |  |- Bool
   |  |  |- False
   |  |  +- True
   |  |- Number
   |  |  +- ...          (includes Complex, Real, Rational, Int etc.)
   |  |- Char
   |  |- Date
   |  |  +- DateTime
   |  |- Time
   |  |  +- DateTime
   |  |- Week
   |  +- Duration
   |- Enumerable{+T}
   |  +- ...             (includes Range{+T}, Vector{T}, String etc.)
   |- Stream
   |  |- Reader
   |  |  +- Console
   |  +- Writer
   |     +- Console
   |- File
   +- BaseException
      +- Exception

Basically all value is of Value class, and all “normal” values besides from functions and tuples are of Thing class. Every ordered (more exactly, partially ordered) values are of Ordered class. Every exceptions are subclasses of BaseException class, but since some built-in exceptions are used to implement language-specific situations (such as SystemExit) you’d better use Exception instead.

Number and Enumerable have big hierarchies themselves, which are omitted here; we will look at them later.

Implicit Conversion

TODO TypeName from multimethod

Tuples

A tuple in naru contains the fixed number of values with different types. Tuples are very similar to immutable collections, but collections in principle contain arbitrary number of values with a same type so these two are of different uses. This can be also verified from the types of tuples and collections: a tuple of three numbers have a type with exactly three types within it ((Number,Number,Number)) but a vector doesn’t (Vector{Number}).

Tuples are constructed by comma operator: 1, 2, 3 or (1, 2, 3). Parentheses are optional but sometimes required to resolve an ambiguity (e.g. f(3, 4, 5) is a function call with three arguments but f((3, 4, 5)) is a function call with one tuple argument). Assignment operator (= etc.) and return statement, among others, are bound more loosely than comma, so you can use the code like a = 3, 4, 5 (same for a = (3, 4, 5)) and return 3, 4, 5 (same for return (3, 4, 5)).

0-tuple, or a degenerate tuple with no values, is written (). It is a default return value for functions without return statements. There are no 1-tuple unlike many other languages, including Python.

Tuple unpacking. Tuple syntax can also be used in the left hand side of assignment, meaning the right hand side should be a tuple (or similar things; will be explained later) and multiple assignments will be occurred. For example:

a = 3, 4, 5
b, c, d = a
#"b=#b c=#c d=#d" println()      -- b=3 c=4 d=5
b2, c2 = a                       -- error!

0-tuple, (), can be used in this way too. In this case the right hand side should be (), which is useful for verifying the function call not to return any (unexpected) meaningful value—using a function call as a statement does not check this, throwing out any return value.

() = foo()

To ignore one or more values in the tuple use an _ placeholder identifier in place of variables. A special identifier _ is reserved in naru, so variables cannot use that name.

first := fun (t); a, _, _ = t; return a; end
second := fun (t); _, b, _ = t; return b; end
third := fun (t); _, _, c = t; return c; end
first((3, 4, 5)) println()       -- 3

You can also use an ellipsis (...) to get the remaining part of values in the tuple; this is alike a function call.

a, b, c... = 3, 4, 5, 6, 7     -- c will get (5, 6, 7)
d, e, f... = 3, 4, 5           -- f will get 5
g, h, i... = 3, 4              -- i will get ()

-- 1-tuple (i.e. non-tuple) can be unpacked as well.
x, y... = 1                    -- x=1, y=()

_ and ... can be used altogether, so the above first, second and third can be rewritten in the generalized version (do not require 3-tuple):

first := fun (t); a, _... = t; return a; end
second := fun (t); _, b, _... = t; return b; end
third := fun (t); _, _, c, _... = t; return c; end
first(3) println()             -- 3
first((3, 4)) println()        -- 3
first((3, 4, 5)) println()     -- 3
third((1, 2)) println()        -- error (of course)

Individual item access. Tuples have a set of special attributes, namely from #0 to #n where n is a number of values in the tuple minus 1. These are a shortcut for aforementioned first, second and third functions.

-- this code...
c, b, a = t

-- ...is equivalent to this (except for type checking):
c = t #0
b = t #1
a = t #2

For the uniformity #0 also works for non-tuples; therefore for any non-tuple value v, v #0 and v is exactly equivalent. There is also a tuplelength attribute which gives the number of values in that tuple, or 1 when it is not a tuple (also for the uniformity).

Other tuple operations. A concatenation operator, ~, also works for tuples:

((1, 2, 3) ~ (4, 5, 6)) println()      -- (1, 2, 3, 4, 5, 6)

A flattening prefix, \, can also be used in the tuple constructor:

a = (2, 3)
(1, \a, 4) println()       -- (1, 2, 3, 4)

They are quite picky about operand types, however, so one has to be careful when using the static types. For example, this code won’t compile:

f(t, u) := fun
    use static         -- ...so type inference will be used throughout this function

    a, b, c... = t
    -- t should have a type of (Ta, Tb, \Tc)
    d, e = u
    -- u should have a type of (Td, Te)

    -- u ~ t == (d, e, a, b, \c), and has a type of (Td, Te, Ta, Tb, \Tc)
    (u ~ t) println()
    -- t ~ u == (a, b, \c, d, e) but \c is invalid since c's length is not fixed!
    (t ~ u) println()              -- error!
end

Boolean Values

There are two truth values (“Boolean”) in naru: true and false. If needed, the third truth value, “maybe” or “unknown”, can be simulated using Interval{Bool} and Option{Bool} respectively. When the boolean value is required as a condition in naru, the given value will be automatically converted to the booleans using Bool from conversion method. (This, of course, does not mean that boolean function arguments automatically converts the values.)

naru contains two sets of boolean operators: one is short-circuiting and the another is not. The former is named and and or; the latter is named && and ||. Well, there is also a unary operator not for usual purpose. Anyway the former variant receives any values (not only booleans) and converts the first operand to determine whether to evaluate the second operand or not; as a result, and returns the first non-true value (or the last value if none) and or returns the first non-false value (or the last value if none). The latter variant also receives any values, but always evaluates and converts operands to booleans to get the result; therefore their return value will be always boolean.

(3 and 0 and false) println()            -- 0
(3 and false and 0) println()            -- false
(3 and 4 and "!!!") println()            -- !!!

(() or "hello" or "world") println()     -- hello
(() or "" or 0.0) println()              -- 0.0

(0 && "foo") println()                   -- false
(0 || "foo") println()                   -- true

The built-in values in naru typically converts to true, but some values converts to false instead:

Comparison and Ordered Types

There are four kinds of comparison possible in naru: the value equality (== and !=), the referential equality (is and is not), the ordering (<, <=, > and >=) and the rich ordering (<=>).

Value Equality. These operators compare two given values using the common sense, so 3 == 3 and 3 != 4. Equality operators are available in not only numbers but every values, and their default behavior is same to the reference equality. This means that a == a holds for almost every a except for a few cases: one prime exception is a Not-a-Number (NaN) value in the floating point number, where NaN differs from any NaN.

While these operators are intended to implement the common sense, it is possible that two equal values a and b do not compare identically. It is quite possible when the values are mutable (like opened files), or when the exact comparison is very costly so only the approximation is implemented (like symbolic expressions).

Referential Equality. These operators returns true if and only if modifying one value in-place also modifies other value directly. For example, [1, 2, 3] == [1, 2, 3] (as two vectors contain same values) but [1, 2, 3] is not [1, 2, 3] (as modifying the former does not change the latter). For numbers the referential equality and the value equality are guaranteed to be identical, except for NaNs where two NaNs with the same representation will be considered referentially identical.

Ordering. This is a hard part. The general ordering is available to every values of Ordered class. It is highly recommended that a <= b is identical to (a < b) || (a == b) (and vice versa) and a < b is identical to b > a (and vice versa), but these recommendations are not enforced.

For the real numbers, these operators return false when at least one operands are NaN. (For paranoids: they are not signaling in the IEEE 754 terminology.) In order to check this strictly use :strict argument: #<#(3, Decimal NaN, :strict) will raise the exception, for example. (See with operator for the shortcut.) For the ordered collections (i.e. Vector and SortedSet are ordered but Set is not ordered) the comparison is done lexicographically: [3, 4] < [3, 4, 5] < [3, 5] < [4]. Of course the values themselves have to be ordered in order to use the comparison.

As mentioned earlier a <= b < c is not same to (a <= b) < c nor a <= (b < c). It is much more like (a <= b) && (b < c), except that b will be evaluated only once. This desugaring also means #<#, #<=#, #># and #>=# functions still receive exactly two arguments, no matter the comparison is chained or not.

Rich Ordering. Implementing #<#, #<=#, #># and #>=# separately is cumbersome and error-prone, so naru provides a single comparison operator <=>. This rich comparison operator works as if (a <=> b) op 0 is equivalent to a op b (where op is one of six valid normal comparison operators). By default ordering and equality operators are implemented using the rich ordering, so usually you just have to implement one function to implement the comparison.

<=> usually returns 0, -1 or +1 depending on the comparison result, but this is not always required. TODO no, <=> may have to behave as like IEEE 754’s totalOrder function (except for -0 and +0 distinction?)

Character Type

Character type in naru, Char, provides an abstraction on a conventional text. In order to support most languages and scripts uniformly, naru uses Unicode as a basis of text representation; therefore one Char value encodes one Unicode code point. A code point encodes a single abstract character, but this does not necessarily map to a conventional character (termed grapheme for clarity). It is quite common that one grapheme requires multiple code points to represent: for example, J with a caron, i.e. J̌, requires two code points, U+004A and U+030C. In naru the basic unit of text manipulation is still a code point as it is easy to implement and handle, but text strings also provide some operations on grapheme units.

A Char literal typically consists of ? and a single code point, like ?a and ?ㅋ. ? can be followed by two or more letters or a bare string literal; in this case it indicates the name of characters. For example, ?space maps to ? (? followed by a space) and in fact prefered over the latter; likewise ?"HANGUL SYLLABLE BAB" maps to ?밥, which is nevertheless allowed but somewhat harder to understand for non-Koreans. The names of characters are taken from Name and Name_Alias properties of Unicode Character Database (UCD), and also from Unicode_1_Name properties of UCD for characters U+0000..001F and U+007F..009F. (For paranoids: These control characters do not have a formal name since Unicode 2.0.) Names are case-insensitive, and whitespaces, underscores and hyphens between two letters are mostly ignored, as per Loose Matching rule 2 (LM2) of UAX #44. (For paranoids: A single exception to this rule is U+1180 HANGUL JUNGSEONG O-E, which will be confused with U+116C HANGUL JUNGSEONG OE.) Every character is also a name of itself, so ?"a" equals to ?a.

Char literal can be also composed of ? and a single escape sequence, starting with \. (The escape sequence can be also in the string, but technically this is not a part of character literal.) An escape sequence can be one of these:

These escape sequences can also be used in the string literal as well, with additional escape sequences.

?a println()                  -- a
?: println()                  -- :
?sp println()                 --   (U+0020)
?AmperSand println()          -- &
?\\ println()                 -- \
?\42 println()                -- * (U+002A)
?\uD7A3 println()             -- 힣 (U+D7A3)
?\U12345678 println()         -- error (code point above U+10FFFF)
?'number sign' println()      -- #
?"  \78\85\77\66\69\82\83\73\71\78   " println()
                              -- # (the name reads "  NUMBERSIGN   ")

Char class has an access to parts of Unicode Character Database whenever possible; for example, ?# name will result in "NUMBER SIGN" and ?# category will result in :Po (Punctuation, others). Chars are ordered by their code points (although it is mostly useless), so String from(?a..?z) will give all the lowercase Latin letters.

Code points for surrogate pair, U+D800..DFFF, can be represented by Char but cannot appear in the text string. TODO really? there are lots of systems that (accidently) accept the lone surrogate pair character, like Twitter, and they have to be preserved for the interoperability.

More on Numbers

The following is the full hierarchy of numbers in naru:

Number
+- Complex
   |- Real
   |  |- Rational
   |  |  |- Int
   |  |  |  |- BigInt
   |  |  |  |- SmallInt
   |  |  |  +- NativeInt
   |  |  |     |- NatNN
   |  |  |     +- IntNN
   |  |  +- Ratio
   |  |- FloatNN
   |  |- Decimal
   |  +- Interval{+T}
   |- CFloatNN
   +- CDecimal

In short, there are four abstract classes (domains) in naru: integers (Int), rational numbers (Rational), real numbers (Real) and complex numbers (Complex). Others are concrete classes that implement numeric domains: for example, BigInt and SmallInt are actual concrete classes for integers, and the actual class of an integer literal is one of them. NatNN, IntNN, FloatNN and CFloatNN reflect the natively supported number types and suffixed by their bit length (e.g. Nat64, Float80); for native complex types the combined bit length of real and imaginary parts is used (e.g. CFloat128 consists of two Float64).

Besides from corresponding domains, every numbers are either exact or inexact (as like Scheme). The exactness of given number can be retrieved by exact attribute. This distinction is most clearly visible in real numbers, where Decimal always keeps an exactness flag. (Integers are always exact by definition; FloatNN and CFloatNN are inexact due to their use of floating point representation.)

Numeric Literals

I will present the exact syntax of numeric literals hereby, albeit I believe almost everyone already has a good understanding about them.

Integer literals

0, 1, 2, 42, 1000000, and so on. They are values of SmallInt or BigInt depending on its size, but for all practical purposes you can regard them as simply Int. There are no negative integer literals: constructing a negative number (and thus integer) is a job of parser, not tokenizer. (This also applies to all other numeric literals.)

Integer literals can be prefixed by the radix indicators:

  • 0b for binary, where valid digits are 01.
  • 0o for octal, where valid digits are 01234567.
  • 0x for hexadecimal, where vaild digits are 0123456789ABCDEF (case-insensitively).

Additionally, generic literals 2"...", 8"..." and 16"..." can be used instead of these indicators. This syntax allows an arbitrary radix between 2 and 36 (where a digit Z corresponds to 35), and either whitespaces or underscores are ignored for readability.

TODO enclosed digits in generic integer literal

Decimal literals

3.14, 314e-2 and so on. They are values of Decimal. Basically any number containing a decimal point (.) or exponent part (e123, e+123, E123, E+123 etc.) is a decimal literal. A decimal point should be between two digits: .345 or 345. don’t constitute a valid decimal literal (while 345e0 is valid). When the exponent part is present, a number before it is multiplied by a power of 10 raised to the given exponent.

There is a special digit for decimal literals: ?. It is mostly like a digit 0, but in addition specifies that the decimal literal containing it is inexact and the digit is insignificant. (Therefore there should be no significant digits after ?.) Therefore 314???? is equivalent to 3.14????e4 or 3.14?e4, but not 3.14e4 which is exact. Any decimal literal should include at least one significant digit (even though it is 0) in order to distinguish it from Char.

Floating point literals

3.14f, 314e-2f and so on. In the other words, they are decimal literals followed by f or F, and values of the biggest Float* type that is suitable in the context. For example, a: Float32 | Float64 = 3.14f will use Float64—as two types are possible and Float64 is bigger—but a = 3.14f will use Float80 in IA-32 and x86-64. This is because retrieving (say) Float32 number by converting the decimal string directly is not same to retrieving Float64 and converting it to Float32. (For paranoids: This can result a precision loss when rounding toward nearest even or infinity is in use.)

To support a precise floating point number, naru allows 0x radix indicator in the floating point literals. In this format binary exponent part (p123, corresponds to 2123) is used instead of decimal one. Therefore 3.14f is approximately equivalent to 0x1.91eb851eb851fp+1f in Float64, and so on.

Floating point literals support ? special digits, but they behave as like 0 digits and the resulting number is always inexact. This is an another reason for avoiding floating point numbers.

Imaginary literals

3.14j, 3.14fj and so on. In the other words, they are decimal or floating point literals followed by j or J (which is often used in electrical engineering to avoid a confusion with the current, i). They are values of CDecimal or CFloatNN accoding to their base literal.

Underscores (_) may appear within a numeric literal for readability; they are valid as long as there are no consecutive underscores, and they only appear between two digits, between a digit and a exponent separator (e or p) or between a digit and a decimal point (.). For example, 0x1_2_3 and 12_3_._4_5?_??_e_15 are valid, but _34 (which is an identifier), 34_, 0_x123 or 13e_+45 are invalid. Generic integer literals do not have this restriction as long as they are properly delimited by quotes.

There are no separate literals for Ratio and Complex with a real part, but the constant expression for them can be easily constructed with division and addition: 3/4 and 3+4j, for example.

Native Integer Types

As mentioned earlier, the principal integer type in naru, Int, has arbitrary precision. However there are several size-restricted (native) types on Int which works just like integers in other many languages. These are available as a part of naru ext module, since it is an integral part of naru native interface. They are also useful for tasks that require the fixed-size integers, such as low-level cryptological algorithms.

Arithmetics on native integer types can overflow without a notice:

import naru ext *
a = Int32 from(66000)
(a * a) println()         --> 61032704 (= 66000^2 mod 2^32)

This is intentional. Those who use native types should understand the exact characterstics of them, so they should deal overflows by themselves. The behavior of native types on the corner case is thus undefined—it is possible that, for example, Int32 from(2) >> 32 returns 2 (as it does in IA-32 architecture, where a shift amount is ANDed with 31). It is even possible that native types do not use 2’s complement, unlike Int. (For paranoids: Int implicitly assumes 2’s complement representation, but in fact it is more like a 2-adic number restricted to the actual integers.)

One can request overflows to be explicitly notified (by throwing exceptions). This will also catch corner cases like Int32 from(2) >> 32:

do
    (a * a with :checked) println()
but Exception as e
    #"#a * #a overflowed." println()
end

Of course the inline but expression comes handy in this case:

(a * a with :checked but OverflowError -> 0) println()

Conversion between different native integer types is explicit. This prohibits all sort of bugs due to signed-unsigned conversion, for example.

a = Int32 from(1234567890)
a2 = Int64 from(a)                 -- 1234567890
a3 = Int16 from(a)                 -- 722 (= 1234567890 mod 2^16)
a4 = Int8 from(a)                  -- -46 (= 1234567890 mod 2^8 - 2^8)
a5 = Int8 from(a) with :checked    -- error
a6 = Int16 from(a4)                -- -46
a7 = Int8 from(a6) with :checked   -- -46 (as -46 fits in Int8)

b = Int32 from(-987654321)           
b2: Nat32 = b unsigned                -- 3307312975
b3: Int32 = b2 signed                 -- -987654321

More Numeric Operations

I have already introduced basic arithmetic operators on numbers, +, -, *, /, //, %, and **. I will introduce remaining ones here.

Combined quotient-remainder operation. This operation is denoted /%, and by the definition a /% b == (a // b, a % b). However it is more efficient than two // and % operations, and can be really handy in some cases:

-- mixed radix handling! yay!
nminutes, seconds = nseconds /% 60
nhours, minutes = nminutes /% 60
ndays, hours = nhours /% 24
years, ndaysinyear = ndays /% 365
weeks, days = ndaysinyear /% 7

Bitwise operators. These are similar to C-like languages:

Of course they are only applicable for Ints and subtypes. 2’s complement representation is assumed for all Ints, so some identities like ~x == -x-1 are always true for Int. Native integer types do not have to behave like this though.

Special bitwise operations. While they are not as necessary as bitwise operators above, integer types provide more specialized bitwise operations:

These operations are only available for fixed-size integer type due to their semantics:

TODO

Towards Accurate Calculation

TODO should decimals track the significant digits? then how about interval arithmetic which will collide with significance arithmetic? or exact arithmetic based on continued fraction?

Mathematical Functions

naru provides two modules for mathematical functions, named naru math (for real numbers) and naru math complex (for complex numbers). The former are separated from the latter as complex-valued functions are not like real-valued functions: the logarithm of i, for instance, is not only π/2i but also -3π/2i, 5π/2i and so on. Whoever using complex-valued functions should be aware of this multi-valued nature.

The difference between these modules and Number methods is that the (relative) simplicity to implement them: gcd, for example, needs a bulk of code to speed it up (Hint: normal Euclidean algorithm is SLOW) but bitlength is readily available in many modern architectures.

Constants. pi (3.141592…) and e (2.718281…) are constants for the ratio of a circumference of a circle to its radius and the base of natural logarithm respectively. TODO what type of those constants should be? the automatic rounding hurts the accuracy (should be less than 0.5ulp). it is possible that those are separate singleton subtypes of Real (say Pi and E) and coerced whenever needed (Pi * Float64 -> Float64, Pi * Float32 -> Float32 etc.), but it is a bit too much.

Arithmetic. Some arithmetic functions present in methods or easily implementable using them are replicated here as well, mostly for convenience and subtle differences:

Trigonometric and Hyperbolic Functions. There are 24 possible combinations:

For trigonometric functions, the second argument specifies how x is scaled: :radians (default) and :degrees are available. For inverse trigonometric functions it specifies how its result is scaled.

naru also provides a much-used atan2(y, x) as a sign-correcting version of atan(y / x), which result ranges from -π (exclusive except for y == -0.0) to +π (inclusive).

Exponential and Logarithmic Functions. Since these functions are crucial for many numerical calculations, many variations are available:

Non-elementary Functions. TODO description

Other Mathematical Functions. TODO description

Floating Point Specials. TODO description

Complex-valued Functions. abs(z) (not actually in naru math complex module) returns sqrt(z imag ** 2 + z real ** 2) as expected. phase(z) returns a principal argument of z, which ranges from -π (exclusive except for arg(-1.0-0.0j)) and +π (inclusive). Other functions follow this fixed range of phase(z) to determine its principal value.

Some arithmetic functions (i.e. sign, pow and sqrt), all trigonometric and hyperbolic functions, all exponential and logarithmic functions except for atan2, and all non-elementary functions are available in naru math complex module, but with some changes:

More on Collections

The following is the full hierarchy of collections in naru:

Enumerable{+T}
|- Iterator{+T}
|- Option{+T}                  (also known as T?)
|- Range{+T}
|- Sorted{T}
|  +- ...
+- Immutable{+T}
   |- PureSList{+T}
   |- PureList{+T}
   |- PureVector{+T}
   |  |- Symbol (T=Char)
   |  |- PureBytes (T=Int)
   |  +- PureSortedVector{+T} <- Sorted{T}
   |- PureMultiset{+T}
   |  |- PureSet{+T}
   |  |  +- PureSortedSet{+T} <- Sorted{T}
   |  +- PureSortedMultiset{+T} <- Sorted{T}
   |- PureMultimap{+Key, +Value} (T=(Key,Value))
   |  |- PureMap{+Key, +Value}
   |  |  +- PureSortedMap{+Key, +Value} <- Sorted{(Key,Value)}
   |  +- PureSortedMultimap{+Key, +Value} <- Sorted{(Key,Value)}
   +- Mutable{T}
      |- SList{T} <- PureSList{T}
      |- List{T} <- PureList{T}
      |- Vector{T} <- PureVector{T}
      |  |- String <- Symbol (T=Char)
      |  |- Bytes <- PureBytes (T=Int)
      |  +- SortedVector{T} <- PureSortedVector{T}
      |- Multiset{T} <- PureMultiset{T}
      |  |- Set{T} <- PureSet{T}
      |  |  +- SortedSet{T} <- PureSortedSet{T}
      |  +- SortedMultiset{T} <- PureSortedMultiset{T}
      +- Multimap{+Key, Value} <- PureMultimap{Key, Value} (T=(Key,Value))
         |- Map{+Key, Value} <- PureMap{Key, Value}
         |  +- SortedMap{+Key, Value} <- PureSortedMap{Key, Value}
         +- SortedMultimap{+Key, Value} <- PureSortedMultimap{Key, Value}

Collections are basically classified by key, value types and restrictions and performance guarantees on them. There are two broad classification methods for collections: mutable vs. immutable and sorted vs. unsorted.

Mutable vs. Immutable

Collections in naru distinguish the immutable ones and mutable ones. Immutable collections are sometimes called “functional data structure”, since it is apt for pure functional languages. Since they are immutable there are some opportunity to optimize: for example, collections with many parts shared will benefit since they will have much smaller memory footprint.

Every built-in collection syntax you have seen before is mutable; you can just prepend : to get the immutable version. (Think of := which also conveys the immutability.)

vec1 = [3, 4, 5];        vec2 = :[3, 4, 5]           -- Vector{Int}, PureVector{Int}
set1 = {3, 4, 5};        set2 = :{3, 4, 5}           -- Set{Int}, PureSet{Int}
map1 = {3 => 4, 5 => 6}; map2 = :{3 => 4, 5 => 6}    -- Map{Int,Int}, PureMap{Int,Int}
str1 = "blahblah";       str2 = :"blahblah"          -- String, Symbol

In the last line you can see that string was in fact mutable. The immutable version is called symbol; if the symbol contains only letters you can omit quotes like :blahblah. When is the symbol useful? The most important case is a key for the map, which has to be immutable. For example:

numbers = {:one => 1, :two => 2, :"fourty-two" => 42}
numbers[:"fourty-two"] println()      --> 42
numbers[:"one hundred"] = 100

“Hey, you have used a bare string as a key before. You mean that was incorrect?!” Of course no. Since the string has an immutable counterpart, a map with string keys will silently convert to/from symbols. For example:

numbers = {"one" => 1, "two" => 2}
key = "one hundred"
numbers[key] = 100
key[0..2] = "two"
key println()                         --> two hundred
numbers println()                     --> {"one" => 1, "two" => 2, "one hundred" => 100}

In this case only the copy of key (which is a symbol) is stored in the map, and every access to the key will return a string copy of it. (For paranoids: This may incur a big cost when naively implemented, but a copy-on-write string/symbol implementation will eliminate most costs. And the compiler may recognize the string which does not change at all and treat them as like a symbol, too.)

For the pure maps, you know what you are doing so you have to use immutable types only. But since it is quite cumbersome to write all colons you just have to write only one outermost colon in this case. Therefore the following three lines are equivalent, and both has the type PureMap{Symbol,List{Int}}.

:{"one" => [2, 3], "four" => [5], "six" => [7, 8, 9, 10]}
:{:one => [2, 3], :four => [5], :six => [7, 8, 9, 10]}
:{one: [2, 3], four: [5], six: [7, 8, 9, 10]}     -- a shortcut for symbol keys

Anyway, in most cases there are a pair of mutable and immutable collections, and you can use freeze() and thaw() methods to get an immutable/mutable version of the collection. For standard collections those conversion methods are designed to be efficient, especially when the conversion is temporary (i.e. mutable version is actually not altered). Mutable and Immutable class attributes can be used to get a type of counterparts.

Enumerable interface

Every collection provides the common interface, abstracted by Enumerable, and it is used to implement the loop. The actual iteration is implemented by Iterator interface, which is also a subclass of Enumerable; it basically returns elements in the collection one-by-one until the collection is ran out (if the collection is finite). Enumerable collection should provide an iterator() method which returns a fresh value of Iterator.

Due to this abstraction, the following loop…

for el <- collection
    -- do something with el
end

…is equivalent to the following desugared form (Option{T} type will be introduced shortly):

with _iterator <- collection iterator()          -- internal variable
    while true
        _current: Option{T} := _iterator next()     -- internal variable
        if not _current holds then break end
        el = _current value
        -- do something with el
    end
end

The iterator is functional. This means that you don’t have a headache when iterating the collection and simultaneously modifying it. For example:

arr = [1, 2, 3]
for x <- arr
    x println()
    arr append(x)
end
arr println()

This code is well-defined in naru and prints 1, 2, 3 and [1, 2, 3, 1, 2, 3] respectively. To achieve this behavior, the collection is implicitly frozen when the iteration starts as the iteration on immutable collections is trivial. Since the conversion between immutable and mutable collections generally incurs a small cost (and the compiler often converts a mutable collection into the immutable version when possible) this does not penalizes the normal loops.

Sorted vs. Unsorted

There are a series of sorted collections that mandate the enumeration order to be sorted. Therefore an element of sorted collections should be ordered (i.e. a value of Ordered abstract class). Unlike the mutability, the sortedness is not strictly checked by the compiler. This means the following code is valid:

a: Vector{Int} = [1, 5, 3]
b: SortedVector{Int} = SortedVector([2, 4])
a2: SortedVector{Int} = a sorted()             -- returns a sorted collection
c: Vector{Int} = b ~ a2                        --> [2, 4, 1, 3, 5]

While there is a good merging algorithm around (and in fact available as SortedVector merge), in general just concatenating two sorted vector does not yield a sorted vector automatically. Of course concatenating an empty vector does yield a value of SortedVector{Int} class, but its type is still Vector{Int}.

Unsorted collections are further divided by two classes, ordered and unordered, depending on the stability of the iteration order. For example, it is possible that two unordered collections with same elements may give their elements in the different order. Of course once the order is known, however, the order should be preserved until the collection is altered. (For paranoids: This means every collection which optimizes itself during the access should optimize itself before any iteration; an example of this restriction would be a union-find data structure, used in PartitionedVector.) Vector is an example of unsorted but ordered collection; Set does not have to be sorted nor ordered.

The main reason of having the sortedness is that you normally don’t care whether the collection is sorted or not, but knowing that one is sorted often speeds things up and you will need it for certain circumstances anyway. Also there is some freedom that has to be sacrificed for the sortedness: for example, Map can use a hash table but SortedMap cannot.

The following is a full list of standard collections in naru. Only the name of mutable version is shown, as the name of immutable version almost always follows the same pattern (Pure plus the name of mutable one).

Name Sorted? Main usage Underlying data structure
SList No Sequential access without almost no random access; supports O(1) slicing without copy Singly-linked list
List No Bidirectional access without almost no random access; supports O(1) slicing without copy Doubly-linked list
Vector No General random access; insertion is costly except for appending Flexible array or deque
String No General string; rich string operations with Unicode support Flexible array (or rope?)
Bytes No Byte sequence; some ASCII-oriented string operations Flexible array or deque
SortedVector Yes General random access; O(log n) search for elements; random update and insertion is costly except for appending Flexible array or deque
Set No almost O(1) search for elements; removes duplicates automatically; supports fast set operations Hash table with open addressing; possibly perfect hashing for PureSet (if compiler supports)
SortedSet Yes O(log n) search for elements, lower and upper bounds; removes duplicates automatically; supports fast set operations Self-balancing binary tree
Multiset No almost O(1) search for elements; keeps duplicate elements; supports fast set operations Hash table with separate chaining
SortedMultiset Yes O(log n) search for elements, lower and upper bounds; keeps duplicate elements; supports fast fast set operations Self-balancing binary tree
Map No almost O(1) search for keys; removes duplicate keys automatically Hash table with open addressing; possibly perfect hashing for PureMap (if compiler supports)
SortedMap Yes O(log n) search for keys, lower and upper bounds; removes duplicate keys automatically Self-balancing binary tree
Multimap No almost O(1) search for keys; keeps duplicate key-value pairs Hash table with separate chaining
SortedMultimap Yes O(log n) search for keys, lower and upper bounds; keeps duplicate key-value pairs Self-balancing binary tree

There are several advanced collections offered by naru collection module:

Name Sorted? Main usage Underlying data structure
OrderedSet No almost O(1) search for elements; removes duplicates automatically; preserves the insertion order Hash table with open addressing, overlaid with doubly-linked list
OrderedMultiset No almost O(1) search for elements; keeps duplicate elements; preserves the insertion order Hash table with separate chaining, overlaid with doubly-linked list
OrderedMap No almost O(1) search for keys; removes duplicate keys automatically; preserves the insertion order Hash table with open addressing, overlaid with doubly-linked list
OrderedMultimap No almost O(1) search for keys; keeps duplicate key-value pairs; preserves the insertion order Hash table with separate chaining, overlaid with doubly-linked list
Bimap No almost O(1) search for both keys and values Two hash tables
PriorityQueue No O(1) access and deletion for minimal or maximal element; O(log n) insertion; fast merge operation Fibonacci heap
PartitionedVector No O(1) access for each partitions; almost O(1) partition merges Disjoint-set forest
PartitionedMap No almost O(1) access for each partitions; almost O(1) partition merges Disjoint-set forest plus a hash table

TODO

Collection Syntax

As mentioned before, there are eight collection syntaxes available:

The last case is quite different from other syntaxes, as they are in fact single literals. The detailed behavior of them will be presented later; also note that "..." syntax can be extended by prepending a user-defined prefix.

There are two flavors of syntaxes other than strings and symbols: enumeration and comprehension.

Enumeration syntax. This is a collection with every elements enumerated, like [1, 2, 3], {1, 2, 3} and {1 => 2, 3 => 4}. The trailing comma is ignored (i.e. [1, 2,] == [1, 2]). An empty collection is represented as [], {/} and {} respectively: {/} is for a set and {} is for a map. (Note that {/ }, { /} or { / } are all invalid, as {/} is a single token. { } is equivalent to {} however.) When the opening token prepended by a single token : the syntax denotes an immutable collection instead: :[1, 2, 3], :{1, 2, 3} and :{1 => 2, 3 => 4}. (Therefore : {/} is valid, although looks quite confusing.)

A flattening prefix \ can be used within the enumeration, just like function arguments:

a = [3, 4, 5]
b = [1, 2, \a, 6, 7]      -- b is now [1, 2, 3, 4, 5, 6, 7]

a = {3, 4, 5}
b = [1, 2, \a, 6, 7]      -- b contains seven elements from 1 to 7,
                          -- but 3, 4, 5 will be arbitrarily ordered.

a = {"foo" => "bar"}
b = {"a" => "x", \a}      -- b is now {"a" => "x", "foo" => "bar"}

a = {3 => 4}
b = [(1,2), \a, (5,6)]    -- b is now [(1,2), (3,4), (5,6)]

As shown in examples, flattening a unsorted collection into an ordered collection highly depends on the iteration order which is arbitrary by definition. It also shows that two collections should be of the compatible type during the flattening; the last example is valid, as Map{Int,Int} is a subtype of Enumerable{(Int,Int)}.

For collections other than vectors, sets and maps, the enumeration syntax is simply a collection class called with Enumerable interface:

a = SortedMultiset([5, 4, 3, 2, 1, 2, 3, 4])
a println()               -- SortedMultiset([1, 2, 2, 3, 3, 4, 4, 5])

Note that the iteration of an unordered collection is arbitrary, so you should be careful when initializing an unordered collection which requires an ordered collection nevertheless.

Comprehension syntax. This is a collection constructed from other collections:

a = [1, 2, 3]
b = [x**2 for x <- a]             -- b is [1, 4, 9]
c = [5-x for x <- b if x<=5]      -- c is [4, 1]
d = :{x + y for x <- a, y <- a}   -- d is :{2, 3, 4, 5, 6}
e = {x => x**3 for x <- a}        -- e is {1 => 1, 2 => 8, 3 => 27}

For mutable collections, a vector comprehension [f(x1,...,xn) for x1 <- L1 if g1(x1) ... for xn <- Ln if gn(x1,...,xn)] is equivalent to this code:

L = []
for x1 <- L1
    if g1(x1)
        ...
            for xn <- Ln
                if gn(x1,...,xn)
                    L append(f(x1,...,xn))
                end
            end
        ...
    end
end

A set comprehension and map comprehension is similarly defined, and for x1 <- L1, x2 <- L2 is equivalent to for x1 <- L1 for x2 <- L2. As a consequence, variables from x1 to xn is only visible within the comprehension, and xk is only visible after for xk <- ... appears (except for f(x1,...,xn), of course).

For other collections without any built-in syntax, one can use a simple coroutine syntax which is incidentally similar to the comprehension:

a = {1 => 2, 3 => 4}
b = Bimap((y, x) for x, y <- a)
-- ...which is equivalent to Bimap( ((y, x) for x, y <- a) ).

TODO flattening comprehension: [\(0..k) for k <- 0..3] == [0, 0, 1, 0, 1, 2, 0, 1, 2, 3] for example.

Options

While naru has () value, it is mostly used for a return value from functions with no return statement. (This is equivalent to unit in many functional languages, because there is only one possible value for this type.) You often want to store something only if some condition meets; the variable will store nothing if not. But you cannot use () for this purpose in general, as () is fundamentally different from other values and often explodes in front of your face.

The better way is to explicitly declare that you are aware of such cases. Thus naru has Option class (which can be written as T? in the type specification):

lexicon = {
    "hello" => "Hallo",
    "dog" => "Hund",
    "wing" => "Flügel",
}
lexicon get('dog') println()             --> Just("Hund")
lexicon get('cat') println()             --> None

Map get() method returns an Option{T} value, which can be a Just{T} value with a desired value of type T or a None value without any associated value. In the reverse direction you can make a new Option value by just(value) or none, without no type parameters (thanks to the type inference). One can check if the value is None or not by using hold attribute:

word2 = lexicon get(word)
if word2 hold
    #"#word -> #(word2 value)" println()
else
    "Not found." println()
end

But it is somewhat verbose. Now the fundamental question: what on earth is Option discussed in the collection section? Because it is a trivial collection with zero or one element. In fact you can loop with an Option value:

for w <- word2
    #"#word -> #w" println()
    break
else
    "Not found." println()
end

Here else part in for block gets executed when the loop terminates normally, not abruptly by break or return statement. But this also look verbose, so naru has an another shortcut:

if w <- word2
    #"#word -> #w" println()
else
    "Not found." println()
end

It only works for Option class and its subclasses, by the way. For other collections you can just use for loop.

(For paranoids: none is of None{Bottom} class, where Bottom is a type with no values and a subtype of all other types. Since Option is covariant—it is immutable!—, none is a value of None{T} for any type T. Great.)

Strings

TODO also syntax

Ranges

TODO also syntax

Operations on Collections

TODO

Another Kind of Types

Well, this title is misleading. While we have used the words “type” and “class” with no doubt, it’s time to confess: they are too confusing as is. In naru we have three good candidates that fit in those words:

  1. A thing that class attribute returns.
  2. A kind of values that the function (and class constructor) may receive.
  3. A kind of values that the function (and class constructor) actually receives.

2 and 3 are different things because we may put 42 to the function with type (Rational)->(). Guy L. Steele has written about this problem at the length, and we will go with the simplest solution (with some caveats): those that can be easily encountered in runtime are named class, and those that cannot are named type. Therefore our terminology is as follows: metaclass for 1, type for 2 and class for 3.

In the other words, we have talked about all those classy things, but not types. So let’s talk about types. The first thing: naru is not a dynamically typed language, and is not a statically typed language either. naru lies somewhere between them. When you write the following:

a = 42

It actually means the following:

a: * = 42

…where * is a special type meaning it is dynamic. It may result in runtime error at any time. However if you write the following:

a: String = 42

Then the compiler knows that 42 (of Int class) cannot be of type String (certainly!) and lets us know the error. You may mix and match these types as well:

a: * = 42
b: Int = 54
(a / b) println()

Since a is dynamic, a / b is also dynamic (it may result in Real or Rational depending on a’s class). TODO we are not certain about a mix of dynamic and static types; we may adopt some other works here

We will see other categories of types shortly, but look at the important category first: function types. These are denoted as (T1,T2,...)->U where Tn are argument types and U is a return type.

zero: ()->Int := fun () -> 0

-- named arguments are a part of function types.
double: (Int)->Int := fun (x:Int) -> x * 2
triple: (x:Int)->Int := fun (x:Int) -> x * 3
double(42)         --> 84
triple(42)         --> 126
triple(x:42)       --> 126

-- argument names themselves do not affect the type system.
pow: (Int,Int)->Int := \
    fun (x:Int,y:Int) -> base == 0 and exponent == 0 then 1 else x ** y
pow1: (x:Int, y:Int)->Int := pow
pow2: (y:Int, x:Int)->Int := pow1
pow1(x:3, y:4)     --> 81
pow2(x:3, y:4)     --> 64

As noted, argument names are merely an annotation that is only used when you call the function. (In PL jargon: they do not affect the subtyping, i.e. (x:A,y:B)->C <: (y:A,x:B)->C and (y:A,x:B)->C <: (x:A,y:B)->C.) However you will be hardly affected by this subtle rule, since the normal function declaration like this:

add(x:Int, y:Int) := fun
    ...
end

…will implicitly set the correct annotation. In fact this is equivalent to the following code:

add: (x:Int, y:Int)->_ := fun (x:Int, y:Int):_
    ...
end

Also note that the function arguments can be followed by the result type. Except that -> is used in place of :, this is also applicable for the uniform declaration, like add(x:Int, y:Int)->Int := .... (The placeholder _ will be explained later in this section.)

Type Specification

The type specification in naru is a mini language for its own:

Top-level types

Int, String and so on. These types normally reflect classes available in the current scope, but you can also use the types from other modules:

import somemodule
some: somemodule SomeClass = somemodule foo()

import anymodule *             -- if this imports AnyClass...
any: AnyClass = bar()          -- then this is also valid.

Every top-level types in the type specification should be declared by the binding (:=). TODO is this restriction necessary? can’t we have the runtime type checker?

Parametric types

In the other words, types with static parameters. These take the same form with the definition: Vector{Int}, for example.

Other types can be prefixed by the names of static parameters to denote the parametric types: for example, {T,U} ((T)->U,T)->U. These static parameters are local to the following type specification: so ({T} (T)->Int, {T} (T)->String)->() is a correct type and can take values of (Char)->Int and (String)->String as arguments.

For parametric classes like Vector you can shorten {T} Vector{T} (i.e. all values of that type regardless of static parameters) to simply Vector.

Class attributes

You can access class attributes that are defined by binding (:=) from the type specification: the prime example would be Mutable and Immutable type attributes in Collection class, like Vector{Int} Immutable (which will result in PureVector{Int}; see the collection list below).

Tuple types

(), (Int, Real), (Int, Real, String) and so on.

One or more types within a tuple type can be prefixed by \, which requires corresponding types are a tuple type with fixed size. (For example, if T is (Int,Real) then (\T,String,\T) is equivalent to (Int,Real,String,Int,Real).) Tuple types can also have an ellipsis (...) at the end of the list, meaning the type accepts given number of values and zero or more other types. Therefore it is in fact a syntactic sugar to \_.

TODO exact encoding is not decided

Function types

()->(), (Int,Int)->Int, (fmt:String,args:Value...)->() and so on. The generic syntax is explained already; there can be several kinds of argument specifications inside it:

  • A normal argument can be simply a type (e.g. String).
  • A named argument can be prefixed by the name (e.g. fmt:String).
  • Varadic number of arguments is denoted by trailing ... (e.g. args:Value...), where its actual type is a Vector of specified type (e.g. Vector{Value} in this example). This argument should be at the end of argument list if present, except for keyword-only arguments.
  • An optional argument is denoted by trailing ? (e.g. String?), where its actual type is an Option of specified type (e.g. Option{String} in this example).

As stated earlier the name of arguments if given does not affect the type system. As an exception, you cannot ignore the name of keyword-only arguments: you have to keep them as is if they are non-optional.

Union and intersection types

Int|Real, {T} (Reader{T} & Writer{T}) and so on. These represent the disjunction and conjunction of given types respectively, and syntactically bound later than static parameters and attributes.

Dynamic type

Denoted as *, it enables the full dynamic type checking. It can appear in any position in the type specification, e.g. (*,*,*)->* (which would be the default type of fun (x,y,z) -> x). (For paranoids: It may be possible that fun (x,y,z) -> x would have a type of {T} (T,*,*)->T, but we don’t do so in order to simplify the implementation.)

Dynamic type is a supertype of all static types, and thus the conversion from static type to dynamic type is automatic. The reversal needs the type casting such as x as SomeType. We will discuss about how to mix static and dynamic types later.

Inferred type

Denoted as _, it is not a really type but specifies that the compiler has to fill it with the appropriate type. For example, x: _ := 42 will be almost equivalent to x: Int := 42. (In reality, it can be filled with other implementation-dependent subtypes of Int, e.g. SmallInt.) Like the dynamic type it can appear in any position in the type specification, e.g. Vector{_} or _{Int}, and placed before ... it will infer the types of the arbitrary number of function arguments, e.g. (Int,_...)->().

It is the default type for the return type in the function, and in fact the only possible return type for the inline function syntax—for example, fun (x,y) -> x+y is equivalent to fun (x,y)->_; return x+y; end (which type is (x:*,y:*)->*). It also arises from the use of placeholder in the tuple unpacking, and also from the assignment or binding without a type when use static is enabled.

self type

Denoted as self, it denotes the type of enclosing class. It is invalid outside the class or in the class method.

It is quite limited compared to the normal expression syntax, which is intentional and allows an efficient compile-time implementation.

(For paranoids: Technically speaking, naru’s type system is (for now) a rank-1, predicative Hindley-Milner type system extended with nominal subtyping. The type inference on this type system should be decidable, though it hasn’t been proved yet. We plan to extend the type system to the higher-order and impredicative subset of System F, which restrictions allow the decidable type inference with some annotations.)

TODO describe syntactic sugars like T? for type expression

Mixing Static and Dynamic Types

While the static typing in naru is optional, naru does force one to use the static typing in some cases. The major cause is the multiple dispatch, since we should know the arguments’ types in order to determine what function is to be called:

#+#(a: Int, b: Real) := fun -> Real from(a) + b
#+#(a: Real, b: Int) := fun -> a + Real from(b)

If some expression is typed statically, any access to attributes and methods of the expression is checked in the compile time. So if we wrote the first function body as fun -> a addToReal(b), it will be caught earlier given that addtoReal method does not exist in Int.

Static to Dynamic. While this behavior detects many bugs due to the typo, one may find it inconvenient. There are several ways to convert the static type into the dynamic type (i.e. force the compiler to forget the type information): the easy way is to assign the statically-typed value to the variable, since the variable is typed dynamically by default.

#+#(a: Int, b: Real) := fun
    a0 = a                  -- a0 is typed as *
    return a addToReal(b)   -- this will raise the runtime error
end

The variation of this way is to use an explicit type casting via as expression. naru will treat given expression as given type, and if needed will add a check for ensuring the assertion.

#+#(a: Int, b: Real) := fun -> (a as *) addToReal(b)

The last way is to use a function which arguments are dynamically typed. This allows the dynamically-typed function to accept the statically-typed values without any modification, with an appropriate runtime check.

addHelper(a, b) := fun -> a addToReal(b)
#+#(a: Int, b: Real) := fun -> addHelper(a, b)

Personally, I am against the first and second way because it will prevent the early detection of bugs. The very reason for naru to allow mixing static and dynamic types is to help the gradual transition: one may keep some functions static and some functions dynamic, and will only receive the runtime errors from the latter only. If you want the function to be static then it is better keeping it throughout the function.

Dynamic to Static. In the reverse direction, one may convert the dynamic type into the static type for better safety. The first way is to use the explicit type casting: if the cast fails the CastError exception will be thrown.

double(x) := fun
    do
        return (x as Int) * 2
    but CastError
        do
            return (x as String) ~~ 2
        but CastError
            raise ValueError("x is not an integer nor a string")
        end
    end
end

Note that one can use the Value isa method instead of the nested do~but blocks:

double(x) := fun
    if x isa(Int) then
        return (x as Int) * 2
    elseif x isa(String) then
        return (x as String) ~~ 2
    else
        raise ValueError("x is not an integer nor a string")
    end
end

Of course one still have to cast x in order to notify the compiler; the compiler does not recognize the Value isa method at all! Since this is very error-prone (what if one misses x as SomeType in one case?), naru provides a shortcut:

double(x) := fun
    given x
    case Int as x then          -- overrides the prior binding for x
        return x * 2
    case String as y then       -- in this case x is still dynamically typed
        return y ~~ 2
    case * then
        -- this case will catch all other types, since any type can be
        -- converted to the dynamic type.
        raise ValueError("x is not an integer nor a string")
    else
        -- this case is impossible, and in fact, can be omitted.
    end
end

This given-case construct is generally used for pattern matching, but can also be used for general type matching as shown here. Detailed rules for pattern and type matching are given in the following sections.

The other way to convert the dynamic type to the static type is using the statically-typed function, possibly with the multiple dispatch. This is possible because calling the statically-typed function with a mismatching dynamic type is not a compile-time error but a runtime error. In fact, this is how as operator is internally implemented: x as SomeType is equivalent to (fun (_arg:SomeType) -> _arg)(x).

Forcing the static typing. We have already seen that the type of variables default to the dynamic type(*). One may use use static and use dynamic in order to override this behavior:

-- the type checker will catch "p length" to be invalid.
do
    use static
    p = 4
    q = "Hello"
    "This line should not be printed" println()
    (p length + q length) println()
end

-- before running the program the type checker does not try to
-- check "p length" to be invalid or not.
do
    use dynamic
    p = 4
    q = "Hello"
    "This line will be printed though" println()
    (p length + q length) println()
end

Note that if we wrote the second-to-last line as (4 length + "Hello" length) println() instead, then it will cause the type error in compile time, despite use dynamic is enabled. This is because 4 itself always has a type of Int (or its subtype); use static and use dynamic will make a difference only when the assignment or binding is used.

use static also does not affect the argument types, which always default to the dynamic type. Therefore one should at least annotate the argument types when using static typing as a default. (For paranoids: While it is not impossible to infer the argument types from the function body, such types are very limited. For example, imagine the function succ(x:_)->_ := fun -> x + 1; given #+# is open to extension, what argument types should be inferred here? The solution is provided in the next section.)

Type Inference

The type inference in naru occurs when one uses a placeholder type _:

x: Int := 42
y: _ := x + x         -- the type of y is Int
z: _ := x ** y        -- the type of z is Real (since y can be negative)

This type is also implicitly assumed for the following cases:

This placeholder type has an important role for type-defining uniform declarations like X(a:Int) := class ... end. In this case, the resulting return type should match with the type defined in this very declaration (therefore X(a:Int)->Y := class ... end is rejected) and the inferred type is handy.

The type inference in naru is based on Hindley-Milner algorithm, which constructs a list of constraints in order to determine the actual type for placeholders. This algorithm always terminates (“decidable”), although it may take lots of time to finish the algorithm for pathological code. The compiler supports an option in order to limit the degree of type inference.

TODO boundary of type inference (maybe a module boundary) and type inference for open types

Pattern Matching

The naru type system primarily concerns with named types (i.e. nominal type system), but it is often convenient to use structurally defined types. For example the binary tree is either a leaf or a branch, where the branch contains two smaller trees and one value. It would be implemented in naru as follows:

Tree{+T} := trait
    size()->Int := _
end

Leaf{+T}() := class <- Tree{T}
    size() := fun -> 0
end

Branch{+T}(left:Tree{T}, value:T, right:Tree{T}) := class <- Tree{T}
    size() := fun -> left size() + right size() + 1
end

Readers familiar to data structures and algorithms could immediately figure out how the insertion or deletion is implemented with this structure. Unfortunately such operations become very complex to implement correctly. For example, even the tree rotation (red-black trees rely on this operation extensively) becomes cumbersome:

rotateRight{T}(node:Tree{T}) := fun
    use static
    if node isa(Leaf) then raise ValueError("the top node is not a branch") end
    top = node as Tree{T}
    if top left isa(Leaf) then raise ValueError("the left node is not a branch") end
    left = top left as Tree{T}
    return Branch(left left, left value, Branch(left right, top value, top right))
end

rotateLeft{T}(node:Tree{T}) := fun
    use static
    if node isa(Leaf) then raise ValueError("the top node is not a branch") end
    top = node as Tree{T}
    if top right isa(Leaf) then raise ValueError("the right node is not a branch") end
    right = top right as Tree{T}
    return Branch(Branch(top left, top value, right left), right value, right right)
end

This code uses too much code to check if given tree is of a particular form and to narrow the compile-time type of the tree accordingly. The primary goal of pattern matching is to specify this using a structural way and thus to remove such boilerplate codes:

rotateRight{T}(node:Tree{T}) := fun
    given node
    case Branch(Branch(child1, value1, child2), value2, child2) then
        return Branch(child1, value1, Branch(child2, value2, child2))
    else
        raise ValueError("cannot perform the tree rotation on this tree")
    end
end

rotateLeft{T}(node:Tree{T}) := fun
    given node
    case Branch(child1, value1, Branch(child2, value2, child2))
        -- "then" after the case speficiation is optional.
        return Branch(Branch(child1, value1, child2), value2, child2)
    else
        raise ValueError("cannot perform the tree rotation on this tree")
    end
end

It is much easier to spot what’s going on, and in particular it is now clear that the tree rotation does not affect the in-order traversal. Locally defined variables childN and valueN are only available within the corresponding case block.

case-ed Classes. naru issues an error for missing case clauses against the statically given type: if we omitted the else branch in the earlier code, we will miss the case that node is just a Leaf (for instance). For the same reason, the following code raises a compile-time error:

TreeReduction{T,U} := trait
    reduce(node:Tree{T})->U := _
end

TreeSum() := class <- TreeReduction{Int,Int}
    reduce(leaf:Leaf{Int}) := fun -> 0
    reduce(branch:Branch{Int}) := fun ->
        self reduce(branch left) + branch value + self reduce(branch right)
end

It is not immediately obvious what’s wrong with this code, but the problem becomes clear if we add a subclass Trident{T} (maybe for 2,3-tree) to Tree{T}: TreeSum reduce does not have an implementation for Trident! Therefore we need some way to say that Tree{T} is not open for direct subclassing, which is achieved by case-ed classes.

Tree{+T} := trait
    size()->Int := _

    case Leaf{+T}() := class <- Tree{T}
        size() := fun -> 0
    end

    case Branch{+T}(left:Tree{T}, value:T, right:Tree{T}) := class <- Tree{T}
        size() := fun -> left size() + right size() + 1
    end
end

case specifier in the trait or class definition is similar to global in such that the identifier is declared outside of the current scope, but also signifies the enumerated cases are all possible direct subtypes of this trait or class. Therefore aforementioned Trident{T} subclass is no longer possible with this definition. It is also possible to nest case-ed traits or classes.

case Specification for Clauses. The case specification is a superset of the type specification, which means that the trait or ordinary types can also be used for cases. It is even possible to use the dynamic type *, which will catch every uncaught types. (In this regard the placeholder _ is exactly same to the dynamic type in the case specification.)

The case specification extends the type specification by adding the following expressions:

Class deconstruction

Branch(l,v,r), Leaf() and so on. The empty constructor argument list is only used for verifying the given class has no constructor arguments.

Deconstructed names are available only within the following case block. Deconstructed names are very similar to function arguments: a named argument syntax (Branch(left:l, right:r, value:v)) can be used to reorder the argument order, and a placeholder (Branch(_,v,_)) can be used to ignore some irrevalent arguments.

Class deconstructions can be nested, but this does not mean that ordinary type specifications can also be used in the position of deconstructed names. For example, Branch(Branch(l2,v2,r2),v,r) is possible but Branch(Branch,v,r) is not (unless you want to bind Branch to the right subtree of the branch).

Name binding

Branch(l,v,r) as b, Int as n and so on.

This binding gives an additional name for the non-leaf node of class deconstruction or type specification, which is normally not available. It is useful for use static mode, since given-case construct does not automatically make the compiler to treat the given expression to the matched type. The binding may also be used for nested class deconstructions.

Explicit type variable

Enumerable{T} for T, ((Dest,Src)->Dest,Dest,Enumerable{Src})->Dest for Src,Dest and so on.

The explicit type variables can be used to give an additional name for types, and can only be used in the top-level of case specification. For other reasons, these type variables are treated as a placeholder _: Enumerable{T} for T will be treated as Enumerable{_} during the inference, for example.

Literal values

RedBlackBranch(l,v,r,red:true) and so on. Literal values match for exactly same (i.e. == returns true) values. Such syntax is only allowed for Bool, Int, Char and String values for the clarity.

Yet Another Kind of Types

Yet another use of the word “type” in naru is, if you still recall, as a keyword. We have already seen that type attribute gives a compile-time type, but type can also be used for declaring another type:

MapFromInt{Value} := type <- Map{Int,Value}

To be exact, this does not declare another type but an alias to it. MapFromInt{String} is exactly same as Map{Int,String}, for example. This declaration is needed because one cannot put the static parameter for normal declarations like MapFromInt{Value} := Map{Int,Value}.

type declarations can also be used for declaring a structural type:

List{+T} := type (Nil() | Cons(head:T, tail:List{T}))
Weekday := type (Mon() | Tue() | Wed() | Thu() | Fri() | Sat() | Sun())

Note that the constructor argument is always required even if it is empty, which resolves the ambiguity in the case specification. (For example, is Mon in WeekDate(year,weeknum,Mon) a reference to the singleton Mon value or just another name binding?) There is an advanced way to eliminate these parentheses, which is used for true and false values—which is defined just as Bool := type (true | false).

More on Functions

Having explained about types, we can now put interest to advanced features of functions in naru.

Multiple Dispatch

TODO

Open Method

TODO

with Operator

While with blocks are used to finalize resources automatically, with within an expression works differently: it augments the meaning of a preceding expression.

"A(2,4) = ...#((2 ** 65536 with modulo: 1e20) - 3)" println()
         --> A(2.4) = ...45587895905719156733

with operator appends given arguments to the outermost function call or operator of given expression; in this case, exponentiation operator ** is the outermost and since 2 ** 65536 corresponds to #**#(2, 65536) (see Special Names and Complication) the actual function call would be #**#(2, 65536, modulo: 1e30). with operator has to be parenthesized unless it is the outermost operator for function arguments or the sole statement; you can also use non-keyword arguments like (2 ** 65536 with 1e20), or multiple arguments like (sin(30) with :degrees) with accuracy: 0.5(--ulp--), but they are not always recommended. For the infix operators, with can spread to multiple arguments of same precedence (e.g. a + b * c - d with e is same as #-#(#+#(a, #*#(b, c), e), d, e)) unless the term of same precedence is parenthesized.

It is mostly useful for options of operators (although normal function call can be augmented as well); the aforementioned a ** b with modulo: c is one example, which is same as a ** b % c but much more efficient when b is very large; many floating point operations have an explicit rounding argument, like a + b with :trunc (for rounding towards 0); the range with non-default step can be constructed by a..b with step: c; and so on.

with can be applied to a single term as well; in this case, it calls #with method of given value. For example, #with method of functions receive a named argument and returns a new function that calls the original function with given argument: thus log with base: 3 is equivalent to fun (x) -> log(x, base: 3). Depending on the use, it can be very powerful way to extend naru.

There are a few operators that cannot be augmented in this way. Operators that do not desugar into a function call (e.g. but operator) are obvious exceptions, and assignments and operation-assignments cannot be augmented as well. with itself can be augmented.

Automatic Function

So far, we have seen two ways for making function: function declaration (name := fun ... end) and anonymous function (fun (args) -> ...). There is the third way, an automatic function, that is an obvious but useful shortcut for an anonymous function:

[3, 4, 5] map(_ + 1) println()       --> [4, 5, 6]

The full list of shortcuts is as follows (the argument name is only provisional, and actual type does not reflect it):

Any arguments within an automatic function are evaluated at the time of function construction. So strictly speaking, (_ + 3 println()) and fun (x) -> x + 3 println() is not equivalent.

The automatic function has to be parenthesized like with operator, unless it is an outermost operator in a function argument or a sole statement (highly useless though). Therefore the previous use, map(_ + 1), is valid, but (_ + 1)(3) needs parentheses.

Type Adaptation

TODO

Coroutines

TODO

Tail Call

The tail call “optimization” (TCO) is a way to implement function call (often recursive) with no additional stack use. For example the following range function:

range(low, high, result=:[]) := fun
    if low >= high then return result end
    return range(low+1, high, result ~ :[low])
end

…can be efficiently implemented as a loop:

range(low, high, result=:[]) := fun
    while not low >= high
        result = result ~ :[low]
        low += 1
    end
    return result
end

For some cases, however, the compiler is able to automatically convert the former into the latter, which is far more efficient than the original version. In brief, the tail call that is a form of return somefunc(...) can be applied. There are some reasons to prefer the former as well; some algorithms are inherently recursive and the transformed code is often unintuitive. Therefore this kind of transformation would be useful if we use lots of them.

The problem is that the tail call optimization is not an optimization. It changes the semantics of the code, as it will change the space complexity; for example, if we have a tail call which is optimized away and we mistakenly make it a non-tail call then we might see the unexpected space complexity increase. I consider TCO as an example of fragile optimization due to this reason. (Some FP people might disagree on this subject, but they have to realize that the programming language is not a mere tool or inference rule but an interface between the machine and programmer.)

Thus instead of making every tail call optimized, naru will only optimize the tail calls which are explicitly specified to be optimizable. This is simple, as you simply have to use \return instead of return:

range(low, high, result=:[]) := fun
    if low >= high then return result end
    \return range(low+1, high, result ~ :[low])
end

(Think of \ as a stack flattening.) If \return statement is not a tail call (e.g. \return func(42) + 1) then it would be an error, so you can ensure that you have written a tail call and it will be optimized always. You have to use \return even when you don’t have the return value, that is, the function returns () implicitly.

Modular Programming

While a simple program fits in one file, most programs does not; they have to be organized somehow. In naru the module is a unit of program, and it can be imported by import statement:

-- constants.n
PI := 3.1415926535897932
E := 2.718281828

-- area.n
import constants *
radius = 42
#"area = #(PI * radius**2)" println()

In general the following kinds of import statements are allowed:

import constants                     -- imports a module value
import constants PI                  -- imports a single symbol
import constants (PI, E)             -- imports several symbols from one module
import constants PI as PI_16DIGITS   -- imports a symbol with an alias
import constants (PI as PI_16DIGITS,
                  E as E_9DIGITS)    -- imports symbols with corresponding aliases
import constants *                   -- imports every symbols from one module
import constants, math               -- imports multiple modules
import (constants, math)             -- ditto
import collection SkipList{Int} as IntSkipList
                                     -- imports a specialized symbol with an alias
import collection SkipList{Int}      -- this is not allowed
import collection SkipList           -- workaround: use this and refer `SkipList{Int}`

There is also a hierarchy of packages, which contain other modules or packages. For example, let’s assume the following hierarchy:

image/
    _.n
    readers/
        bmp.n
        png.n
        apng.n
        gif.n
    writers/
        bmp.n
        png.n
    pixmap.n
    color.n
    colormodel.n

As you can see _.n is a special file that is run before any import for modules in the image package. (_ is not allowed in the symbol name in general, so this module has the name of just image. Also _.n is optional; if it does not present, children can be still imported but the package itself cannot be imported.) Outside the image package, one can use import image readers bmp or import image colormodel to import bmp and colormodel module values respectively. Normal imports are absolute, which means import image readers bmp is equivalent in the image pixmap module as well. For relative imports one can use import self readers bmp (which finds ./readers/bmp.n); for parent packages one can also use import super readers bmp (which finds ../readers/bmp.n) or even import super super image readers bmp (which finds ../../image/readers/bmp.n), but I don’t recommend these.

Cyclic imports are allowed as long as there are at most one dynamic module (i.e. the module with assignment statements) in the cycle. So library writers are strongly encouraged to write no dynamic code in the library code unless there is a good reason. TODO mutability of modules

Finally, naru has a built-in package named naru. The initial runtime module that contains Int, String and so on is named naru core; every naru code implicitly imports naru core before the first line of it. (i.e. import naru core *.)

O Holy Unicode, plus I/O

If you are living in the United States, where you don’t need the freaky letters, you probably don’t have to use (or even didn’t heard of) Unicode. However alas, you are not alone in this world. People around the world uses lots of scripts and languages that make Unicode does matter.

Following the trend, naru has a native support for Unicode. Until now I didn’t explain how the user input is handled; I had to postpone the explanation since it does involve Unicode. The following is how the input is (properly) handled:

do
    ">>> " print()
    text = String readln(replace: "")
    if text == "" then retry end
but EOFError
    "You pressed Ctrl-D or Ctrl-Z!" println()
end

Yes, this is ugly, but all the user input is ugly. String readln reads one line from the standard input; the replace: "" means it should replace an erroneous character with given string (here, the empty one) when the line cannot be properly parsed as a Unicode string. If it is none then it raises the UnicodeDecodeError exception instead; you have to handle it at your own. The default is none.

You can read lines from the file likewise:

with f <- File("data.txt") open()
    text = String readln(f, replace: "?")
but EOFError
    text = ""
end

Or even with a for loop:

with f <- File("data.txt") open()
    for line <- f lines
        line println()
    end
end

As you can see the FileStream (the return value from File open method) value has lines wrapper, which is iterable (i.e. Enumerable{String}). But how about the binary file? The natural interface would be bytes instead of lines:

with f <- File("data.bin") open()
    for byte <- f bytes
        byte println()
    end
end

In this case, byte would be an integer ranges from 0 to 255 (for 8-bit bytes, as most computers do). But this means you cannot handle them as sequences as you can see only one byte at a time. What if we are handling zero-terminated bytes, for example?

with f <- File("data.bin") open()
    bytes = Bytes readz(f)     -- really, a shortcut for Bytes readuntil(f, 0)
end

You can see the new class here: Bytes (short for byte sequence). It is a lot like Vector{Int}, but it assumes all elements in it fit in the byte. Also it has several methods that deals with byte sequences, like readuntil and asciilower/asciiupper.

So naru has two string types: String for an ordinary Unicode string, and Bytes for a byte sequence. Both can be converted to each other using Bytes decode and String encode methods:

with f <- File("data.bin") openrw()
    current = f position
    bytes = Bytes readz(f)
    str = bytes decode("utf-8", replace: "?")
    f position = current
    str encode("iso-8859-1") printz(f)    -- overwrite
end

naru also provides literals for byte sequences, b"..." and x"...". The former may contain a raw byte like b"abc" but limited to the letters in ASCII, and the latter may contain hexadecimally encoded bytes like x"61 62 63".

Stream Infrastructure

The core abstraction behind I/O in naru is the stream, represented by the superclass Stream. It treats the input source (Reader) and output sink (Writer) as a stream of bytes, as the direct interface shows:

with f <- File("input") open()
    bytes = f read(512)
    if bytes length != 512 or bytes[510..511] != x"aa 55" then
        "Valid MBR" println()
    end
end

with f <- File("output") openwrite()
    f position = 0
    f write(bytes)
end

There is one important fact about the naru stream: unlike C’s fopen, it doesn’t distinguish the text mode and binary mode. In the other words every stream in naru is implicitly in the binary mode. Parsing and decoding the bytes from the stream is the job of String and Bytes, as they exactly know how to treat bytes as their own value.

TODO

Syntax Extensions

TODO

Syntax extensions are declared by compiler directives, statements starting with use keyword. Compiler directives themselves are further explained in the next chapter.

Brand New Tokens

If you want to add a new operator rather than extending the existing operators (which can be easily done by overloading #+# or so), then you need to introduce new tokens: either punctuations or textual keywords. This can be achieved by use token directive:

use token "~="           -- declares a new punctuation.
use token "unless"       -- declares a new keyword.
use token "[\\ \\]"      -- declares two new punctuations which are pairs to each other.
use token "` `"          -- declares one new punctuation which is a pair to itself.
use token "if"           -- error: token already exists.

As shown with declarations for [\ and \], tokens are given as a string which can contain any Unicode character. It is possible to declare operator by use token "\u2264", for example. If the token string contains two tokens separated by whitespaces it declares two (not necessarily different) tokens that are opening and closing tokens respectively.

There are some restrictions on available tokens:

TODO token remapping (and its use in the collision-safe syntax extension); removal (use no token ...); local token namespace??? OH NOES

Operator Definition

TODO moving towards general syntax augmentation.

Generic Literals

Sometimes you need arbitrary literals; a regular expression, the feature naru does not support directly, would be one example. However naru does not support arbitrary literals, because parsing arbitrary literals requires the capability of specifying lexical syntax which can be very complex. Therefore instead of supporting arbitrary literals, naru specifies a set of fixed but extensible generic literal syntax.

There are two forms of generic literals in naru:

The form and prefix uniquely determines the kind of generic literals in the given scope. The prefix can be a valid identifier, a non-negative integer (where 042 and 42 is regarded same) or an empty string. For identifiers the prefix is case-sensitive.

TODO generic literal contents, generic literal definition syntax (use literal ... maybe?)

Some generic literals are reserved by naru itself. In the following table literals reserved by naru core module are built-in, and additional modules should be imported to use others.

Syntax Module Literal Type Description
"..." naru core String Principal quoted string.
r"...", R"..." naru core String Raw string. No escape sequence recognized except for \", which prohibits the end of string but still included in the string as is.
b"...", B"..." naru core Bytes Principal quoted byte sequence. Can only contain ASCII letters and escape sequences.
br"...", BR"..." naru core Bytes Raw byte sequence.
x"...", X"...", bx"...", BX"..." naru core Bytes Hexadecimal-encoded byte sequence. Any non-hexadecimal character is invalid; any whitespaces are ignored, and there should be even number of digits.
2"...", …, 36"..." naru core Int Extended integer literal.
%(...), %q(...), %Q(...) naru core String Alternative quoted string.
%w(...) naru core PureVector{String} A vector of strings, delimited by one or more whitespaces.
%r/.../ naru text pattern Pattern A pattern value. Customarily delimited by /.

TODO

Deep Internals

TODO

naru provides compiler directives that controls the behavior of the compiler. Compiler directives are statements starting with use (which may be followed by no for negative meaning); the very first statement introduced in this document, use naru, is an example of compiler directive. They are comparable to #pragma in C/C++, in the sense that the directives that cannot be comprehended by the compiler are ignored. (Comprehensible but incorrect directives are not ignored though.)

use naru directive

Every naru program can contain the use naru statement as a first non-comment non-empty statement. This is a recommended way to distinguish naru program (rather than editor-specific lines, for example, like -- vim: syn=naru), though it is totally optional. Every trailing tokens after use naru are ignored as like other unrecognized directives in order to allow future extension.

Examples:

use naru                     -- valid
use naru revision 4          -- valid, two tokens ignored
use naru (revision 4)        -- valid, four tokens ignored
use naru ]revision 4[        -- syntax error; brackets still should match
use naru for good, not evil  -- valid, five tokens ignored (comma is a part of the statement)
use naru """what
            the
            hell?"""         -- valid, one token ignored

Note that use naru should be the first statement if used; later occurrences are invalid.

use encoding directive

use encoding directive alters the current lexer to read the naru source code (a Unicode string) using given character encoding. The change applys after the statement terminator of the directive (i.e. the first semicolon or newline sequence, where the latter can be up to two characters). The character encoding must be ASCII-compatible, meaning the byte from 0x20 to 0x7e should decode into characters from U+0020 to U+007E; therefore UTF-16 cannot be used, for example. The companion, use no encoding, sets the current encoding to the default one.

These directives can only be used in the top-level of the program, and the multiple use of them is strongly discouraged. In fact their single use is also discouraged, but there are some legitimate reasons to use them (albeit due to the unfortunate reason).

use encoding "cp949"            -- this comment is still encoded in the default encoding
use encoding "shift-jis";       -- this comment is encoded in Shift_JIS
do
    use encoding "koi8-r"   -- invalid, should appear in the top-level
    "blahblah" println()
end
use no encoding
-- now this comment is encoded in the default encoding again

The default encoding is UTF-8 (so the useless UTF-8 BOM will be safely ignored). If you are using a badass editor that does not support UTF-8, you can still prepend the use encoding directive. One another use case is to avoid non-ASCII characters for the international development, using use encoding "ascii".

use token, use syntax and use literal directives

TODO

use static and use dynamic directives

use static and use dynamic directives control how the type inference is performed. In short, when the explicit type is missing in the assignment or binding it is assumed to be the dynamic type (*) when use dynamic is enabled, or the placeholder (_) when use static is enabled. You can still use the static types when use dynamic is in effect, or the dynamic type when use static is in effect though. These do not affect the missing argument types (they are always assumed to be dynamic); only missing return types are inferred.

These directives affect the current block, and should be placed at the front of the corresponding block. The default is using use dynamic. The innermost directive is active, so use dynamic does have a use.

TODO putting as * in every subexpression may enable more dynamic behavior, but not sure that it IS really usable; if used, the directive name should be use more dynamic.

use runtime directive

TODO

use extern directive

TODO runs in the function scope:

#+#(lhs:SmallInt, rhs:SmallInt)->Int := fun
    lhsv: Int64 = lhs _value >> 1
    rhsv: Int64 = rhs _value >> 1
    result: Int64 = _
    use extern 'narurt.dll,__nr_SIplusSI'(lhsv, rhsv) as result
    return SmallInt(result)
end

Appendix: Rants

The memoranda in this section will be eventually merged into the relevant sections.

To-do List

Only issues that affects the whole document are listed; grep for TODO for local issues.

Missing documentation. Should write the following items somewhere…

Alternative import syntax. For consistency the following syntax is more desirable:

-- assuming "package module" contains "symbol1" and "symbol2" symbols...
mod := import package module         -- defines: mod symbol1, mod symbol2

But it seemingly defies the natural extension:

* := import package module           -- defines: symbol1, symbol2 (?)
_ := import package module           -- defines: module symbol1, module symbol2 (?)
sym1, sym2 := import package module (symbol1, symbol2)
                                     -- defines: sym1, sym2 (???)

The last syntax is especially worse than the original import package module (symbol1 as sym1, symbol2 as sym2), but it is not clear that it does matter.

Import rules for syntax extension. To avoid the conflict between same syntax declared in two modules, we should be able to import syntaxes selectively. Example:

import somethingawful "%r"                  -- the original %r/.../
import naru text pattern "%r" as "%re"      -- rename %r/.../ as %re/.../

Or should we have a distinct syntax for syntax declarations? We already have similar ones for operators (#+# etc.), but we don’t have a good alternative here.

Parametric modules. I hope…

More selective module imports. E.g. import naru core >= 3.1 or import naru core * but String. Syntax TBD.

Explicit memory layouts. naru ext Struct should be a base class of all memory-layout-sensitive classes. E.g.

Struct_tm(tm_sec, tm_min, tm_hour, tm_mday, tm_mon, tm_year,
          tm_wday, tm_yday, tm_isdst) := class <- Struct
    var tm_sec: Int32
    var tm_min: Int32
    var tm_hour: Int32
    var tm_mday: Int32
    var tm_mon: Int32
    var tm_year: Int32
    var tm_wday: Int32
    var tm_yday: Int32
    var tm_isdst: Int32
end

Therefore the class reflection should preserve the member order. Padding should be specified manually (e.g. _: Int16), and every non-memory attributes should be specified as an attribute. (The latter would be no problem when the getter-only attribute can be easily defined.)


ikiwiki를 씁니다.
마지막 수정