This is a draft tutorial and design document for the naru programming language, a general-purpose programming language designed by Kang Seonghoon. This article assumes the development version of naru, revision 4, and is thus subject to change.
This document is annotated in the separate rationale document, and a link to the justification or rationale will appear like this. (Note: these links won’t display properly in the wiki version.)
This document is available as both a standalone document and a wiki page. The recent changes are available in the Mercurial repository. The Korean version of this document is available here.
Let’s start with a simple example.
use naru
"hello world!" println()
use naru is an (optional) marker to signal naru code. "hello world!" is a string literal, and the println() method prints the given value. Normally a newline ends a statement, but you may use ; to separate statements manually.
naru is an imperative programming language; you can assign and use variables much like most other languages.
nirvana = () -- "null" value
sunny = true -- boolean
radius = 42 -- integer
almostpi = 3.1415926535897932 -- real number
something = 4 + 5j -- complex number (with a trailing 'j')
myname = 'Kang Seonghoon' -- string (the kind of quotes does not matter here)
haiku = """
古池や ふるいけや
蛙飛込む かわずとびこむ
水の音 みずのおと
""" -- multiline string using triple quotes
Here -- introduces a short comment that ends just before the newline; comments are silently ignored. There is also a long comment, delimited by (-- and --). By convention, we will show the output of the preceding line after -->.
The last statement shows an example of a multiline string, which must be delimited by triple quotes (otherwise newlines would have to be escaped). It also shows the use of Unicode in naru: all naru code is read as Unicode text.
Several operations are built into naru; among them are the arithmetic operations:
(15 + 21) println() --> 36
(15 - 21) println() --> -6
(15 * 21) println() --> 315
(15 / 21) println() --> 5/7
Wait, a fraction? Yes, naru has a built-in fraction type. Dividing an integer by another integer may yield a fraction. naru also has separate operators for the quotient (i.e. floor division) and the remainder:
(49 // 21) println() --> 2
(49 % 21) println() --> 7
If you really want to see a real number, make either the divisor or the dividend a real number. Anything containing a decimal point (.) is assumed to be a real number. Also, Real from(...) will convert an integer into a real number.
(49.0 / 21) println() --> 2.333333333333
(Real from(49) / 21) println() --> 2.333333333333
Integers in naru have arbitrary precision, and are only bounded by the memory limit.
(2 ** 88) println() --> 309485009821345068724781056
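naru is a draft language without a public implementation to run, but the arithmetic above is easy to cross-check in Python, whose fractions module and arbitrary-precision integers behave analogously (an analogy only, not naru itself):

```python
from fractions import Fraction

# Integer division yielding a reduced fraction, like naru's 15 / 21 --> 5/7.
ratio = Fraction(15, 21)
print(ratio)

# Quotient (floor division) and remainder, like naru's // and %.
print(49 // 21, 49 % 21)

# Python integers are also arbitrary-precision, bounded only by memory.
print(2 ** 88)
```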
For strings naru provides a concatenation operator, but it has a somewhat strict idea about what can be concatenated.
("Hello, " ~ "world!") println() --> Hello, world!
("pi is about " ~ 3.14) println() -- error
("pi is about " ~ String from(3.14)) println() --> pi is about 3.14
Since it is common to concatenate lots of non-string things, naru has a shortcut: an interpolated string.
pi = 3.1415926535897932
#"pi is about #(22/7), and almost equals to #pi." println()
--> pi is about 22/7, and almost equals to 3.1415926535897932.
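For comparison, Python's f-strings fill the same role as naru's interpolated strings; with the fractions module the 22/7 part even prints as a fraction (this is a parallel of mine, not naru syntax):

```python
from fractions import Fraction

pi = 3.1415926535897932
# f-strings play the role of naru's #"..." interpolation.
message = f"pi is about {Fraction(22, 7)}, and almost equals to {pi}."
print(message)
```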
Often we have to make a decision, and naru has a conditional construct for exactly this purpose. Comparisons can be made as usual. (Just keep in mind that = is an assignment and == is an equality test; = in a condition is forbidden.)
age = 15
if age == 0 then
"Welcome to the cruel world!" println()
elseif age < 13 then
"You are a child." println()
elseif age <= 18 then
"You are a teenager." println()
else
"You are an adult." println()
end
The elseif and else parts are not mandatory, and you may drop them if not needed. The same goes for the then keyword, unless it is followed by another statement on the same line.
age = 15
if 13 <= age <= 18
"You are a teenager." println()
end
Here 13 <= age <= 18 has the usual meaning: it returns true when age is between 13 and 18, inclusive. It won't be parsed as (13 <= age) <= 18 or similar. The following is also equivalent:
age = 15
if 13 <= age <= 18 then "You are a teenager." println() end
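Python happens to share this chaining rule, so the difference between the chained reading and the misparse is easy to demonstrate (using 40, a value where the two readings disagree):

```python
age = 40
chained = 13 <= age <= 18       # the chained reading: False, 40 is out of range
misparsed = (13 <= age) <= 18   # (13 <= 40) is True, and True <= 18 holds numerically
print(chained, misparsed)
```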
There is also a then-else expression which can be used for inline conditions:
(13 <= age <= 18 then "You are a teenager." else "You are a stranger.") println()
This expression should be enclosed in parentheses (which include the ones in a function call), since otherwise it could be mistaken for an if-then statement.
Before talking about loops, let me introduce several kinds of complex values that are constructed from other, simpler values. They may look familiar to you, but their characteristics are worth recapping.
A good analogy is the address book in your cellphone: each entry is stored as a tuple; the entries are stored in a map in order to speed up the lookup (since you normally find an entry by name); and recently called entries are also stored in a set (if you don't care about the order) or a vector (if you do).
today = (2010, "Nov", 6) -- tuple
numbers = [0, 1, 2, 2, 3, 3, 3, 3] -- vector
genders = {"male", "female"} -- set
countries = { -- map
"KR" => "Republic of Korea",
"JP" => "Japan",
"US" => "United States", -- look at this stray comma!
}
As the last statement shows, every newline between a pair of matching parentheses, brackets or braces is ignored. A stray trailing comma is also ignored, which is handy when writing a long vector, for example.
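These four complex values have close Python counterparts (tuple, list, set and dict), shown here purely for comparison; note that Python also tolerates the trailing comma:

```python
today = (2010, "Nov", 6)             # tuple
numbers = [0, 1, 2, 2, 3, 3, 3, 3]   # list (naru: vector)
genders = {"male", "female"}         # set
countries = {                        # dict (naru: map)
    "KR": "Republic of Korea",
    "JP": "Japan",
    "US": "United States",           # trailing comma is fine here too
}
print(len(numbers), len(genders), countries["KR"])
```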
A loop is built from a corresponding complex value. For example, you may want to loop over the values of numbers and print them:
for num <- numbers do
num println()
end
You will get a pair of key and value if you loop over a map. (Also note that do is optional, just like then in the conditional.)
for code, name <- countries
code println()
name println()
end
“It is okay that every loop requires a complex value, but what if I want to write numbers from 0 to a billion?” No problem, you have two ways to do it:
-- one way
for num <- 0..1_000_000_000
num println()
end
-- another way
num = 0
while num <= 1_000_000_000
num println()
num += 1
end
An underscore (_) within a long number is ignored. The second way shows a while loop, the most general form of loop, but it is quite cumbersome to write; use it only when there is no other way. (For paranoids: 0..1_000_000_000 does not store all the numbers in between, so it is safe to use lots of such ranges.)
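The same laziness can be observed in Python, whose range is likewise a description of its bounds rather than a stored list of numbers (a parallel, not a statement about naru's internals):

```python
import sys

# A billion and one numbers are described...
big = range(0, 1_000_000_001)
print(len(big))

# ...in a handful of bytes, regardless of the bounds.
print(sys.getsizeof(big))
```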
The basic naru above is just fine if you use naru as a calculator, but sometimes we want some kind of structure in a program. A major building block in naru is the function.
double := fun (x)
return x * 2
end
You can see a new operator here: :=. It is called binding, as opposed to assignment (=), and it means double won't change whatsoever. It is not limited to functions; you may use := to declare constants.
PI := 3.1415926535897932
It is not required but recommended that every constant name be written in uppercase. Likewise a function name should not start with an uppercase letter, to distinguish functions from constants.
Back to the function: fun ... end makes a new function. In this case it has one argument, x, and it returns x times 2 as expected. Since this function is so short, you can use a shortcut:
double := fun (x) -> x * 2
This seems to suggest that double itself is not a function but rather contains a function value. It really does, but in most cases we don't care and simply call it the function double. The alternative syntax supports this intuition:
double(x) := fun
return x * 2
end
Or, shorter still:
double(x) := fun -> x * 2
To distinguish this uniform declaration syntax from the function value syntax, fun should be followed by a parenthesized argument list for function values and by nothing for declarations. We will look at lots of variations on this syntax later.
Functions can recurse into themselves…
fibonacci(i) := fun
if i == 1 then return 1
elseif i == 2 then return 1
else return fibonacci(i-1) + fibonacci(i-2)
end
end
for i <- 1..10
#"fib(#i) = #(fibonacci(i))" println()
end
Or can be passed to another function…
greeting(msg, decoration) := fun
decoration(msg) println()
end
nope(msg) := fun -> msg
squared(msg) := fun -> "[" ~ msg ~ "]"
greeting("Hello, world!", nope) --> Hello, world!
greeting("Konnichiwa!", squared) --> [Konnichiwa!]
Or can even be returned by another function.
twice(f) := fun
return fun (x) -> f(f(x))
end
fourtimes := twice(twice)
sixteentimes := fourtimes(twice)
sixteentimes(double)(1) println() --> 65536
As you can see, the fun ... end thing is not required to appear right after :=. And while the code fun (x) -> f(f(x)) itself is fixed, the resulting function will depend on f and will vary even outside of the defining function. This is called lexical scoping in PL jargon. (Also, do you see the difference between a function declaration and a function value here?)
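Transliterating the twice example into Python (my rendering, not naru) makes the composition arithmetic easy to check: twice(twice) yields 4-fold application, and composing twice with itself four times yields 2**4 = 16-fold application.

```python
def twice(f):
    return lambda x: f(f(x))   # closes over f: lexical scoping

def double(x):
    return x * 2

fourtimes = twice(twice)         # applies its argument 4 times
sixteentimes = fourtimes(twice)  # twice composed with itself 4 times
print(fourtimes(double)(1))      # 2**4
print(sixteentimes(double)(1))   # 2**16
```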
You may mark some arguments optional, by providing default values for them. It is a good idea to provide reasonable defaults for rarely used arguments.
splitstring(string, delim = " \t\r\n", preservedelims = false) := fun
-- ...
end
Like the difference between assignment and binding, = and := have different meanings for default values: the former evaluates the default argument before each actual call (but not in the context of the call site, of course), while the latter evaluates it once, right after the definition. This distinction is helpful when a mutable value is used as a default value, but for now, just think of := as reserved for special cases.
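The two evaluation strategies can be felt in Python, which always evaluates defaults once at definition time (roughly naru's := behaviour, in my reading); getting a fresh default per call (roughly naru's =) requires the usual sentinel idiom:

```python
# The default list is created once, when the def statement runs,
# so it is shared between calls -- the classic mutable-default pitfall.
def append_once(x, acc=[]):
    acc.append(x)
    return acc

print(append_once(1))  # [1]
print(append_once(2))  # [1, 2] -- the same list again!

# A fresh default per call needs a sentinel and an explicit check.
def append_fresh(x, acc=None):
    if acc is None:
        acc = []
    acc.append(x)
    return acc

print(append_fresh(1))  # [1]
print(append_fresh(2))  # [2]
```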
If you want to specify some (but not all) default arguments, you may use keyword arguments at the call site.
splitstring("alpha beta", preservedelims: true) println()
You are not limited to optional arguments, however.
splitstring(preservedelims: true, string: "alpha beta") println()
You may also declare a variadic-argument function by appending ... to the last argument. That last argument will receive a tuple of the remaining arguments.
printf(fmt, args...) := fun
-- ...
end
Sometimes you need arguments that can only be specified using keywords; for example, you may want to add an outfile argument to printf without breaking backward compatibility. In that case you can append such arguments after the variadic one:
printf(fmt, args..., outfile = Console) := fun
-- ...
end
For consistency you may omit the name of the variadic argument, which means the function does not receive variadic arguments at all. (In this view, you may think of , ... as implied for every non-variadic function.)
splitstring(string, delim = " \t\r\n", ..., preservedelims = false) := fun
-- ...
end
In the reverse direction, you may provide multiple arguments to a function by using the \ prefix:
args = (99, "bottles of beer")
printf("%d %s in the wall\n", \args)
This “flattening” is possible in the middle of the argument list, not only at the end. It is useful for redirecting function arguments to other functions.
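For readers who know Python, this machinery maps closely onto *args collection and * unpacking, with keyword-only arguments after * mirroring arguments placed after the variadic one; the mapping is my analogy, not part of naru:

```python
# *args collects the rest into a tuple, like naru's `args...`;
# outfile is keyword-only here, like an argument after the variadic one.
def printf(fmt, *args, outfile=None):
    return fmt % args

args = (99, "bottles of beer")
# *args flattens the tuple back into arguments, like naru's \args.
print(printf("%d %s in the wall", *args))
```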
naru supports some object-oriented programming (OOP). However, we should be extra careful with the word object-oriented, since it is often vague and may imply several other features that naru does not support (and is not willing to support). In fact I have not used the word object at all; instead I prefer the word value, as it does not imply OOP, at least.
Anyway let’s move on. In naru you can define your own class:
Person(name, age) := class
var name
var birthday = Date today - Duration days(floor(age * 365))
teenager := _
teenager get() := fun -> 13 <= age <= 18
greet(lang='en') := fun
if lang == 'en'
#"Hello, #(self name)!" println()
elseif lang == 'ja'
#"Konnichiwa #(self name)-san." println()
else
"..." println()
end
end
self species := "Homo sapiens sapiens"
var self population = 0
self exterminate() := fun
self population = 0
end
end
hacker = Person('J. Random Hacker', 93/4)
Person population += 1
hacker species println() --> Homo sapiens sapiens
floor(hacker age) println() --> 23
hacker teenager println() --> false
hacker greet() --> Hello, J. Random Hacker!
hacker name = "Ichiro"
hacker greet('ja') --> Konnichiwa Ichiro-san.
Person exterminate()
Person population println() --> 0
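Since naru has no implementation to run, here is a rough Python analogue of the Person class above; the mapping (constructor arguments to __init__, teenager get() to a property, the self-prefixed members to class-level names) is an analogy of mine, not part of the naru specification:

```python
import math

class Person:
    species = "Homo sapiens sapiens"  # like `self species`, shared by all instances
    population = 0                    # like `var self population`

    def __init__(self, name, age):
        self.name = name
        self.age = age

    @property
    def teenager(self):               # plays the role of `teenager get()`
        return 13 <= self.age <= 18

    @classmethod
    def exterminate(cls):             # `self` in the naru version denotes the class
        cls.population = 0

hacker = Person("J. Random Hacker", 93 / 4)
Person.population += 1
print(hacker.species)           # Homo sapiens sapiens
print(math.floor(hacker.age))   # 23
print(hacker.teenager)          # False: 23.25 is not between 13 and 18
```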
Every class can have the following elements:

Constructor arguments. They are written after the class keyword (or after the name in the uniform declaration syntax); in this case Person has two constructor arguments, name and age. A call to the class will create a new value with the given arguments. They are accessible inside and outside the class with the normal attribute syntax (i.e. value attr).
Base classes. They follow the class keyword and the constructor arguments (if any). They can be either concrete (i.e. class) or abstract (i.e. trait), and concrete base classes should be followed by corresponding constructor arguments.
Mutable attributes. They are written using the var keyword (which is also valid outside the class, but with a slightly different meaning). While constructor arguments are read-only by default, mutable attributes can be modified inside and outside the class with the attribute setter syntax (i.e. value attr = newvalue). If a constructor argument and a mutable attribute coincide, they are shared.
Immutable attributes. They are similar to mutable attributes, but var does not precede them. They become immutable once defined. TODO is there a difference between x = ... and x := ... here?
Initial values. They are optional for mutable attributes that are shared with constructor arguments, and mandatory in other cases. They are evaluated the first time the constructor is called, when even self is not available yet; this is why you should use age instead of self age in the initial values. (To clarify: you may use a plain age in a method too, but it won't reflect changes in the mutable attribute of the same name, if any. Think of a plain age as an argument, as in a function.)

It is possible to leave an initial value unspecified by using a placeholder _ in its place. It means that subsequent subclasses should somehow fill in that default value, and any concrete class that does not supply all required default values is invalid. Placeholders can also be filled by submethods.
Methods. They are mostly the same as a normal function definition using a uniform declaration, but self is automatically available to the method body. Non-uniform declarations can be used to define methods as well, but self won't be added automatically in that case. You may omit the method body using a placeholder _, just like attributes; it forces any concrete subclasses or submethods to supply the actual method body.
Subattributes and submethods. These are additional pieces of information attached to attributes and methods. They are defined like other attributes or methods, but with the name of the target attribute or method prepended (e.g. greet doc := ...). There is no inherent difference between subattributes and submethods, except for the definition syntax (subattributes are defined like attributes, submethods like methods).
They can be used freely, but some subattributes and submethods are specifically recognized by the compiler:

The doc subattribute adds inline documentation to the target.

The get submethod gets executed when the placeholder _ is used to define the target and an access to that target is made. Therefore when SomeClass attr is defined with a placeholder, SomeClass attr get() (if any) will be called instead. It is also applicable to methods; in this case it should return a function(!).

The set submethod is similar to get, but gets executed when the target is being written with the = operator. It should receive one argument (the return value is silently discarded). Other kinds of assignments, besides :=, are automatically translated to an assignment with =.

The bind submethod is similar to set, but gets executed when the target is being written with the := operator.

Class attributes, methods, subattributes and submethods. These are the same as attributes, methods, subattributes and submethods belonging to the instance (i.e. a value constructed from the class), but prefixed by an additional self. They can also be accessed from instances. For class methods and submethods, self inside them denotes the class value itself no matter where the method is called. (For the same reason, constructor arguments are not available in the class-bound ones.)
There is no strict concept of visibility in naru; you may prepend _ to a private name (_debug, for example) but it won't stop users from accessing it. Prepending _ is nevertheless a good idea, as it signals to users that the interface may change without notice.
While the class definition looks similar to the function definition, a class definition cannot contain arbitrary statements; if you want to do something complex in the constructor, define an initialize method:
NonNegative(value) := class
initialize() := fun
if value < 0 then
#"Warning: #value is less than zero." println()
end
return self
end
end
NonNegative(4) println() --> <NonNegative @b0d3a0>
NonNegative(-4) --> Warning: -4 is less than zero.
Note the use of value here; since we are talking about the constructor argument, we don't have to use self value. In fact, the following classes are equivalent:
Date1(year, month, day) := class
-- redundant initializations are kept for the comparison purpose.
var year = year
var month = month
var day = day
end
Date2(year, month, day) := class
var year
var month
var day
initialize() := fun
self year = year
self month = month
self day = day
end
end
Since we often want constructor arguments to be mutable attributes automatically, you may merge them like this:
Date(var year, var month, var day) := class
end
A class may be based on another existing class:
Date(year, month, day) := class
weekday() := fun
leapyear(y) := fun -> y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)
years = self year - 1
days = 365 * years + years // 4 - years // 100 + years // 400
if leapyear(self year) then
days += [0, 3, 4, 0, 2, 5, 0, 3, 6, 1, 4, 6][self month - 1]
else
days += [0, 3, 3, 6, 1, 4, 6, 2, 5, 0, 3, 5][self month - 1]
end
weekday = (days + self day - 1) % 7
return ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"][weekday]
end
end
DateTime(year, month, day, hour, minute, second) := class <- Date(year, month, day)
end
DateTime(2010, 11, 7, 18, 40, 2) weekday() println() --> Sun
Here a value DateTime(y,m,d,h,n,s) is equivalent to Date(y,m,d) when a method from Date is used. It is possible for DateTime weekday to be declared and override the definition of Date weekday; in that case, DateTime weekday can use super weekday() to call the superclass' weekday method.
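The weekday algorithm lends itself to a quick cross-check. Below is a Python transliteration (my own rendering) compared against the standard datetime module, which also numbers Monday as 0, matching a day table that starts with "Mon":

```python
from datetime import date

def weekday(y, m, d):
    # Same day-counting scheme as the Date class: days since Jan 1 of year 1.
    def leapyear(y):
        return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)
    years = y - 1
    days = 365 * years + years // 4 - years // 100 + years // 400
    if leapyear(y):
        days += [0, 3, 4, 0, 2, 5, 0, 3, 6, 1, 4, 6][m - 1]
    else:
        days += [0, 3, 3, 6, 1, 4, 6, 2, 5, 0, 3, 5][m - 1]
    return (days + d - 1) % 7

# 2010-11-07 was a Sunday; datetime agrees with the hand-rolled version.
print(weekday(2010, 11, 7), date(2010, 11, 7).weekday())
```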
There can be two or more base classes; if the base class is omitted, as in the Date class above, it is assumed to be Thing.
TODO
Sometimes it is convenient to assume a class prototype that is never instantiated. For example, when you have Circle, Square and Rectangle classes, Square is clearly a subclass of Rectangle, but Circle and Rectangle should share some kind of base class for, say, an area method. Therefore naru supports abstract classes, or traits.
Shape := trait
circumference := _
area := _
-- most shapes are not rotationally symmetric...
rotational_symmetry := false
end
PI := 3.1415926535
Circle(radius) := class <- Shape
circumference := 2 * PI * radius
area := PI * radius**2
rotational_symmetry := true
end
Rectangle(width, height) := class <- Shape
circumference := 2 * (width + height)
area := width * height
end
Square(side) := class <- Rectangle(side, side)
rotational_symmetry := true
end
Traits are very similar to classes, but they lack constructor arguments. Traits may contain concrete parts, to the extent that you can make a trait in which everything is concrete, so that a subclass with an empty body is valid. Traits are the general choice for abstract classes and complex hierarchies, since they pose no difficulty when dealing with multiple inheritance.
Thing, which is automatically used for classes without a base class specification, is a good example of a trait. It nevertheless exposes several useful methods.
In PL jargon, polymorphism is just the ability to generalize several things into one. For example, you can generalize classes by using a base class; this is called subtype polymorphism. Another kind of polymorphism is deriving several similar classes from one template with parameters (typically types); this is called parametric polymorphism. naru supports both, but it is quite a difficult matter to explain every bit of them here.
(For paranoids: yes, there is a third kind of polymorphism, ad-hoc polymorphism, whose best-known example is C++'s overloading. For the purposes of this document, however, it is not considered polymorphism, as it does not define one entity that behaves differently depending on the context; it instead defines multiple entities sharing one name. naru nevertheless supports this notion, using union types.)
As a part of subtype polymorphism naru supports multiple inheritance. C++ folks should be scared here, but naru is more robust than C++ about multiple inheritance. First look at this example:
A(x, y) := class
end
B(x, y, z) := class <- A(x, y)
end
C(y, x) := class <- A(x, y)
end
D(x, y, z) := class <- B(x, y, z), C(y, x)
end
Then D(x, y, z) is equivalent to B(x, y, z), C(y, x) and A(x, y) when using methods from the base classes. Base classes cannot conflict with each other; for example, it is an error to change the definition of C to C(x, y) := class <- A(x, x), as it would imply x == y in D's constructor. (You cannot use arbitrary expressions as base class arguments for the same reason.) Traits are free of this problem as they lack constructor arguments.
All conflicting attributes, subattributes or methods are reported:
A := trait
f() := _
end
B := trait <- A
f() := fun -> 42
end
C := trait <- A
f() := fun -> "some string"
end
D() := class <- B, C
-- error: f() is not defined here, so `B f` and `C f` will conflict
end
These diagnostics apply equally to the types of attributes, methods, subattributes and submethods; if B f and C f narrow down the type of A f, then the type of D f should be narrowed down so that it is a subtype of the types of both B f and C f. (See intersection types for the detailed rules.)
TODO overriding rules; this is too complex and has lots of corner cases. heck.
naru also supports parametric polymorphism, by putting {} before the constructor arguments (for now, ignore the syntax you haven't seen yet):
LinkedList{+T}(head: T, tail: LinkedList{T}? = none) := class
print() := fun
self head print()
if next <- self tail then
", " print()
next print()
end
end
global String from(self) := fun
return #"<LinkedList of type #T>"
end
end
LinkedList(3, LinkedList(4, LinkedList(5))) print() --> 3, 4, 5
Here {+T} is a type parameter; it would be {Key, Value} for maps, for instance. But what about +? It means you can use a LinkedList{Int} in place of a LinkedList{Rational}, for example. Since an Int would fit in every case a Rational is needed, it makes sense that a LinkedList{Int} can stand in for a LinkedList{Rational}. If you use - it means the inverse: LinkedList{Rational} can be used in place of LinkedList{Int}. If there is no such marker, there is no such relation at all. In PL jargon: + means the type parameter is covariant, - means it is contravariant, and no marker means it is invariant.
In many cases you don't have to figure out + and - (and they would be incorrect in many cases anyway), but it does matter when you are writing a very general class, like a collection. You will certainly want a list of Int to be usable in place of a list of Rational, right? However you should be careful, since mutable containers cannot be covariant. (In brief, getting an element requires a covariant type parameter and setting an element requires a contravariant one; in order to satisfy both, it should be invariant.)
Using classes with type parameters is much easier than defining them. The last line in the code above is equivalent to this:
LinkedList{Int}(3, LinkedList{Int}(4, LinkedList{Int}(5))) print()
But since we already know that head receives an Int and thus T has to be Int, we can omit all the {Int}s here; you still have to give all type parameters if one or more of them cannot be inferred automatically, as for an empty linked list. (One alternative is to use the type keyword, as we will discuss later.)
Since T is covariant, the following lines are equivalent:
LinkedList(4.0, LinkedList(3/4, LinkedList(5))) print()
LinkedList{Real}(4.0, LinkedList{Rational}(3/4, LinkedList{Int}(5))) print()
-- here LinkedList{Rational}(...) is used in place of LinkedList{Real}
For a contravariant T a similar equivalence holds. In short, type inference is affected by the variance of type parameters.
Type parameters in the class definition are treated like constructor arguments; you can use T (or self T) throughout the class. However you cannot declare one as a mutable attribute (of course!).
TODO type parameters on functions; in this case we may automatically infer + and - but it will be inconsistent with the ones on classes…
TODO restrictions on type parameters (maybe {T <- U} or so)
In most cases we have to deal with lots of possible errors. When we divide a number by another number we may get an error (division by zero); when we open a file and read some lines we may get other errors (no file, permission denied, EOF reached, and even interruption); and so on. naru supports exceptions to deal with them.
In naru exceptions are values of the Exception class, and they can be raised by the raise statement.
raise RuntimeError('alien invaded!')
They can be caught using a special but block:
do
raise RuntimeError('alien invaded')
but Exception as e
#"exception caught: #(e message)" println()
end
More generally, every do and fun block can have but, else and finally blocks. They are executed, respectively, when an exception of the given type is caught (or any exception, if no type is given), when no exception is caught, and by the time the execution flow escapes the block, whether or not an exception was caught:
do
-- ...
but ExceptionA as e
-- executed when ExceptionA or its subexception is caught
-- a new variable e is available in the scope
but ExceptionB
-- executed when ExceptionB or its subexception is caught
-- unless the exception was of ExceptionA class too
but
-- executed when an exception not of ExceptionA nor ExceptionB class is caught
else
-- executed when no exception is caught
finally
-- executed after the `but` or `else` block (if any)
-- runs in most cases, but may not run when the program receives a signal
end
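The skeleton above maps closely onto Python's try statement, which may help if you know Python; the correspondence (but ↔ except, else ↔ else, finally ↔ finally) is my analogy:

```python
def classify(n):
    log = []
    try:
        result = 10 / n
    except ZeroDivisionError as e:  # like `but ZeroDivisionError as e`
        log.append("caught " + type(e).__name__)
        result = -1
    else:                           # like naru's `else`: no exception raised
        log.append("no exception")
    finally:                        # like naru's `finally`: runs either way
        log.append("cleanup")
    return result, log

print(classify(0))
print(classify(5))
```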
In a but block one may use a bare raise statement to re-throw the given exception. (This exception won't be caught by subsequent but blocks of the same block, of course.)
There is a shortcut for the big but block; you may use a but expression when you have only one exception to handle. (While you can technically chain them in the ((... but ...) but ...) but ... form, it is too verbose and a do block is better.)
(4/0 but ZeroDivisionError -> -1) println() --> -1
(4/0 but ZeroDivisionError as e -> e length sign) println() --> 1
(4/0 but 42) println() --> 42
with Blocks

One variation on the but block comes from a common resource-management pattern. Let's assume we have to write to a file. The naive code would look like this:
f = File("forty-two") openrw()
"The answer to the life, universe and everything." println(f)
More on file I/O is available in a later chapter; for now we will use it without explanation. But are you sure that this code is correct? Shouldn't it be like this one?
f = File("forty-two") openrw()
"The answer to the life, universe and everything." println(f)
f close() -- MISSING!
In general we should close a file once we have opened it. Of course it often gets closed automatically by the finalizer when the garbage collector is on heavy duty, but we have to be sure…. Wait. What if we get interrupted while writing to the file?
f = File("forty-two") openrw()
do
"The answer to the life, universe and everything." println(f)
finally
f close()
end
This code will always close the file regardless of exceptions, but it is also very verbose. Since this pattern is common, naru provides a shortcut: the scoped with block.
with f <- File("forty-two") openrw()
"The answer to the life, universe and everything." println(f)
end
This with block automatically generates the boilerplate do-finally code. For the sake of generality it calls the enter() method at the start of the block body and the exit() method at the end. (For File values, the exit method is a synonym for close.) Furthermore, you can use other but, else and finally (!) blocks in it:
with f <- File("forty-two") openrw()
"The answer to the life, universe and everything." println(f)
but Exception as e
-- this block will also get executed when File(...) call fails.
#"exception caught while writing the file: #(e message)" println()
finally
-- this block executes before the exit method.
"the file is about to be closed!" println()
end
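naru's with/enter/exit pattern closely parallels Python's context managers, where __enter__ and __exit__ play the roles of enter and exit; here is a minimal sketch of the same acquire-use-release shape (an analogy, not naru semantics):

```python
events = []

class Resource:
    def __enter__(self):                    # like naru's enter()
        events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):  # like naru's exit(); runs even on exceptions
        events.append("exit")
        return False                        # do not swallow exceptions

with Resource():
    events.append("body")

print(events)
```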
One final nicety of the with block is that the scoped variable is automatically removed after the block. This reduces confusion due to duplicated variables.
f = 42
with f <- File("forty-two") openrw()
"The answer to the life, universe and everything." println(f)
end
f println() --> 42
Someone may wonder: we have seen various built-in values, but we don't know their classes. In fact, they do have classes; you can look up the class of a value via its class attribute. (It is one of the few keywords permitted in the attribute syntax.)
() class println() --> <type ()>
true class println() --> <type True>
false class println() --> <type False>
42 class println() --> <type SmallInt>
(4/7) class println() --> <type Ratio>
3.14 class println() --> <type Decimal>
(3 + 4j) class println() --> <type CDecimal>
?x class println() --> <type Char>
"string" class println() --> <type String>
[1, 2, 3] class println() --> <type Vector{Int}>
{1, 2, 3} class println() --> <type Set{Int}>
{0=>'F', 1=>'T'} class println() --> <type Map{Int, String}>
(3, false, 'foo') class println() --> <type (Int, False, String)>
42 println class println() --> <type ()->()>
abs class println() --> <type (Int)->Int | (Rational)->Rational | (Real)->Real | (Complex)->Complex>
You can see some unfamiliar names like SmallInt, Ratio, Decimal and CDecimal instead of the more familiar Int, Rational, Real and Complex. This is because that's how they are internally implemented; see the detailed hierarchy for an explanation. On the other hand, the last example shows several overloaded function types, but we will ignore them for now.
Another question arises here: what about operators like +, * and ~? In fact, they are normal functions, and you can declare them with special names: #+#, #*# and #~#.
Gaussian(re, im) := class <- CDecimal(re, im)
initialize() := fun
if re != floor(re) or im != floor(im)
raise ValueError("Gaussian number requires integer arguments.")
end
return self
end
global #**#(self, other:Int)->Gaussian := fun
if other < 0 then
return 1 / self ** (-other)
elseif other > 0 then
-- binary exponentiation
half = self ** (other // 2)
if other % 2 == 0 then
return half * half
else
return half * half * self
end
else
return 1
end
end
end
As well as #**#, you can see several new syntaxes here:

global. Unlike some other languages, global means the variable should live in the outer mutable scope, i.e. the innermost scope that contains the scope in which the variable is defined and is not introduced by the class keyword (which is immutable, by the way). global may be used in functions too (why not?). (For paranoids: you may now wonder whether the global syntax is redundant in this case, as one can always define a #**#(lhs:Gaussian, rhs:Int)->Gaussian function in the global scope. That is not quite right, because it is still part of the specification for the Gaussian class; for example, if the method body were _ then any subclass would still have to supply the method body for that global method.)

The argument types in #**#(self, other:Int)->Gaussian. There already are lots of #**# functions—from Int, Rational, Real etc.—and we need to specify exactly when this one is used.

The self argument, available when the function is in a class definition. The function receives that argument normally, but inside the function it is used in place of self (so it can be used much like an ordinary method).

Therefore, the following expression…
Gaussian(3, 4) ** 5
…is equivalent to the following…
#**#(Gaussian(3, 4), 5)
…and it will call #**#
declared in the Gaussian
class, as expected. Likewise most operators and syntaxes in naru can be rewritten in functional form:
| Operator | Functional |
|---|---|
| op b | b op#() |
| a op | a #op() |
| a op b | #op#(a, b) |
| a op= b | a = a #op=#(b) |
| a op1 b op2 | a #op1#op2(b) |
| a op1 b op2 = c | a #op1#op2=#(b, c) |
| a op1 b op2 := c | a #op1#op2:=#(b, c) |
For example, the function call a(b)
is equivalent to a #(#)(b)
. The op= case is a bit important, as you have to return self (or the new value, if your class is immutable) or you will get an error. Also you cannot override #{#}=#
, #{#}:=#
, #(#)=#
and #(#):=#
due to the syntactic ambiguity.
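This "operators are ordinary functions" rule has a close analogue in Python, where each operator dispatches to a named function. A small sketch using the standard operator module (Python's spelling, shown only for comparison with naru's #op#(a, b) rewriting):

```python
import operator

# a op b is just a call to a named two-argument function,
# much like naru's #op#(a, b) rewriting.
assert operator.add(3, 4) == 3 + 4
assert operator.mul(3, 4) == 3 * 4

# Bracketing forms like a[b] also have functional spellings,
# cf. naru's #{#} and #(#) names.
assert operator.getitem([10, 20, 30], 1) == [10, 20, 30][1]

# Augmented assignment yields the updated value, cf. a = a #op=#(b).
xs = [1, 2]
xs = operator.iadd(xs, [3])
assert xs == [1, 2, 3]
```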
Since everything in naru (and even a class) is a value, and since every value has a class
pseudo-attribute, everything in naru is of some type. (naru does distinguish the two terms “type” and “class”, but at least for this section we will use the term “type” to denote the value returned by the class
attribute.) Types in naru form a full hierarchy, like this:
Value
|- Function
| +- ... (includes Function1{-T1,+U}, Function2{-T1,-T2,+U} etc.)
|- Tuple
| +- ... (includes... TODO)
+- Thing
|- Type
|- Ordered
| |- Bool
| | |- False
| | +- True
| |- Number
| | +- ... (includes Complex, Real, Rational, Int etc.)
| |- Char
| |- Date
| | +- DateTime
| |- Time
| | +- DateTime
| |- Week
| +- Duration
|- Enumerable{+T}
| +- ... (includes Range{+T}, Vector{T}, String etc.)
|- Stream
| |- Reader
| | +- Console
| +- Writer
| +- Console
|- File
+- BaseException
+- Exception
Basically, every value is of the Value class, and all “normal” values apart from functions and tuples are of the Thing class. Every ordered (more exactly, partially ordered) value is of the Ordered class. Every exception is a subclass of the BaseException class, but since some built-in exceptions are used to implement language-specific situations (such as SystemExit) you’d better use Exception instead.
Number
and Enumerable
have big hierarchies themselves, which are omitted here; we will look at them later.
TODO TypeName from
multimethod
A tuple in naru contains a fixed number of values of possibly different types. Tuples are very similar to immutable collections, but collections in principle contain an arbitrary number of values of the same type, so the two serve different purposes. This can also be verified from the types of tuples and collections: a tuple of three numbers has a type with exactly three types within it ((Number,Number,Number)
) but a vector doesn’t (Vector{Number}
).
Tuples are constructed by comma operator: 1, 2, 3
or (1, 2, 3)
. Parentheses are optional but sometimes required to resolve an ambiguity (e.g. f(3, 4, 5)
is a function call with three arguments but f((3, 4, 5))
is a function call with one tuple argument). The assignment operator (= etc.) and the return statement, among others, bind more loosely than the comma, so you can write code like a = 3, 4, 5
(same for a = (3, 4, 5)
) and return 3, 4, 5
(same for return (3, 4, 5)
).
The 0-tuple, a degenerate tuple with no values, is written (). It is the default return value for functions without return statements. Unlike many other languages, including Python, there is no 1-tuple.
Tuple unpacking. Tuple syntax can also be used on the left-hand side of an assignment, meaning the right-hand side should be a tuple (or similar things; to be explained later) and multiple assignments will occur. For example:
a = 3, 4, 5
b, c, d = a
#"b=#b c=#c d=#d" println() -- b=3 c=4 d=5
b2, c2 = a -- error!
The 0-tuple, (), can be used in this way too. In this case the right-hand side should be (), which is useful for verifying that a function call does not return any (unexpected) meaningful value—using a function call as a bare statement does not check this, and silently throws away any return value.
() = foo()
To ignore one or more values in the tuple, use the _ placeholder identifier in place of variables. The special identifier _ is reserved in naru, so variables cannot use that name.
first := fun (t); a, _, _ = t; return a; end
second := fun (t); _, b, _ = t; return b; end
third := fun (t); _, _, c = t; return c; end
first((3, 4, 5)) println() -- 3
You can also use an ellipsis (...
) to get the remaining part of values in the tuple; this is alike a function call.
a, b, c... = 3, 4, 5, 6, 7 -- c will get (5, 6, 7)
d, e, f... = 3, 4, 5 -- f will get 5
g, h, i... = 3, 4 -- i will get ()
-- 1-tuple (i.e. non-tuple) can be unpacked as well.
x, y... = 1 -- x=1, y=()
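naru's ... ellipsis collects trailing values much like Python's starred unpacking, so the examples above can be compared directly (note that Python collects the rest into a list, and has no 1-tuple auto-unpack, so the last naru line has no direct analogue):

```python
# cf. a, b, c... = 3, 4, 5, 6, 7 in naru
a, b, *c = 3, 4, 5, 6, 7
assert (a, b) == (3, 4)
assert c == [5, 6, 7]     # the "rest" is a list in Python, a tuple in naru

d, e, *f = 3, 4, 5
assert f == [5]

g, h, *i = 3, 4
assert i == []            # empty rest, like naru's ()
```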
_
and ...
can be used altogether, so the above first
, second
and third
can be rewritten in the generalized version (do not require 3-tuple):
first := fun (t); a, _... = t; return a; end
second := fun (t); _, b, _... = t; return b; end
third := fun (t); _, _, c, _... = t; return c; end
first(3) println() -- 3
first((3, 4)) println() -- 3
first((3, 4, 5)) println() -- 3
third((1, 2)) println() -- error (of course)
Individual item access. Tuples have a set of special attributes, namely from #0
to #n
where n
is the number of values in the tuple minus 1. These are a shortcut for the aforementioned first
, second
and third
functions.
-- this code...
c, b, a = t
-- ...is equivalent to this (except for type checking):
c = t #0
b = t #1
a = t #2
For uniformity #0 also works for non-tuples; therefore for any non-tuple value v, v #0 and v are exactly equivalent. There is also a tuplelength attribute which gives the number of values in the tuple, or 1 when it is not a tuple (again for uniformity).
Other tuple operations. A concatenation operator, ~
, also works for tuples:
((1, 2, 3) ~ (4, 5, 6)) println() -- (1, 2, 3, 4, 5, 6)
A flattening prefix, \
, can also be used in the tuple constructor:
a = (2, 3)
(1, \a, 4) println() -- (1, 2, 3, 4)
They are quite picky about operand types, however, so one has to be careful when using static types. For example, this code won’t compile:
f(t, u) := fun
use static -- ...so type inference will be used throughout this function
a, b, c... = t
-- t should have a type of (Ta, Tb, \Tc)
d, e = u
-- u should have a type of (Td, Te)
-- u ~ t == (d, e, a, b, \c), and has a type of (Td, Te, Ta, Tb, \Tc)
(u ~ t) println()
-- t ~ u == (a, b, \c, d, e) but \c is invalid since c's length is not fixed!
(t ~ u) println() -- error!
end
There are two truth values (“Boolean”) in naru: true
and false
. If needed, the third truth value, “maybe” or “unknown”, can be simulated using Interval{Bool}
and Option{Bool}
respectively. When a boolean value is required as a condition in naru, the given value will be automatically converted to a boolean using the Bool from
conversion method. (This, of course, does not mean that boolean function arguments automatically convert their values.)
naru contains two sets of boolean operators: one is short-circuiting and the other is not. The former is named and
and or
; the latter is named &&
and ||
. Well, there is also a unary operator not
for the usual purpose. Anyway the former variant receives any values (not only booleans) and converts the first operand to determine whether to evaluate the second operand or not; as a result, and
returns the first non-true value (or the last value if none) and or
returns the first non-false value (or the last value if none). The latter variant also receives any values, but always evaluates and converts operands to booleans to get the result; therefore their return value will be always boolean.
(3 and 0 and false) println() -- 0
(3 and false and 0) println() -- false
(3 and 4 and "!!!") println() -- !!!
(() or "hello" or "world") println() -- hello
(() or "" or 0.0) println() -- 0.0
(0 && "foo") println() -- false
(0 || "foo") println() -- true
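Python's and/or happen to behave like naru's short-circuiting pair, returning an operand rather than a boolean, so the examples above can be checked there; the non-short-circuiting &&/|| can be mimicked by converting both operands first (and_strict/or_strict are illustrative names, not naru or Python built-ins):

```python
# Short-circuiting: returns one of the operands, like naru's and/or.
assert (3 and 0 and False) == 0                  # first non-true value
assert (() or "hello" or "world") == "hello"     # first non-false value
assert (() or "" or 0.0) == 0.0                  # last value if none

# Non-short-circuiting: always evaluates both sides and yields a
# boolean, like naru's && and ||.
def and_strict(a, b):
    return bool(a) & bool(b)

def or_strict(a, b):
    return bool(a) | bool(b)

assert and_strict(0, "foo") is False   # cf. (0 && "foo") --> false
assert or_strict(0, "foo") is True     # cf. (0 || "foo") --> true
```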
The built-in values in naru typically convert to true, but some values convert to false instead:

- ()
- numbers equal to zero (0, 0.0, -0.0, 0j etc.)
- collections whose empty attribute equals true ("", [], {}, {/} etc.)

Ordered Types

There are four kinds of comparison possible in naru: the value equality (==
and !=
), the referential equality (is
and is not
), the ordering (<
, <=
, >
and >=
) and the rich ordering (<=>
).
Value Equality. These operators compare two given values using common sense, so 3 == 3
and 3 != 4
. Equality operators are available not only for numbers but for every value, and their default behavior is the same as referential equality. This means that a == a
holds for almost every a
except for a few cases: one prime exception is the Not-a-Number (NaN) value of floating point numbers, where a NaN compares unequal to every NaN, including itself.
While these operators are intended to implement common sense, it is possible that two equal values a and b do not compare as equal. This can happen when the values are mutable (like opened files), or when the exact comparison is very costly so only an approximation is implemented (like symbolic expressions).
Referential Equality. These operators return true if and only if modifying one value in-place also modifies the other value directly. For example, [1, 2, 3] == [1, 2, 3]
(as two vectors contain same values) but [1, 2, 3] is not [1, 2, 3]
(as modifying the former does not change the latter). For numbers the referential equality and the value equality are guaranteed to be identical, except for NaNs where two NaNs with the same representation will be considered referentially identical.
Ordering. This is a hard part. The general ordering is available to every value of the Ordered
class. It is highly recommended that a <= b
is identical to (a < b) || (a == b)
(and vice versa) and a < b
is identical to b > a
(and vice versa), but these recommendations are not enforced.
For the real numbers, these operators return false when at least one operand is NaN. (For paranoids: they are not signaling in the IEEE 754 terminology.) In order to check this strictly, use the :strict
argument: #<#(3, Decimal NaN, :strict)
will raise the exception, for example. (See with
operator for the shortcut.) For the ordered collections (i.e. Vector
and SortedSet
are ordered but Set
is not ordered) the comparison is done lexicographically: [3, 4] < [3, 4, 5] < [3, 5] < [4]
. Of course the values themselves have to be ordered in order to use the comparison.
As mentioned earlier a <= b < c
is not the same as (a <= b) < c
nor a <= (b < c)
. It is much more like (a <= b) && (b < c)
, except that b
will be evaluated only once. This desugaring also means #<#
, #<=#
, #>#
and #>=#
functions still receive exactly two arguments, whether or not the comparison is chained.
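Python's chained comparisons follow the same desugaring, so the "b is evaluated only once" rule can be demonstrated directly (middle is an illustrative function used to count evaluations):

```python
calls = []

def middle():
    calls.append("evaluated")
    return 4

# 3 <= middle() < 5 desugars to (3 <= t) and (t < 5) with t = middle(),
# so middle() runs exactly once even though it appears in two comparisons.
result = 3 <= middle() < 5
assert result is True
assert calls == ["evaluated"]
```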
Rich Ordering. Implementing #<#
, #<=#
, #>#
and #>=#
separately is cumbersome and error-prone, so naru provides a single comparison operator <=>
. This rich comparison operator works as if (a <=> b) op 0
is equivalent to a op b
(where op
is one of six valid normal comparison operators). By default ordering and equality operators are implemented using the rich ordering, so usually you just have to implement one function to implement the comparison.
<=>
usually returns 0, -1 or +1 depending on the comparison result, but this is not always required. TODO no, <=>
may have to behave as like IEEE 754’s totalOrder
function (except for -0
and +0
distinction?)
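Deriving all six comparison operators from one rich-comparison function, as <=> does, can be sketched in Python with functools.total_ordering; here Version is a hypothetical class whose _cmp method plays the role of #<=>#:

```python
from functools import total_ordering

@total_ordering
class Version:
    """All comparisons derive from one <=>-like primitive."""
    def __init__(self, major, minor):
        self.key = (major, minor)

    def _cmp(self, other):
        # Returns -1, 0 or +1, like a typical <=> implementation.
        return (self.key > other.key) - (self.key < other.key)

    def __eq__(self, other):
        return self._cmp(other) == 0    # (a <=> b) == 0 behaves as a == b

    def __lt__(self, other):
        return self._cmp(other) < 0     # (a <=> b) < 0 behaves as a < b

assert Version(1, 2) < Version(1, 10)
assert Version(2, 0) >= Version(1, 9)   # derived by total_ordering
assert Version(1, 2) == Version(1, 2)
```

As in naru, implementing the single primitive gives all of <, <=, >, >= and == for free.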
Character type in naru, Char
, provides an abstraction on a conventional text. In order to support most languages and scripts uniformly, naru uses Unicode as a basis of text representation; therefore one Char
value encodes one Unicode code point. A code point encodes a single abstract character, but this does not necessarily map to a conventional character (termed grapheme for clarity). It is quite common that one grapheme requires multiple code points to represent: for example, J with a caron, i.e. J̌, requires two code points, U+004A and U+030C. In naru the basic unit of text manipulation is still a code point as it is easy to implement and handle, but text strings also provide some operations on grapheme units.
A Char
literal typically consists of ?
and a single code point, like ?a
and ?ㅋ
. ?
can be followed by two or more letters or a bare string literal; in this case it indicates the name of characters. For example, ?space
maps to ?
(?
followed by a space) and is in fact preferred over the latter; likewise ?"HANGUL SYLLABLE BAB"
maps to ?밥
, which is nevertheless allowed but somewhat harder to understand for non-Koreans. The names of characters are taken from Name
and Name_Alias
properties of Unicode Character Database (UCD), and also from Unicode_1_Name
properties of UCD for characters U+0000..001F and U+007F..009F. (For paranoids: These control characters do not have a formal name since Unicode 2.0.) Names are case-insensitive, and whitespaces, underscores and hyphens between two letters are mostly ignored, as per Loose Matching rule 2 (LM2) of UAX #44. (For paranoids: A single exception to this rule is U+1180 HANGUL JUNGSEONG O-E, which will be confused with U+116C HANGUL JUNGSEONG OE.) Every character is also a name of itself, so ?"a"
is equal to ?a
.
Char
literal can also be composed of ?
and a single escape sequence, starting with \
. (The escape sequence can be also in the string, but technically this is not a part of character literal.) An escape sequence can be one of these:
- \ followed by one to seven decimal digits, e.g. \44032 (for U+AC00). Code points above 1114111 (i.e. 0x10FFFF) are invalid.
- \x followed by exactly two hexadecimal digits, e.g. \xA0 (U+00A0).
- \u followed by exactly four hexadecimal digits, e.g. \uac00 (U+AC00).
- \U followed by exactly eight(!) hexadecimal digits, e.g. \U00012345 (U+12345). Code points above 1114111 are of course invalid.
- \x, \u or \U followed by {, one or more hexadecimal digits and }. E.g. \U{a0} (U+00A0), \u{306} (U+0306) and \x{12345} (U+12345). This is useful when you can’t otherwise tell where the escape sequence ends.
- \a (U+0007), \b (U+0008), \e (U+001B), \f (U+000C), \n (U+000A), \s (U+0020), \r (U+000D) and \t (U+0009) for common control characters.
- \\, \" and \' give \, " and ' respectively (for the obvious reason).

These escape sequences can be used in string literals as well, with additional escape sequences.
?a println() -- a
?: println() -- :
?sp println() -- (U+0020)
?AmperSand println() -- &
?\\ println() -- \
?\42 println() -- * (U+002A)
?\uD7A3 println() -- 힣 (U+D7A3)
?\U12345678 println() -- error (code point above U+10FFFF)
?'number sign' println() -- #
?" \78\85\77\66\69\82\83\73\71\78 " println()
-- # (the name reads " NUMBERSIGN ")
Char
class has an access to parts of Unicode Character Database whenever possible; for example, ?# name
will result in "NUMBER SIGN"
and ?# category
will result in :Po
(Punctuation, others). Char
s are ordered by their code points (although it is mostly useless), so String from(?a..?z)
will give all the lowercase Latin letters.
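The name machinery above mirrors the Unicode Character Database; Python's unicodedata module exposes the same Name and General_Category properties, so the examples can be checked there:

```python
import unicodedata

# ?# name and ?# category in naru correspond to:
assert unicodedata.name("#") == "NUMBER SIGN"
assert unicodedata.category("#") == "Po"    # Punctuation, other

# ?"HANGUL SYLLABLE BAB" names the character 밥 (U+BC25):
assert unicodedata.lookup("HANGUL SYLLABLE BAB") == "\uBC25"

# Chars ordered by code point, cf. String from(?a..?z):
assert "".join(map(chr, range(ord("a"), ord("z") + 1))) == \
    "abcdefghijklmnopqrstuvwxyz"
```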
Code points for surrogate pairs, U+D800..DFFF, can be represented by Char
but cannot appear in a text string. TODO really? there are lots of systems that (accidentally) accept lone surrogates, like Twitter, and they have to be preserved for interoperability.
The following is the full hierarchy of numbers in naru:
Number
+- Complex
|- Real
| |- Rational
| | |- Int
| | | |- BigInt
| | | |- SmallInt
| | | +- NativeInt
| | | |- NatNN
| | | +- IntNN
| | +- Ratio
| |- FloatNN
| |- Decimal
| +- Interval{+T}
|- CFloatNN
+- CDecimal
In short, there are four abstract classes (domains) in naru: integers (Int
), rational numbers (Rational
), real numbers (Real
) and complex numbers (Complex
). Others are concrete classes that implement numeric domains: for example, BigInt
and SmallInt
are actual concrete classes for integers, and the actual class of an integer literal is one of them. NatNN
, IntNN
, FloatNN
and CFloatNN
reflect the natively supported number types and are suffixed by their bit length (e.g. Nat64
, Float80
); for native complex types the combined bit length of real and imaginary parts is used (e.g. CFloat128
consists of two Float64
).
Besides their corresponding domains, all numbers are either exact or inexact (as in Scheme). The exactness of a given number can be retrieved via the exact
attribute. This distinction is most clearly visible in real numbers, where Decimal
always keeps an exactness flag. (Integers are always exact by definition; FloatNN
and CFloatNN
are inexact due to their use of floating point representation.)
I will present the exact syntax of numeric literals here, although I believe almost everyone already has a good understanding of them.
0
, 1
, 2
, 42
, 1000000
, and so on. They are values of SmallInt
or BigInt
depending on their size, but for all practical purposes you can regard them as simply Int
. There are no negative integer literals: constructing a negative number (and thus a negative integer) is a job of the parser, not the tokenizer. (This also applies to all other numeric literals.)
Integer literals can be prefixed by one of the radix indicators:

- 0b for binary, where valid digits are 01.
- 0o for octal, where valid digits are 01234567.
- 0x for hexadecimal, where valid digits are 0123456789ABCDEF (case-insensitively).

Additionally, generic literals 2"...", 8"..." and 16"..." can be used instead of these indicators. This syntax allows an arbitrary radix between 2 and 36 (where the digit Z corresponds to 35), and both whitespace and underscores are ignored for readability.
TODO enclosed digits in generic integer literal
3.14
, 314e-2
and so on. They are values of Decimal
. Basically any number containing a decimal point (.
) or exponent part (e123
, e+123
, E123
, E+123
etc.) is a decimal literal. A decimal point should be between two digits: .345
or 345.
don’t constitute a valid decimal literal (while 345e0
is valid). When the exponent part is present, a number before it is multiplied by a power of 10 raised to the given exponent.
There is a special digit for decimal literals: ?
. It is mostly like a digit 0
, but in addition specifies that the decimal literal containing it is inexact and the digit is insignificant. (Therefore there should be no significant digits after ?
.) Therefore 314????
is equivalent to 3.14????e6 or 3.14?e6, but not 3.14e6, which is exact. Any decimal literal should include at least one significant digit (even if it is 0
) in order to distinguish it from Char
.
3.14f
, 314e-2f
and so on. In other words, they are decimal literals followed by f
or F
, and values of the biggest Float*
type that is suitable in the context. For example, a: Float32 | Float64 = 3.14f
will use Float64
—as two types are possible and Float64
is bigger—but a = 3.14f
will use Float80
in IA-32 and x86-64. This is because obtaining a (say) Float32
number by converting the decimal string directly is not the same as obtaining a Float64
and converting it to Float32
. (For paranoids: This can result in a precision loss when rounding toward nearest even or toward infinity is in use.)
To support precise floating point numbers, naru allows the 0x
radix indicator in floating point literals. In this format a binary exponent part (p123
, corresponding to 2^123) is used instead of a decimal one. Therefore 3.14f
is approximately equivalent to 0x1.91eb851eb851fp+1f
in Float64
, and so on.
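Python's float.hex and float.fromhex use the same 0x…p… notation, so the claimed Float64 expansion of 3.14 can be verified directly:

```python
# 3.14 in IEEE 754 double precision, in hexadecimal notation:
assert (3.14).hex() == "0x1.91eb851eb851fp+1"

# Unlike the decimal form, the hexadecimal form round-trips exactly:
assert float.fromhex("0x1.91eb851eb851fp+1") == 3.14
```

This is precisely why a hexadecimal literal form is worth having: every bit of the significand is spelled out, so no rounding happens during parsing.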
Floating point literals support ?
special digits, but they behave just like 0
digits and the resulting number is always inexact. This is another reason for avoiding floating point numbers.
3.14j
, 3.14fj
and so on. In other words, they are decimal or floating point literals followed by j
or J
(which is often used in electrical engineering to avoid a confusion with the current, i
). They are values of CDecimal
or CFloatNN
according to their base literal.
Underscores (_
) may appear within a numeric literal for readability; they are valid as long as there are no consecutive underscores and they appear only between two digits, between a digit and an exponent separator (e
or p
) or between a digit and a decimal point (.
). For example, 0x1_2_3
and 12_3_._4_5?_??_e_15
are valid, but _34
(which is an identifier), 34_
, 0_x123
or 13e_+45
are invalid. Generic integer literals do not have this restriction as long as they are properly delimited by quotes.
There are no separate literals for Ratio
and Complex
with a real part, but the constant expression for them can be easily constructed with division and addition: 3/4
and 3+4j
, for example.
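These constant expressions have close counterparts in Python's standard library; a quick sketch showing that division-style construction reduces the same way (using fractions.Fraction, since Python's / on integers yields a float instead of a fraction):

```python
from fractions import Fraction

# naru: 15 / 21 --> 5/7; Fraction normalizes on construction the same way.
assert Fraction(15, 21) == Fraction(5, 7)

# naru: 3/4 and 3+4j are ordinary constant expressions, not literals.
ratio = Fraction(3, 4)
z = 3 + 4j
assert (z.real, z.imag) == (3.0, 4.0)
```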
As mentioned earlier, the principal integer type in naru, Int
, has arbitrary precision. However there are several size-restricted (native) types on Int
which work just like integers in many other languages. These are available as a part of the naru ext
module, since they are an integral part of the naru native interface. They are also useful for tasks that require fixed-size integers, such as low-level cryptographic algorithms.
Arithmetic on native integer types can overflow without notice:
import naru ext *
a = Int32 from(66000)
(a * a) println() --> 61032704 (= 66000^2 mod 2^32)
This is intentional. Those who use native types should understand their exact characteristics, and should deal with overflows by themselves. The behavior of native types in corner cases is thus undefined—it is possible that, for example, Int32 from(2) >> 32
returns 2
(as it does in IA-32 architecture, where a shift amount is ANDed with 31). It is even possible that native types do not use 2’s complement, unlike Int
. (For paranoids: Int
implicitly assumes 2’s complement representation, but in fact it is more like a 2-adic number restricted to the actual integers.)
One can request overflows to be explicitly notified (by throwing exceptions). This will also catch corner cases like Int32 from(2) >> 32
:
do
(a * a with :checked) println()
but Exception as e
#"#a * #a overflowed." println()
end
Of course the inline but
expression comes handy in this case:
(a * a with :checked but OverflowError -> 0) println()
Conversion between different native integer types is explicit. This prohibits all sorts of bugs due to signed-unsigned conversions, for example.
a = Int32 from(1234567890)
a2 = Int64 from(a) -- 1234567890
a3 = Int16 from(a) -- 722 (= 1234567890 mod 2^16)
a4 = Int8 from(a) -- -46 (= 1234567890 mod 2^8 - 2^8)
a5 = Int8 from(a) with :checked -- error
a6 = Int16 from(a4) -- -46
a7 = Int8 from(a6) with :checked -- -46 (as -46 fits in Int8)
b = Int32 from(-987654321)
b2: Nat32 = b unsigned -- 3307312975
b3: Int32 = b2 signed -- -987654321
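The wrapping behaviour of these conversions is plain modular arithmetic, so it can be modelled in a few lines of Python (to_signed and to_unsigned are illustrative helpers, with a checked flag standing in for naru's with :checked):

```python
def to_unsigned(value, bits):
    """Reduce modulo 2**bits, like an Int -> NatNN conversion."""
    return value % (1 << bits)

def to_signed(value, bits, checked=False):
    """Wrap into [-2**(bits-1), 2**(bits-1)), like an Int -> IntNN conversion."""
    if checked and not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise OverflowError(f"{value} does not fit in {bits} bits")
    v = value % (1 << bits)
    return v - (1 << bits) if v >= (1 << (bits - 1)) else v

assert to_signed(1234567890, 16) == 722           # cf. Int16 from(a)
assert to_signed(1234567890, 8) == -46            # cf. Int8 from(a)
assert to_unsigned(-987654321, 32) == 3307312975  # cf. b unsigned
```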
I have already introduced basic arithmetic operators on numbers, +
, -
, *
, /
, //
, %
, and **
. I will introduce remaining ones here.
Combined quotient-remainder operation. This operation is denoted /%
, and by definition a /% b == (a // b, a % b)
. However it is more efficient than two //
and %
operations, and can be really handy in some cases:
-- mixed radix handling! yay!
nminutes, seconds = nseconds /% 60
nhours, minutes = nminutes /% 60
ndays, hours = nhours /% 24
years, ndaysinyear = ndays /% 365
weeks, days = ndaysinyear /% 7
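Python's built-in divmod is exactly this combined quotient-remainder operation, so the mixed-radix example translates directly (123456 is an arbitrary sample value):

```python
nseconds = 123456

# a /% b in naru corresponds to divmod(a, b) in Python.
nminutes, seconds = divmod(nseconds, 60)
nhours, minutes = divmod(nminutes, 60)
ndays, hours = divmod(nhours, 24)

# 123456 seconds is 1 day, 10 hours, 17 minutes and 36 seconds.
assert (ndays, hours, minutes, seconds) == (1, 10, 17, 36)
```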
Bitwise operators. These are similar to C-like languages:

- ~a for bitwise NOT;
- a & b for bitwise AND;
- a | b for bitwise OR;
- a ^ b for bitwise XOR;
- a << b for bitwise left shift;
- a >> b for bitwise right shift (zero-extending for unsigned native integer types, sign-extending for all other integer types).

Of course they are only applicable to Ints and subtypes. 2’s complement representation is assumed for all Ints, so identities like ~x == -x-1 are always true for Int. Native integer types do not have to behave like this, though.
Special bitwise operations. While they are not as essential as the bitwise operators above, integer types provide more specialized bitwise operations:

- a bitlength for the number of bits required to represent a (e.g. 127 bitlength == 7, 128 bitlength == 8 and 0 bitlength == 0). This attribute is only valid for non-negative a, and is closely related to the base-2 logarithm of a.
- a popcount for the number of bits set (e.g. 42 popcount == 3, 128 popcount == 1 and Nat64 from(-1) popcount == 64). This attribute is only valid for non-negative a.

These operations are only available for fixed-size integer types due to their semantics:

- a bitreversed for reversing the bit order (e.g. Nat16 from(0x1234) bitreversed == Nat16 from(0x2c48)). This attribute is valuable for the Fast Fourier transform, for example.
- a leftrotate(b) and a rightrotate(b) for bitwise left and right rotation.

TODO
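Python exposes the same two integer queries, which makes the examples above easy to check; the fixed-width bitreversed can be sketched via string formatting (bitreversed16 is an illustrative helper for a 16-bit width):

```python
# bitlength: bits needed to represent a non-negative integer.
assert (127).bit_length() == 7
assert (128).bit_length() == 8
assert (0).bit_length() == 0

# popcount: number of set bits (int.bit_count exists on Python >= 3.10;
# counting "1"s in the binary string works everywhere).
assert bin(42).count("1") == 3            # 42 == 0b101010
assert bin(2**64 - 1).count("1") == 64    # cf. Nat64 from(-1) popcount

def bitreversed16(n):
    """Reverse the bit order within a fixed 16-bit width."""
    return int(format(n, "016b")[::-1], 2)

assert bitreversed16(0x1234) == 0x2C48    # cf. Nat16 from(0x1234) bitreversed
```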
TODO should decimals track the significant digits? then how about interval arithmetic which will collide with significance arithmetic? or exact arithmetic based on continued fraction?
naru provides two modules for mathematical functions, named naru math
(for real numbers) and naru math complex
(for complex numbers). The former is separated from the latter because complex-valued functions do not behave like real-valued functions: the logarithm of i, for instance, is not only (π/2)i but also (-3π/2)i, (5π/2)i and so on. Whoever uses complex-valued functions should be aware of this multi-valued nature.
The difference between these modules and Number
methods is the (relative) simplicity of implementing them: gcd
, for example, needs a fair bulk of code to speed it up (Hint: the plain Euclidean algorithm is SLOW) but bitlength
is readily available in many modern architectures.
Constants. pi
(3.141592…) and e
(2.718281…) are constants for the ratio of the circumference of a circle to its diameter and the base of the natural logarithm, respectively. TODO what type should those constants be? the automatic rounding hurts the accuracy (should be less than 0.5ulp). it is possible that those are separate singleton subtypes of Real
(say Pi
and E
) and coerced whenever needed (Pi * Float64 -> Float64
, Pi * Float32 -> Float32
etc.), but it is a bit too much.
Arithmetic. Some arithmetic functions that are present as methods, or easily implementable using them, are replicated here as well, mostly for convenience and for subtle differences:

- sign(x) is almost equivalent to x class from(x sign), so sign(-8) is -1, sign(3/2) is 1/1, sign(0.0f) is 0.0f and so on.
- pow(x, y) is equivalent to x ** y.
- pow(x, y, z) is equivalent to x ** y % z, or more efficiently (when y is large), (x ** y) with modulo: z.
- sqrt(x) is equivalent to x ** (1/2), but more efficient than the naive square root.
- isqrt(x) is equivalent to Int from(x ** (1/2)), but more efficient than the naive integer square root. (Note that the accuracy of isqrt is lower for a real number argument x because of the rounding problem.)
- hypot(x, y) is equivalent to sqrt(x**2 + y**2).

Trigonometric and Hyperbolic Functions. There are 24 possible combinations:
- sin(x), cos(x), tan(x), csc(x), sec(x), cot(x).
- asin(x), acos(x), atan(x), acsc(x), asec(x), acot(x). The range of these functions is chosen such that they make proper functions and contain zero at least at the end point.
- sinh(x), cosh(x), tanh(x), csch(x), sech(x), coth(x).
- asinh(x), acosh(x), atanh(x), acsch(x), asech(x), acoth(x).

For trigonometric functions, the second argument specifies how x
is scaled: :radians
(default) and :degrees
are available. For inverse trigonometric functions it specifies how its result is scaled.
naru also provides a much-used atan2(y, x)
as a sign-correcting version of atan(y / x)
, whose result ranges from -π (exclusive except for y == -0.0
) to +π (inclusive).
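The sign correction is easy to see with Python's math.atan2: in the second quadrant the naive atan(y / x) loses the quadrant, while atan2(y, x) keeps it:

```python
import math

# The point (-1, 1) lies in the second quadrant, at angle 3π/4...
assert math.isclose(math.atan2(1.0, -1.0), 3 * math.pi / 4)

# ...but atan(y / x) cannot tell it apart from the point (1, -1):
assert math.isclose(math.atan(1.0 / -1.0), -math.pi / 4)
```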
Exponential and Logarithmic Functions. Since these functions are crucial for many numerical calculations, many variations are available:

- exp(x) and log(x) calculate a base-e exponential and natural logarithm of x.
- log(x, b) calculates a base-b logarithm of x, and is equivalent to log(x) / log(b).
- log2(x) and log10(x) are equivalent to log(x, 2) and log(x, 10) respectively, but more efficient.
- expm1(x) and log1p(x) are equivalent to exp(x) - 1 and log(x + 1) respectively, but more accurate, especially when x is close to zero. This is due to the common choice of algorithms for exp and log.

Non-elementary Functions. TODO description
- erf(x) returns the error function of x, which is used to calculate the cumulative distribution function Φ of the standard normal distribution: Φ(x) = (1 - erf(-x / sqrt(2))) / 2.
- erfc(x) returns the complementary error function of x, and is equivalent to 1 - erf(x).
- gamma(x) returns the gamma function of x, which is a generalization of the factorial function to the real domain (see also the factorial function below).
- loggamma(x) is equivalent to log(gamma(x)), but won’t overflow for a large x. This bound is actually quite small: gamma(172) is enough to overflow a standard Float64.

Other Mathematical Functions. TODO description
- factorial(x) returns the product of 1 to an integral x. This is Int from(gamma(x + 1)) but won’t overflow for a large x.
- gcd(args...) returns the greatest common divisor (GCD) of the given integral or rational numbers. The result is always non-negative (e.g. gcd(-8, 6) == 2), and zeroes in the arguments are ignored unless all arguments are zeroes (e.g. gcd(-8, 6, 0) == gcd(2, 0) == 2 but gcd(0, 0) == 0).
- lcm(args...) returns the least common multiple (LCM) of the given integral or rational numbers.
- gcdex(args...) returns a tuple of the GCD g and a vector c of coefficients satisfying Bézout’s identity: args[0] * c[0] + ... + args[n-1] * c[n-1] == g. Coefficients are not unique, so some adjustment can be required to get the desired result.

Floating Point Specials. TODO description
- minnan(args...) and maxnan(args...) return the minimum and maximum of the given numbers, but prefer NaNs over normal numbers.
- minnum(args...) and maxnum(args...) return the minimum and maximum of the given numbers, but prefer normal numbers over NaNs.
- minnummag(args...) and maxnummag(args...) return the minimum and maximum of the absolute values of the given numbers, but prefer normal numbers over NaNs.

Complex-valued Functions. abs(z)
(not actually in naru math complex
module) returns sqrt(z imag ** 2 + z real ** 2)
as expected. phase(z)
returns the principal argument of z
, which ranges from -π (exclusive except for phase(-1.0-0.0j)
) to +π (inclusive). Other functions follow this fixed range of phase(z)
to determine its principal value.
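Python's cmath module follows the same conventions for the principal argument, including the signed-zero edge case on the branch cut, so the claimed range can be checked directly:

```python
import cmath
import math

assert abs(3 + 4j) == 5.0                             # sqrt(re**2 + im**2)
assert cmath.phase(complex(-1.0, 0.0)) == math.pi     # +π is included...
assert cmath.phase(complex(-1.0, -0.0)) == -math.pi   # ...-π only via -0.0
```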
Some arithmetic functions (i.e. sign, pow and sqrt), all trigonometric and hyperbolic functions except for atan2, all exponential and logarithmic functions, and all non-elementary functions are available in the naru math complex module, but with some changes:

- sign(z) returns a complex number which is a real-number multiple of the given number and whose absolute value is 1, except for sign(0j) == 0j. In other words, sign returns the direction of the given complex number. This is equivalent to z / abs(z) except for z = 0j.
- log(z) is equivalent to log(abs(z)) + phase(z) * 1j.
- asin(z) is equivalent to log(z*1j + sqrt(1 - z**2)) * -1j.
- acos(z) is equivalent to pi/2 - asin(z).
- atan(z) is equivalent to (log(1 + z*1j) - log(1 - z*1j)) / 2j.

The following is the full hierarchy of collections in naru:
Enumerable{+T}
|- Iterator{+T}
|- Option{+T} (also known as T?)
|- Range{+T}
|- Sorted{T}
| +- ...
+- Immutable{+T}
|- PureSList{+T}
|- PureList{+T}
|- PureVector{+T}
| |- Symbol (T=Char)
| |- PureBytes (T=Int)
| +- PureSortedVector{+T} <- Sorted{T}
|- PureMultiset{+T}
| |- PureSet{+T}
| | +- PureSortedSet{+T} <- Sorted{T}
| +- PureSortedMultiset{+T} <- Sorted{T}
|- PureMultimap{+Key, +Value} (T=(Key,Value))
| |- PureMap{+Key, +Value}
| | +- PureSortedMap{+Key, +Value} <- Sorted{(Key,Value)}
| +- PureSortedMultimap{+Key, +Value} <- Sorted{(Key,Value)}
+- Mutable{T}
|- SList{T} <- PureSList{T}
|- List{T} <- PureList{T}
|- Vector{T} <- PureVector{T}
| |- String <- Symbol (T=Char)
| |- Bytes <- PureBytes (T=Int)
| +- SortedVector{T} <- PureSortedVector{T}
|- Multiset{T} <- PureMultiset{T}
| |- Set{T} <- PureSet{T}
| | +- SortedSet{T} <- PureSortedSet{T}
| +- SortedMultiset{T} <- PureSortedMultiset{T}
+- Multimap{+Key, Value} <- PureMultimap{Key, Value} (T=(Key,Value))
|- Map{+Key, Value} <- PureMap{Key, Value}
| +- SortedMap{+Key, Value} <- PureSortedMap{Key, Value}
+- SortedMultimap{+Key, Value} <- PureSortedMultimap{Key, Value}
Collections are basically classified by key, value types and restrictions and performance guarantees on them. There are two broad classification methods for collections: mutable vs. immutable and sorted vs. unsorted.
naru distinguishes immutable collections from mutable ones. Immutable collections are sometimes called “functional data structures”, since they suit pure functional languages well. Immutability also opens some opportunities for optimization: for example, collections that share many parts will have a much smaller memory footprint.
Every built-in collection syntax you have seen before is mutable; you can just prepend :
to get the immutable version. (Think of :=
which also conveys the immutability.)
vec1 = [3, 4, 5]; vec2 = :[3, 4, 5] -- Vector{Int}, PureVector{Int}
set1 = {3, 4, 5}; set2 = :{3, 4, 5} -- Set{Int}, PureSet{Int}
map1 = {3 => 4, 5 => 6}; map2 = :{3 => 4, 5 => 6} -- Map{Int,Int}, PureMap{Int,Int}
str1 = "blahblah"; str2 = :"blahblah" -- String, Symbol
In the last line you can see that the string is in fact mutable. The immutable version is called a symbol; if the symbol contains only letters you can omit the quotes, like :blahblah
. When is a symbol useful? The most important case is as a key for a map, which has to be immutable. For example:
numbers = {:one => 1, :two => 2, :"fourty-two" => 42}
numbers[:"fourty-two"] println() --> 42
numbers[:"one hundred"] = 100
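If you know Python, the pairing is loosely analogous to Python's list/tuple, set/frozenset and bytearray/bytes pairs, and the symbol-key rule mirrors Python's requirement that dict keys be immutable. This is an illustrative correspondence only, not naru semantics:

```python
# Mutable/immutable pairs, loosely corresponding to naru's
# Vector/PureVector, Set/PureSet and Bytes/PureBytes:
vec1 = [3, 4, 5]                 # mutable, like  [3, 4, 5]
vec2 = (3, 4, 5)                 # immutable, like :[3, 4, 5]
set1 = {3, 4, 5}
set2 = frozenset({3, 4, 5})

# Dictionary keys must be immutable, which mirrors why naru stores
# map keys as symbols rather than mutable strings.
numbers = {"one": 1, "two": 2, "forty-two": 42}
assert numbers["forty-two"] == 42

try:
    {[1, 2]: "nope"}             # a mutable list key raises TypeError
    raised = False
except TypeError:
    raised = True
assert raised
```

The difference is that naru converts between strings and symbols silently, while Python makes the programmer pick the immutable type explicitly.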
“Hey, you have used a bare string as a key before. You mean that was incorrect?!” Of course not. Since the string has an immutable counterpart, a map with string keys will silently convert to/from symbols. For example:
numbers = {"one" => 1, "two" => 2}
key = "one hundred"
numbers[key] = 100
key[0..2] = "two"
key println() --> two hundred
numbers println() --> {"one" => 1, "two" => 2, "one hundred" => 100}
In this case only a copy of key
(converted to a symbol) is stored in the map, and every access to the key will return a string copy of it. (For paranoids: This may incur a big cost when naively implemented, but a copy-on-write string/symbol implementation will eliminate most of the cost. The compiler may also recognize a string that never changes and treat it like a symbol.)
For pure maps you know what you are doing, so you have to use immutable types only. But since writing all those colons is quite cumbersome, you only have to write the single outermost colon in this case. Therefore the following three lines are equivalent, and all of them have the type PureMap{Symbol,PureVector{Int}}
.
:{"one" => [2, 3], "four" => [5], "six" => [7, 8, 9, 10]}
:{:one => [2, 3], :four => [5], :six => [7, 8, 9, 10]}
:{one: [2, 3], four: [5], six: [7, 8, 9, 10]} -- a shortcut for symbol keys
Anyway, in most cases there is a pair of mutable and immutable collections, and you can use the freeze()
and thaw()
methods to get the immutable/mutable version of a collection. For standard collections these conversion methods are designed to be efficient, especially when the conversion is temporary (i.e. the mutable version is not actually altered). The Mutable
and Immutable
class attributes can be used to get the type of the counterpart.
Enumerable
interface
Every collection provides the common interface, abstracted by Enumerable
, and it is used to implement the loop. The actual iteration is implemented by Iterator
interface, which is also a subclass of Enumerable
; it basically returns elements in the collection one-by-one until the collection runs out (if the collection is finite). An Enumerable
collection should provide an iterator()
method which returns a fresh value of Iterator
.
Due to this abstraction, the following loop…
for el <- collection
-- do something with el
end
…is equivalent to the following desugared form (Option{T}
type will be introduced shortly):
with _iterator <- collection iterator() -- internal variable
while true
_current: Option{T} := _iterator next() -- internal variable
if not _current holds then break end
el = _current value
-- do something with el
end
end
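For comparison, Python's for loop desugars in almost exactly the same way; in this sketch (Python, not naru) StopIteration plays the role of naru's None option value:

```python
collection = [10, 20, 30]
result = []

# Desugared form of: for el in collection: result.append(el)
_iterator = iter(collection)        # like: collection iterator()
while True:
    try:
        el = next(_iterator)        # like: _current := _iterator next()
    except StopIteration:           # like: not _current holds -> break
        break
    result.append(el)               # do something with el

assert result == [10, 20, 30]
```

The structural difference is that naru signals exhaustion with an ordinary Option value rather than an exception.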
The iterator is functional. This means you get no headaches from iterating over a collection while simultaneously modifying it. For example:
arr = [1, 2, 3]
for x <- arr
x println()
arr append(x)
end
arr println()
This code is well-defined in naru and prints 1
, 2
, 3
and [1, 2, 3, 1, 2, 3]
respectively. To achieve this behavior, the collection is implicitly frozen when the iteration starts, as iteration on immutable collections is trivially safe. Since the conversion between immutable and mutable collections generally incurs only a small cost (and the compiler often converts a mutable collection into the immutable version when possible) this does not penalize normal loops.
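Python has no such implicit freeze, but the same well-defined behavior can be emulated by iterating over an immutable snapshot of the list (an illustration, not naru semantics):

```python
arr = [1, 2, 3]
for x in tuple(arr):     # tuple(arr) acts as the implicit freeze
    print(x)             # prints 1, 2, 3 -- the snapshot is unaffected
    arr.append(x)        # mutating arr during the loop is now safe
print(arr)
assert arr == [1, 2, 3, 1, 2, 3]
```

Iterating over `arr` directly instead of the snapshot would grow the list forever, which is exactly the hazard naru's implicit freeze removes.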
There is a series of sorted collections that mandate the enumeration order to be sorted. Therefore an element of a sorted collection should be ordered (i.e. a value of the Ordered
abstract class). Unlike mutability, sortedness is not strictly checked by the compiler. This means the following code is valid:
a: Vector{Int} = [1, 5, 3]
b: SortedVector{Int} = SortedVector([2, 4])
a2: SortedVector{Int} = a sorted() -- returns a sorted collection
c: Vector{Int} = b ~ a2 --> [2, 4, 1, 3, 5]
While there is a good merging algorithm around (and in fact available as SortedVector merge
), in general just concatenating two sorted vectors does not automatically yield a sorted vector. Of course concatenating an empty vector does yield a value of the SortedVector{Int}
class, but its static type is still Vector{Int}
.
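The distinction can be illustrated in Python: list concatenation does not preserve sortedness, while a proper linear-time merge (here heapq.merge, standing in for SortedVector merge) does:

```python
import heapq

b = [2, 4]                 # a sorted vector
a2 = [1, 3, 5]             # another sorted vector

c = b + a2                 # plain concatenation, like b ~ a2
assert c == [2, 4, 1, 3, 5]          # no longer sorted

merged = list(heapq.merge(b, a2))    # O(n) merge of two sorted inputs
assert merged == [1, 2, 3, 4, 5]
```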
Unsorted collections are further divided into two classes, ordered and unordered, depending on the stability of the iteration order. For example, two unordered collections with the same elements may give their elements in different orders. Once the order is known, however, it should be preserved until the collection is altered. (For paranoids: This means every collection that optimizes itself during access should do so before any iteration; an example of this restriction is the union-find data structure used in PartitionedVector
.) Vector
is an example of an unsorted but ordered collection; Set
does not have to be sorted nor ordered.
The main reason for having sortedness is that you normally don't care whether a collection is sorted, but knowing that one is sorted often speeds things up, and you will need it in certain circumstances anyway. Also some freedom has to be sacrificed for sortedness: for example, Map
can use a hash table but SortedMap
cannot.
The following is a full list of standard collections in naru. Only the name of mutable version is shown, as the name of immutable version almost always follows the same pattern (Pure
plus the name of mutable one).
| Name | Sorted? | Main usage | Underlying data structure |
|---|---|---|---|
| SList | No | Sequential access with almost no random access; supports O(1) slicing without copy | Singly-linked list |
| List | No | Bidirectional access with almost no random access; supports O(1) slicing without copy | Doubly-linked list |
| Vector | No | General random access; insertion is costly except for appending | Flexible array or deque |
| String | No | General string; rich string operations with Unicode support | Flexible array (or rope?) |
| Bytes | No | Byte sequence; some ASCII-oriented string operations | Flexible array or deque |
| SortedVector | Yes | General random access; O(log n) search for elements; random update and insertion are costly except for appending | Flexible array or deque |
| Set | No | Almost O(1) search for elements; removes duplicates automatically; supports fast set operations | Hash table with open addressing; possibly perfect hashing for PureSet (if the compiler supports it) |
| SortedSet | Yes | O(log n) search for elements, lower and upper bounds; removes duplicates automatically; supports fast set operations | Self-balancing binary tree |
| Multiset | No | Almost O(1) search for elements; keeps duplicate elements; supports fast set operations | Hash table with separate chaining |
| SortedMultiset | Yes | O(log n) search for elements, lower and upper bounds; keeps duplicate elements; supports fast set operations | Self-balancing binary tree |
| Map | No | Almost O(1) search for keys; removes duplicate keys automatically | Hash table with open addressing; possibly perfect hashing for PureMap (if the compiler supports it) |
| SortedMap | Yes | O(log n) search for keys, lower and upper bounds; removes duplicate keys automatically | Self-balancing binary tree |
| Multimap | No | Almost O(1) search for keys; keeps duplicate key-value pairs | Hash table with separate chaining |
| SortedMultimap | Yes | O(log n) search for keys, lower and upper bounds; keeps duplicate key-value pairs | Self-balancing binary tree |
There are several advanced collections offered by naru collection
module:
| Name | Sorted? | Main usage | Underlying data structure |
|---|---|---|---|
| OrderedSet | No | Almost O(1) search for elements; removes duplicates automatically; preserves the insertion order | Hash table with open addressing, overlaid with a doubly-linked list |
| OrderedMultiset | No | Almost O(1) search for elements; keeps duplicate elements; preserves the insertion order | Hash table with separate chaining, overlaid with a doubly-linked list |
| OrderedMap | No | Almost O(1) search for keys; removes duplicate keys automatically; preserves the insertion order | Hash table with open addressing, overlaid with a doubly-linked list |
| OrderedMultimap | No | Almost O(1) search for keys; keeps duplicate key-value pairs; preserves the insertion order | Hash table with separate chaining, overlaid with a doubly-linked list |
| Bimap | No | Almost O(1) search for both keys and values | Two hash tables |
| PriorityQueue | No | O(1) access to the minimal or maximal element; O(1) amortized insertion; O(log n) amortized deletion; fast merge operation | Fibonacci heap |
| PartitionedVector | No | O(1) access to each partition; almost O(1) partition merges | Disjoint-set forest |
| PartitionedMap | No | Almost O(1) access to each partition; almost O(1) partition merges | Disjoint-set forest plus a hash table |
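As a rough sketch of the disjoint-set forest underlying PartitionedVector and PartitionedMap, here is a minimal Python union-find with path compression and union by size (the names find and union are mine, not naru API):

```python
parent = {}
size = {}

def find(x):
    """Return the representative element of x's partition."""
    parent.setdefault(x, x)
    size.setdefault(x, 1)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path compression (halving)
        x = parent[x]
    return x

def union(a, b):
    """Merge the partitions containing a and b."""
    ra, rb = find(a), find(b)
    if ra == rb:
        return
    if size[ra] < size[rb]:             # union by size
        ra, rb = rb, ra
    parent[rb] = ra
    size[ra] += size[rb]

union(1, 2); union(3, 4); union(2, 3)
assert find(1) == find(4)               # 1..4 are now one partition
assert find(5) != find(1)               # 5 is still on its own
```

Both operations run in near-constant amortized time, which is where the "almost O(1)" bounds in the table come from.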
TODO
As mentioned before, there are eight collection syntaxes available:
- [...] and :[...], for Vector and PureVector respectively,
- {...} and :{...} with no : or => tokens within the braces, for Set and PureSet respectively,
- {...} and :{...} with : or => tokens within the braces, for Map and PureMap respectively, and
- "..." and :"...", for String and Symbol respectively.
The last case is quite different from the other syntaxes, as those two are in fact single literals. Their detailed behavior will be presented later; also note that the "..."
syntax can be extended by prepending a user-defined prefix.
There are two flavors of syntaxes other than strings and symbols: enumeration and comprehension.
Enumeration syntax. This is a collection with every element enumerated, like [1, 2, 3]
, {1, 2, 3}
and {1 => 2, 3 => 4}
. The trailing comma is ignored (i.e. [1, 2,] == [1, 2]
). An empty collection is represented as []
, {/}
and {}
respectively: {/}
is for a set and {}
is for a map. (Note that {/ }
, { /}
or { / }
are all invalid, as {/}
is a single token. { }
is equivalent to {}
however.) When the opening token is prepended by a single token :
the syntax denotes an immutable collection instead: :[1, 2, 3]
, :{1, 2, 3}
and :{1 => 2, 3 => 4}
. (Therefore : {/}
is valid, although it looks quite confusing.)
A flattening prefix \
can be used within the enumeration, just like function arguments:
a = [3, 4, 5]
b = [1, 2, \a, 6, 7] -- b is now [1, 2, 3, 4, 5, 6, 7]
a = {3, 4, 5}
b = [1, 2, \a, 6, 7] -- b contains seven elements from 1 to 7,
-- but 3, 4, 5 will be arbitrarily ordered.
a = {"foo" => "bar"}
b = {"a" => "x", \a} -- b is now {"a" => "x", "foo" => "bar"}
a = {3 => 4}
b = [(1,2), \a, (5,6)] -- b is now [(1,2), (3,4), (5,6)]
As shown in the examples, flattening an unsorted collection into an ordered collection depends heavily on the iteration order, which is arbitrary by definition. It also shows that the two collections should be of compatible types during the flattening; the last example is valid, as Map{Int,Int}
is a subtype of Enumerable{(Int,Int)}
.
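Python's iterable unpacking behaves much like the flattening prefix, including the map-to-pairs case (an analogy for illustration, not naru code):

```python
a = [3, 4, 5]
b = [1, 2, *a, 6, 7]                    # like [1, 2, \a, 6, 7]
assert b == [1, 2, 3, 4, 5, 6, 7]

m = {"foo": "bar"}
d = {"a": "x", **m}                     # like {"a" => "x", \a}
assert d == {"a": "x", "foo": "bar"}

# Flattening a map into a vector yields its key-value pairs:
c = [(1, 2), *{3: 4}.items(), (5, 6)]   # like [(1,2), \a, (5,6)]
assert c == [(1, 2), (3, 4), (5, 6)]
```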
For collections other than vectors, sets and maps, the enumeration syntax is simply a collection class called with a value of the Enumerable
interface:
a = SortedMultiset([5, 4, 3, 2, 1, 2, 3, 4])
a println() --> SortedMultiset([1, 2, 2, 3, 3, 4, 4, 5])
Note that the iteration order of an unordered collection is arbitrary, so you should be careful when using an unordered collection to initialize a collection that is sensitive to the enumeration order.
Comprehension syntax. This is a collection constructed from other collections:
a = [1, 2, 3]
b = [x**2 for x <- a] -- b is [1, 4, 9]
c = [5-x for x <- b if x<=5] -- c is [4, 1]
d = :{x + y for x <- a, y <- a} -- d is :{2, 3, 4, 5, 6}
e = {x => x**3 for x <- a} -- e is {1 => 1, 2 => 8, 3 => 27}
For mutable collections, a vector comprehension [f(x1,...,xn) for x1 <- L1 if g1(x1) ... for xn <- Ln if gn(x1,...,xn)]
is equivalent to this code:
L = []
for x1 <- L1
if g1(x1)
...
for xn <- Ln
if gn(x1,...,xn)
L append(f(x1,...,xn))
end
end
...
end
end
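Python comprehensions desugar the same way, so the equivalence is easy to check directly (Python analog, not naru code):

```python
a = [1, 2, 3]
b = [x ** 2 for x in a]                 # like [x**2 for x <- a]
c = [5 - x for x in b if x <= 5]        # like [5-x for x <- b if x<=5]

# The desugared loop nest for c:
L = []
for x in b:
    if x <= 5:
        L.append(5 - x)

assert b == [1, 4, 9]
assert c == L == [4, 1]
```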
A set comprehension and a map comprehension are similarly defined, and for x1 <- L1, x2 <- L2
is equivalent to for x1 <- L1 for x2 <- L2
. As a consequence, variables from x1
to xn
are only visible within the comprehension, and xk
is only visible after for xk <- ...
appears (except for f(x1,...,xn)
, of course).
For other collections without any built-in syntax, one can use a simple coroutine syntax that is incidentally similar to the comprehension:
a = {1 => 2, 3 => 4}
b = Bimap((y, x) for x, y <- a)
-- ...which is equivalent to Bimap( ((y, x) for x, y <- a) ).
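Python has the same convention: a bare generator expression passed as the sole call argument needs no extra parentheses, and it can feed any constructor. Here dict stands in for Bimap (an illustration only):

```python
a = {1: 2, 3: 4}

# Like Bimap((y, x) for x, y <- a): swap each key-value pair.
b = dict((y, x) for x, y in a.items())
assert b == {2: 1, 4: 3}
```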
TODO flattening comprehension: [\(0..k) for k <- 0..3] == [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
for example.
While naru has ()
value, it is mostly used as the return value of functions with no return
statement. (This is equivalent to unit in many functional languages, because there is only one possible value of this type.) You often want to store something only if some condition is met, and nothing otherwise. But you cannot use ()
for this purpose in general, as ()
is fundamentally different from other values and often explodes in front of your face.
The better way is to explicitly declare that you are aware of such cases. Thus naru has Option
class (which can be written as T?
in the type specification):
lexicon = {
"hello" => "Hallo",
"dog" => "Hund",
"wing" => "Flügel",
}
lexicon get('dog') println() --> Just("Hund")
lexicon get('cat') println() --> None
Map get()
method returns an Option{T}
value, which can be a Just{T}
value with a desired value of type T
or a None
value without any associated value. In the reverse direction you can make a new Option
value by just(value)
or none
, with no type parameters (thanks to the type inference). One can check whether the value is None
or not by using the hold
attribute:
word2 = lexicon get(word)
if word2 hold
#"#word -> #(word2 value)" println()
else
"Not found." println()
end
But it is somewhat verbose. Now for the fundamental question: why on earth is Option
discussed in the collection section? Because it is a trivial collection with zero or one element. In fact you can loop over an Option
value:
for w <- word2
#"#word -> #w" println()
break
else
"Not found." println()
end
Here the else
part of a for
block gets executed when the loop terminates normally, not abruptly by a break
or return
statement. But this also looks verbose, so naru has another shortcut:
if w <- word2
#"#word -> #w" println()
else
"Not found." println()
end
It only works for Option
class and its subclasses, by the way. For other collections you can just use for
loop.
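The closest everyday Python analog is dict.get, which returns None instead of raising; the if-shortcut then corresponds to an is-not-None test. (Unlike naru's Option, Python's None is a single untyped value and cannot nest, so this is only a loose analogy.)

```python
lexicon = {"hello": "Hallo", "dog": "Hund", "wing": "Flügel"}

assert lexicon.get("dog") == "Hund"     # like Just("Hund")
assert lexicon.get("cat") is None       # like None

word = "dog"
w = lexicon.get(word)
if w is not None:                       # like: if w <- word2
    message = f"{word} -> {w}"
else:
    message = "Not found."
assert message == "dog -> Hund"
```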
(For paranoids: none
is of None{Bottom}
class, where Bottom
is a type with no values and a subtype of all other types. Since Option
is covariant (it is immutable!), none
is a value of None{T}
for any type T
. Great.)
TODO also syntax
TODO
Well, this title is misleading. While we have used the words “type” and “class” freely, it's time to confess: they are too confusing as is. In naru we have three good candidates that fit those words: (1) the value describing a class itself, (2) the compile-time type of an expression, and (3) the runtime class of a value, which the class
attribute returns. 2 and 3 are different things because we may pass 42
to a function with the type (Rational)->()
. Guy L. Steele has written about this problem at length, and we will go with the simplest solution (with some caveats): those that can be easily encountered at runtime are named class, and those that cannot are named type. Therefore our terminology is as follows: metaclass for 1, type for 2 and class for 3.
In other words, we have talked about all those classy things, but not about types. So let's talk about types. The first thing: naru is neither a dynamically typed language nor a statically typed language; it lies somewhere between the two. When you write the following:
a = 42
It actually means the following:
a: * = 42
…where *
is a special type meaning that it is dynamic and may result in a runtime error at any time. However if you write the following:
a: String = 42
Then the compiler knows that 42
(of Int
class) cannot be of type String
(certainly!) and reports the error. You may mix and match these types as well:
a: * = 42
b: Int = 54
(a / b) println()
Since a
is dynamic, a / b
is also dynamic (it may result in Real
or Rational
depending on a
’s class). TODO we are not certain about a mix of dynamic and static types; we may adopt some other works here
We will see other categories of types shortly, but look at the important category first: function types. These are denoted as (T1,T2,...)->U
where Tn
are argument types and U
is a return type.
zero: ()->Int := fun () -> 0
-- named arguments are a part of function types.
double: (Int)->Int := fun (x:Int) -> x * 2
triple: (x:Int)->Int := fun (x:Int) -> x * 3
double(42) --> 84
triple(42) --> 126
triple(x:42) --> 126
-- argument names themselves do not affect the type system.
pow: (Int,Int)->Int := \
fun (x:Int,y:Int) -> if x == 0 and y == 0 then 1 else x ** y
pow1: (x:Int, y:Int)->Int := pow
pow2: (y:Int, x:Int)->Int := pow1
pow1(x:3, y:4) --> 81
pow2(x:3, y:4) --> 64
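The call-site behavior of naru's argument-name annotations loosely matches Python keyword arguments, where the parameter names also matter only at the call (a Python analogy, not naru's subtyping rule):

```python
def pow1(x, y):
    return 1 if x == 0 and y == 0 else x ** y

assert pow1(3, 4) == 81        # positional call
assert pow1(x=3, y=4) == 81    # like pow1(x:3, y:4)
assert pow1(y=3, x=4) == 64    # swapping the names swaps the arguments
```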
As noted, argument names are merely an annotation that is only used when you call the function. (In PL jargon: they do not affect the subtyping, i.e. (x:A,y:B)->C <: (y:A,x:B)->C
and (y:A,x:B)->C <: (x:A,y:B)->C
.) However you will hardly be affected by this subtle rule, since the normal function declaration like this:
add(x:Int, y:Int) := fun
...
end
…will implicitly set the correct annotation. In fact this is equivalent to the following code:
add: (x:Int, y:Int)->_ := fun (x:Int, y:Int):_
...
end
Also note that the function arguments can be followed by the result type. Except that ->
is used in place of :
, this is also applicable for the uniform declaration, like add(x:Int, y:Int)->Int := ...
. (The placeholder _
will be explained later in this section.)
The type specification in naru is a mini language of its own:
Int
, String
and so on. These types normally reflect classes available in the current scope, but you can also use the types from other modules:
import somemodule
some: somemodule SomeClass = somemodule foo()
import anymodule * -- if this imports AnyClass...
any: AnyClass = bar() -- then this is also valid.
Every top-level type in the type specification should be declared by binding (:=
). TODO is this restriction necessary? can’t we have the runtime type checker?
In other words, types with static parameters. These take the same form as the definition: Vector{Int}
, for example.
Other types can be prefixed by the names of static parameters to denote the parametric types: for example, {T,U} ((T)->U,T)->U
. These static parameters are local to the following type specification: so ({T} (T)->Int, {T} (T)->String)->()
is a correct type and can take values of (Char)->Int
and (String)->String
as arguments.
For parametric classes like Vector
you can shorten {T} Vector{T}
(i.e. all values of that type regardless of static parameters) to simply Vector
.
You can access class attributes that are defined by binding (:=
) from the type specification: the prime example would be Mutable
and Immutable
type attributes in Collection
class, like Vector{Int} Immutable
(which will result in PureVector{Int}
; see the collection list below).
()
, (Int, Real)
, (Int, Real, String)
and so on.
One or more types within a tuple type can be prefixed by \
, which requires the corresponding type to be a tuple type of fixed size. (For example, if T
is (Int,Real)
then (\T,String,\T)
is equivalent to (Int,Real,String,Int,Real)
.) Tuple types can also have an ellipsis (...
) at the end of the list, meaning the type accepts the given number of values plus zero or more values of other types. Therefore it is in fact syntactic sugar for \_
.
TODO exact encoding is not decided
()->()
, (Int,Int)->Int
, (fmt:String,args:Value...)->()
and so on. The generic syntax is explained already; there can be several kinds of argument specifications inside it:
- a bare type (e.g. String).
- a named argument (e.g. fmt:String).
- a variadic argument, marked by a trailing ... (e.g. args:Value...), where its actual type is a Vector of the specified type (Vector{Value} in this example). This argument should be at the end of the argument list if present, except for keyword-only arguments.
- an optional argument, marked by a trailing ? (e.g. String?), where its actual type is an Option of the specified type (Option{String} in this example).
As stated earlier the names of arguments, if given, do not affect the type system. As an exception, you cannot ignore the names of keyword-only arguments: you have to keep them as is if they are non-optional.
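Python offers similar argument kinds: *args collects extra arguments into a tuple (naru collects them into a Vector), and typing's Optional[T] plays the role of T? (the function names below are mine, for illustration):

```python
from typing import Optional

def printf(fmt: str, *args) -> str:
    # args arrives as a tuple, the way naru's args:Value... arrives
    # as a Vector{Value}.
    return fmt % args

def greet(name: Optional[str]) -> str:
    # Optional[str] plays the role of naru's String? (Option{String}).
    return f"hello {name}" if name is not None else "hello stranger"

assert printf("%d + %d", 1, 2) == "1 + 2"
assert greet("naru") == "hello naru"
assert greet(None) == "hello stranger"
```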
Int|Real
, {T} (Reader{T} & Writer{T})
and so on. These represent the disjunction and conjunction of the given types respectively, and syntactically they bind less tightly than static parameters and attributes.
Denoted as *
, it enables the full dynamic type checking. It can appear in any position in the type specification, e.g. (*,*,*)->*
(which would be the default type of fun (x,y,z) -> x
). (For paranoids: It may be possible that fun (x,y,z) -> x
would have a type of {T} (T,*,*)->T
, but we don’t do so in order to simplify the implementation.)
Dynamic type is a supertype of all static types, and thus the conversion from a static type to the dynamic type is automatic. The reverse direction needs a type cast such as x as SomeType
. We will discuss how to mix static and dynamic types later.
Denoted as _
, it is not really a type but a specification that the compiler has to fill in the appropriate type. For example, x: _ := 42
will be almost equivalent to x: Int := 42
. (In reality, it can be filled with other implementation-dependent subtypes of Int
, e.g. SmallInt
.) Like the dynamic type it can appear in any position in the type specification, e.g. Vector{_}
or _{Int}
, and placed before ...
it will infer the types of the arbitrary number of function arguments, e.g. (Int,_...)->()
.
It is the default type for the return type in the function, and in fact the only possible return type for the inline function syntax—for example, fun (x,y) -> x+y
is equivalent to fun (x,y)->_; return x+y; end
(which type is (x:*,y:*)->*
). It also arises from the use of placeholder in the tuple unpacking, and also from the assignment or binding without a type when use static
is enabled.
self
type
Denoted as self
, it denotes the type of the enclosing class. It is invalid outside a class or in a class method.
It is quite limited compared to the normal expression syntax; this is intentional and allows an efficient compile-time implementation.
(For paranoids: Technically speaking, naru's type system is (for now) a rank-1, predicative Hindley-Milner type system extended with nominal subtyping. The type inference on this type system should be decidable, though it hasn't been proved yet. We plan to extend the type system to a higher-order and impredicative subset of System F whose restrictions allow decidable type inference with some annotations.)
TODO describe syntactic sugars like T?
for type expression
While the static typing in naru is optional, naru does force one to use the static typing in some cases. The major cause is the multiple dispatch, since we should know the arguments’ types in order to determine what function is to be called:
#+#(a: Int, b: Real) := fun -> Real from(a) + b
#+#(a: Real, b: Int) := fun -> a + Real from(b)
If some expression is typed statically, any access to attributes and methods of the expression is checked at compile time. So if we wrote the first function body as fun -> a addToReal(b)
, it will be caught early, given that the addToReal
method does not exist in Int
.
Static to Dynamic. While this behavior detects many bugs caused by typos, one may find it inconvenient. There are several ways to convert a static type into the dynamic type (i.e. force the compiler to forget the type information): the easiest way is to assign the statically-typed value to a variable, since variables are typed dynamically by default.
#+#(a: Int, b: Real) := fun
a0 = a -- a0 is typed as *
return a addToReal(b) -- this will raise the runtime error
end
The variation of this way is to use an explicit type casting via as
expression. naru will treat given expression as given type, and if needed will add a check for ensuring the assertion.
#+#(a: Int, b: Real) := fun -> (a as *) addToReal(b)
The last way is to use a function whose arguments are dynamically typed. This allows a dynamically-typed function to accept statically-typed values without any modification, with an appropriate runtime check.
addHelper(a, b) := fun -> a addToReal(b)
#+#(a: Int, b: Real) := fun -> addHelper(a, b)
Personally, I am against the first and second ways because they prevent the early detection of bugs. The very reason for naru to allow mixing static and dynamic types is to help gradual transition: one may keep some functions static and some functions dynamic, and will receive runtime errors only from the latter. If you want a function to be static then it is better to keep it static throughout.
Dynamic to Static. In the reverse direction, one may convert the dynamic type into a static type for better safety. The first way is to use an explicit type cast: if the cast fails a CastError
exception will be thrown.
double(x) := fun
do
return (x as Int) * 2
but CastError
do
return (x as String) ~~ 2
but CastError
raise ValueError("x is not an integer nor a string")
end
end
end
Note that one can use the Value isa
method instead of the nested do
~but
blocks:
double(x) := fun
if x isa(Int) then
return (x as Int) * 2
elseif x isa(String) then
return (x as String) ~~ 2
else
raise ValueError("x is not an integer nor a string")
end
end
Of course one still has to cast x
in order to notify the compiler; the compiler does not recognize the Value isa
method at all! Since this is very error-prone (what if one misses x as SomeType
in one case?), naru provides a shortcut:
double(x) := fun
given x
case Int as x then -- overrides the prior binding for x
return x * 2
case String as y then -- in this case x is still dynamically typed
return y ~~ 2
case * then
-- this case will catch all other types, since any type can be
-- converted to the dynamic type.
raise ValueError("x is not an integer nor a string")
else
-- this case is impossible, and in fact, can be omitted.
end
end
This given
-case
construct is generally used for pattern matching, but can also be used for general type matching as shown here. Detailed rules for pattern and type matching are given in the following sections.
The other way to convert the dynamic type to the static type is using the statically-typed function, possibly with the multiple dispatch. This is possible because calling the statically-typed function with a mismatching dynamic type is not a compile-time error but a runtime error. In fact, this is how as
operator is internally implemented: x as SomeType
is equivalent to (fun (_arg:SomeType) -> _arg)(x)
.
Forcing the static typing. We have already seen that the type of variables defaults to the dynamic type (*
). One may use use static
and use dynamic
in order to override this behavior:
-- the type checker will catch "p length" to be invalid.
do
use static
p = 4
q = "Hello"
"This line should not be printed" println()
(p length + q length) println()
end
-- before running the program the type checker does not try to
-- check "p length" to be invalid or not.
do
use dynamic
p = 4
q = "Hello"
"This line will be printed though" println()
(p length + q length) println()
end
Note that if we wrote the second-to-last line as (4 length + "Hello" length) println()
instead, then it would cause a type error at compile time, even though use dynamic
is enabled. This is because 4
itself always has a type of Int
(or its subtype); use static
and use dynamic
make a difference only when an assignment or binding is used.
use static
also does not affect the argument types, which always default to the dynamic type. Therefore one should at least annotate the argument types when using static typing as a default. (For paranoids: While it is not impossible to infer the argument types from the function body, such inferred types are very limited. For example, imagine the function succ(x:_)->_ := fun -> x + 1
; given #+#
is open to extension, what argument types should be inferred here? The solution is provided in the next section.)
The type inference in naru occurs when one uses a placeholder type _
:
x: Int := 42
y: _ := x + x -- the type of y is Int
z: _ := x ** y -- the type of z is Real (since y can be negative)
This placeholder type is also implicitly assumed in the following cases:
- When use static is active, the typeless assignment a = ... is equivalent to a: _ = .... (Same for binding.)
- The uniform declaration X(...) := ... without the return type is equivalent to X(...)->_ := ....
- fun (...); ...; end and fun (...) -> ... without the return type are equivalent to fun (...):_; ...; end and fun (...):_ -> ..., respectively.
- The type attribute, which is a compile-time equivalent of the class attribute, is also internally inferred like placeholders.
This placeholder type has an important role for type-defining uniform declarations like X(a:Int) := class ... end
. In this case, the resulting return type should match the type defined in this very declaration (therefore X(a:Int)->Y := class ... end
is rejected) and the inferred type is handy.
The type inference in naru is based on the Hindley-Milner algorithm, which constructs a list of constraints in order to determine the actual type for placeholders. This algorithm always terminates (it is “decidable”), although it may take a long time on pathological code. The compiler supports an option to limit the degree of type inference.
TODO boundary of type inference (maybe a module boundary) and type inference for open types
The naru type system is primarily concerned with named types (i.e. it is a nominal type system), but it is often convenient to use structurally defined types. For example, a binary tree is either a leaf or a branch, where a branch contains two smaller trees and one value. It would be implemented in naru as follows:
Tree{+T} := trait
size()->Int := _
end
Leaf{+T}() := class <- Tree{T}
size() := fun -> 0
end
Branch{+T}(left:Tree{T}, value:T, right:Tree{T}) := class <- Tree{T}
size() := fun -> left size() + right size() + 1
end
Readers familiar with data structures and algorithms can immediately figure out how insertion or deletion is implemented with this structure. Unfortunately such operations are very complex to implement correctly. For example, even the tree rotation (red-black trees rely on this operation extensively) becomes cumbersome:
rotateRight{T}(node:Tree{T}) := fun
use static
if node isa(Leaf) then raise ValueError("the top node is not a branch") end
top = node as Branch{T}
if top left isa(Leaf) then raise ValueError("the left node is not a branch") end
left = top left as Branch{T}
return Branch(left left, left value, Branch(left right, top value, top right))
end
rotateLeft{T}(node:Tree{T}) := fun
use static
if node isa(Leaf) then raise ValueError("the top node is not a branch") end
top = node as Branch{T}
if top right isa(Leaf) then raise ValueError("the right node is not a branch") end
right = top right as Tree{T}
return Branch(Branch(top left, top value, right left), right value, right right)
end
This code spends too many lines checking whether the given tree is of a particular form and narrowing the compile-time type of the tree accordingly. The primary goal of pattern matching is to specify this in a structural way and thus to remove such boilerplate code:
rotateRight{T}(node:Tree{T}) := fun
given node
case Branch(Branch(child1, value1, child2), value2, child3) then
return Branch(child1, value1, Branch(child2, value2, child3))
else
raise ValueError("cannot perform the tree rotation on this tree")
end
end
rotateLeft{T}(node:Tree{T}) := fun
given node
case Branch(child1, value1, Branch(child2, value2, child3))
-- "then" after the case specification is optional.
return Branch(Branch(child1, value1, child2), value2, child3)
else
raise ValueError("cannot perform the tree rotation on this tree")
end
end
It is much easier to spot what’s going on, and in particular it is now clear that the tree rotation does not affect the in-order traversal. Locally defined variables childN
and valueN
are only available within the corresponding case
block.
case
-ed Classes. naru issues an error for missing case
clauses against the statically given
type: if we omitted the else
branch in the earlier code, we would miss the case that node
is just a Leaf
(for instance). For the same reason, the following code raises a compile-time error:
TreeReduction{T,U} := trait
reduce(node:Tree{T})->U := _
end
TreeSum() := class <- TreeReduction{Int,Int}
reduce(leaf:Leaf{Int}) := fun -> 0
reduce(branch:Branch{Int}) := fun ->
self reduce(branch left) + branch value + self reduce(branch right)
end
It is not immediately obvious what’s wrong with this code, but the problem becomes clear if we add a subclass Trident{T}
(maybe for a 2,3-tree) to Tree{T}
: TreeSum reduce
does not have an implementation for Trident
! Therefore we need some way to say that Tree{T}
is not open for direct subclassing, which is achieved by case
-ed classes.
Tree{+T} := trait
size()->Int := _
case Leaf{+T}() := class <- Tree{T}
size() := fun -> 0
end
case Branch{+T}(left:Tree{T}, value:T, right:Tree{T}) := class <- Tree{T}
size() := fun -> left size() + right size() + 1
end
end
case
specifier in the trait or class definition is similar to global
in that the identifier is declared outside of the current scope, but it also signifies that the enumerated case
s are all possible direct subtypes of this trait or class. Therefore the aforementioned Trident{T}
subclass is no longer possible with this definition. It is also possible to nest case
-ed traits or classes.
case
Specification for Clauses. The case
specification is a superset of the type specification, which means that the trait or ordinary types can also be used for case
s. It is even possible to use the dynamic type *
, which will catch every uncaught types. (In this regard the placeholder _
is exactly the same as the dynamic type in the case
specification.)
The case
specification extends the type specification by adding the following expressions:
Branch(l,v,r)
, Leaf()
and so on. The empty constructor argument list is only used for verifying that the given class has no constructor arguments.
Deconstructed names are available only within the following case
block. Deconstructed names are very similar to function arguments: a named argument syntax (Branch(left:l, right:r, value:v)
) can be used to reorder the argument order, and a placeholder (Branch(_,v,_)
) can be used to ignore some irrelevant arguments.
Class deconstructions can be nested, but this does not mean that ordinary type specifications can also be used in the position of deconstructed names. For example, Branch(Branch(l2,v2,r2),v,r)
is possible but Branch(Branch,v,r)
is not (unless you want to bind Branch
to the right subtree of the branch).
Branch(l,v,r) as b
, Int as n
and so on.
This binding gives an additional name for the non-leaf node of class deconstruction or type specification, which is normally not available. It is useful for use static
mode, since given
-case
construct does not automatically make the compiler treat the given
expression as having the matched type. The binding may also be used for nested class deconstructions.
Enumerable{T} for T
, ((Dest,Src)->Dest,Dest,Enumerable{Src})->Dest for Src,Dest
and so on.
The explicit type variables can be used to give an additional name for types, and can only be used in the top-level of case
specification. For all other purposes, these type variables are treated as the placeholder _
: Enumerable{T} for T
will be treated as Enumerable{_}
during the inference, for example.
RedBlackBranch(l,v,r,red:true)
and so on. Literal values match exactly equal (i.e. ==
returns true
) values. Such syntax is only allowed for Bool
, Int
, Char
and String
values, for clarity.
Yet another use of the word “type” in naru is, if you still recall, as a keyword. We have already seen that type
attribute gives a compile-time type, but type
can also be used for declaring another type:
MapFromInt{Value} := type <- Map{Int,Value}
To be exact, this does not declare another type but an alias to it. MapFromInt{String}
is exactly the same as Map{Int,String}
, for example. This declaration is needed because one cannot put static parameters on normal declarations like MapFromInt{Value} := Map{Int,Value}
.
type
declarations can also be used for declaring a structural type:
List{+T} := type (Nil() | Cons(head:T, tail:List{T}))
Weekday := type (Mon() | Tue() | Wed() | Thu() | Fri() | Sat() | Sun())
Note that the constructor argument list is always required even if it is empty, which resolves an ambiguity in the case
specification. (For example, is Mon
in WeekDate(year,weeknum,Mon)
a reference to the singleton Mon
value or just another name binding?) There is an advanced way to eliminate these parentheses, which is used for true
and false
values—which is defined just as Bool := type (true | false)
.
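As a point of comparison, the structural List type above is the classic cons-list. A minimal Python sketch (hypothetical names, not naru's runtime) of the same shape:

```python
# Nil/Cons mirror the structural List{+T} type: each Cons cell holds a
# head value and the rest of the list, and Nil terminates it.
from dataclasses import dataclass

@dataclass
class Nil:
    pass

@dataclass
class Cons:
    head: object
    tail: object

def length(xs) -> int:
    # Walk the tail pointers until we hit the Nil terminator.
    n = 0
    while isinstance(xs, Cons):
        n += 1
        xs = xs.tail
    return n
```

For example, `length(Cons(1, Cons(2, Nil())))` walks two cells; note that even the empty constructor `Nil()` takes an argument list, just as naru requires.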
Having explained types, we can now turn our attention to the advanced features of functions in naru.
TODO
TODO
with
Operator. While with
blocks are used to finalize resources automatically, with
within an expression works differently: it augments the meaning of a preceding expression.
"A(2,4) = ...#((2 ** 65536 with modulo: 1e20) - 3)" println()
--> A(2,4) = ...45587895905719156733
with
operator appends the given arguments to the outermost function call or operator of the given expression; in this case, the exponentiation operator **
is the outermost and since 2 ** 65536
corresponds to #**#(2, 65536)
(see Special Names and Complication) the actual function call would be #**#(2, 65536, modulo: 1e20)
. with
operator has to be parenthesized unless it is the outermost operator for function arguments or the sole statement; you can also use non-keyword arguments like (2 ** 65536 with 1e20)
, or multiple arguments like (sin(30) with :degrees) with accuracy: 0.5(--ulp--)
, but they are not always recommended. For the infix operators, with
can spread to multiple arguments of the same precedence (e.g. a + b * c - d with e
is the same as #-#(#+#(a, #*#(b, c), e), d, e)
) unless the term of same precedence is parenthesized.
It is mostly useful for options of operators (although a normal function call can be augmented as well); the aforementioned a ** b with modulo: c
is one example, which is the same as a ** b % c
but much more efficient when b
is very large; many floating point operations have an explicit rounding
argument, like a + b with :trunc
(for rounding towards 0); the range with non-default step can be constructed by a..b with step: c
; and so on.
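The modulo: option corresponds to modular exponentiation, and Python's built-in three-argument pow illustrates why fusing the modulus into the operator beats computing the full power first:

```python
# pow(base, exp, mod) reduces intermediate results modulo `mod` at every
# step, so it never materializes the full 65536-bit number.
fast = pow(2, 65536, 10**20)
slow = (2 ** 65536) % 10**20   # same value, via a ~19729-digit intermediate
assert fast == slow
```

The difference in cost grows quickly with the exponent, which is exactly the efficiency argument made above for a ** b with modulo: c.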
with
can be applied to a single term as well; in this case, it calls #with
method of given value. For example, #with
method of functions receives a named argument and returns a new function that calls the original function with the given argument: thus log with base: 3
is equivalent to fun (x) -> log(x, base: 3)
. Depending on the use, it can be a very powerful way to extend naru.
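In Python terms, #with applied to a function behaves much like functools.partial with a keyword argument. This is only a rough analogue; the log function below is a hypothetical example, not naru's.

```python
import math
from functools import partial

def log(x, base=math.e):
    return math.log(x, base)

# Roughly `log with base: 3`: a new function with `base` pre-bound,
# i.e. fun (x) -> log(x, base: 3) in the naru example above.
log3 = partial(log, base=3)
assert log3(9) == log(9, base=3)
```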
There are a few operators that cannot be augmented in this way. Operators that do not desugar into a function call (e.g. but
operator) are obvious exceptions, and assignments and operation-assignments cannot be augmented as well. with
itself can be augmented.
So far, we have seen two ways of making functions: the function declaration (name := fun ... end
) and the anonymous function (fun (args) -> ...
). There is a third way, the automatic function, which is an obvious but useful shortcut for an anonymous function:
[3, 4, 5] map(_ + 1) println() --> [4, 5, 6]
The full list of shortcuts is as follows (the argument name is only provisional, and the actual type does not reflect it):
(_ attr) (where attr is an identifier) is equivalent to fun (x) -> x attr.
(_ meth(args...)) is equivalent to fun (x) -> x meth(args...).
(op _) is equivalent to (_ op#()), that is, fun (x) -> op x.
(_ op) is equivalent to (_ #op()), that is, fun (x) -> x op.
(_[i]) is equivalent to (_ #[#](i)), that is, fun (x) -> x[i].
(_(args...)) is equivalent to (_ #(#)(args...)), that is, fun (x) -> x(args...).
(_ op _) is equivalent to fun (x, y) -> x op y, or more non-intuitively, #op#.
(_ op y) is equivalent to fun (x) -> x op y.
(x op _) is equivalent to fun (y) -> x op y.
(_ op _ with ...) (and others) is equivalent to fun (x, y) -> x op y with ... (and others, respectively).
Any arguments within an automatic function are evaluated at the time of function construction. So strictly speaking, (_ + 3 println()) and fun (x) -> x + 3 println() are not equivalent.
The automatic function has to be parenthesized like with
operator, unless it is an outermost operator in a function argument or a sole statement (highly useless though). Therefore the previous use, map(_ + 1)
, is valid, but (_ + 1)(3)
needs the parentheses.
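For readers more comfortable with explicit lambdas, each automatic function corresponds to one. This Python sketch also demonstrates the eager-argument caveat above (Python has no automatic functions, so the construction is simulated with a closure):

```python
# map(_ + 1) in naru corresponds to an explicit lambda in Python.
assert list(map(lambda x: x + 1, [3, 4, 5])) == [4, 5, 6]

# The eager-argument caveat: arguments inside an automatic function are
# evaluated once, when the function is constructed, not at each call.
vals = [1]
f = (lambda arg: (lambda x: x + arg))(sum(vals))  # arg fixed to 1 here
vals.append(10)
assert f(0) == 1  # still 1, even though sum(vals) is now 11
```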
TODO
TODO
The tail call “optimization” (TCO) is a way to implement function call (often recursive) with no additional stack use. For example the following range
function:
range(low, high, result=:[]) := fun
if low >= high then return result end
return range(low+1, high, result ~ :[low])
end
…can be efficiently implemented as a loop:
range(low, high, result=:[]) := fun
while not low >= high
result = result ~ :[low]
low += 1
end
return result
end
For some cases, however, the compiler is able to automatically convert the former into the latter, which is far more efficient than the original version. In brief, a tail call of the form return somefunc(...)
can be converted. There are also reasons to prefer the former: some algorithms are inherently recursive, and the transformed code is often unintuitive. Therefore this kind of automatic transformation would be useful, since we tend to write lots of such calls.
The problem is that the tail call optimization is not really an optimization. It changes the semantics of the code, as it changes the space complexity; for example, if we have a tail call which is optimized away and we mistakenly make it a non-tail call, then we might see an unexpected increase in space complexity. I consider TCO an example of fragile optimization for this reason. (Some FP people might disagree on this subject, but they have to realize that a programming language is not a mere tool or inference rule but an interface between the machine and the programmer.)
Thus instead of optimizing every tail call, naru only optimizes the tail calls which are explicitly marked as optimizable. This is simple, as you just have to use \return
instead of return
:
range(low, high, result=:[]) := fun
if low >= high then return result end
\return range(low+1, high, result ~ :[low])
end
(Think of \
as a stack flattening.) If \return
statement is not a tail call (e.g. \return func(42) + 1
) then it is an error, so you can be sure that you have written a tail call and that it will always be optimized. You have to use \return
even when you don't have a return value, that is, when the function returns ()
implicitly.
While a simple program fits in one file, most programs do not; they have to be organized somehow. In naru the module is a unit of program, and it can be imported by the import
statement:
-- constants.n
PI := 3.1415926535897932
E := 2.718281828
-- area.n
import constants *
radius = 42
#"area = #(PI * radius**2)" println()
In general the following kinds of import
statements are allowed:
import constants -- imports a module value
import constants PI -- imports a single symbol
import constants (PI, E) -- imports several symbols from one module
import constants PI as PI_16DIGITS -- imports a symbol with an alias
import constants (PI as PI_16DIGITS,
E as E_9DIGITS) -- imports symbols with corresponding aliases
import constants * -- imports every symbol from one module
import constants, math -- imports multiple modules
import (constants, math) -- ditto
import collection SkipList{Int} as IntSkipList
-- imports a specialized symbol with an alias
import collection SkipList{Int} -- this is not allowed
import collection SkipList -- workaround: use this and refer `SkipList{Int}`
There is also a hierarchy of packages, which contain other modules or packages. For example, let’s assume the following hierarchy:
image/
_.n
readers/
bmp.n
png.n
apng.n
gif.n
writers/
bmp.n
png.n
pixmap.n
color.n
colormodel.n
As you can see _.n
is a special file that is run before any import for modules in the image
package. (_
is not allowed in the symbol name in general, so this module has the name of just image
. Also _.n
is optional; if it is not present, children can still be imported but the package itself cannot be imported.) Outside the image
package, one can use import image readers bmp
or import image colormodel
to import bmp
and colormodel
module values respectively. Normal imports are absolute, which means import image readers bmp
works the same even inside the image pixmap
module. For relative imports one can use import self readers bmp
(which finds ./readers/bmp.n
); for parent packages one can also use import super readers bmp
(which finds ../readers/bmp.n
) or even import super super image readers bmp
(which finds ../../image/readers/bmp.n
), but I don’t recommend these.
Cyclic imports are allowed as long as there is at most one dynamic module (i.e. a module with assignment statements) in the cycle. So library writers are strongly encouraged to write no dynamic code in library code unless there is a good reason. TODO mutability of modules
Finally, naru has a built-in package named naru
. The initial runtime module that contains Int
, String
and so on is named naru core
; every naru code implicitly imports naru core
before the first line of it. (i.e. import naru core *
.)
If you live in the United States, where you don't need the freaky letters, you probably don't have to use (or may not even have heard of) Unicode. Alas, however, you are not alone in this world. People around the world use lots of scripts and languages, which makes Unicode matter.
Following this trend, naru has native support for Unicode. Until now I didn't explain how user input is handled; I had to postpone the explanation since it involves Unicode. The following is how the input is (properly) handled:
do
">>> " print()
text = String readln(replace: "")
if text == "" then retry end
but EOFError
"You pressed Ctrl-D or Ctrl-Z!" println()
end
Yes, this is ugly, but all user input is ugly. String readln
reads one line from the standard input; the replace: ""
means it should replace an erroneous character with the given string (here, the empty one) when the line cannot be properly parsed as a Unicode string. If it is none
then it raises the UnicodeDecodeError
exception instead; you have to handle it on your own. The default is none
.
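The replace: behavior maps closely onto Python's decoding error handlers, shown here on raw bytes rather than standard input:

```python
# 0xff can never appear in valid UTF-8, so decoding must either replace
# it or raise -- the same choice naru's replace:/none option makes.
bad = b"abc\xffdef"
assert bad.decode("utf-8", errors="replace") == "abc\ufffddef"

try:
    bad.decode("utf-8")        # strict mode, like naru's default `none`
    raised = False
except UnicodeDecodeError:
    raised = True
assert raised
```

One difference: Python always substitutes U+FFFD in replace mode, whereas naru lets you choose the replacement string.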
You can read lines from the file likewise:
with f <- File("data.txt") open()
text = String readln(f, replace: "?")
but EOFError
text = ""
end
Or even with a for
loop:
with f <- File("data.txt") open()
for line <- f lines
line println()
end
end
As you can see the FileStream
(the return value from File open
method) value has lines
wrapper, which is iterable (i.e. Enumerable{String}
). But what about binary files? The natural interface would be bytes
instead of lines
:
with f <- File("data.bin") open()
for byte <- f bytes
byte println()
end
end
In this case, byte
would be an integer ranging from 0 to 255 (for 8-bit bytes, as in most computers). But this means you cannot handle the data as a sequence, since you can see only one byte at a time. What if we are handling zero-terminated bytes, for example?
with f <- File("data.bin") open()
bytes = Bytes readz(f) -- really, a shortcut for Bytes readuntil(f, 0)
end
You can see the new class here: Bytes
(short for byte sequence). It is a lot like Vector{Int}
, but it assumes all elements in it fit in a byte. Also it has several methods that deal with byte sequences, like readuntil
and asciilower
/asciiupper
.
So naru has two string types: String
for an ordinary Unicode string, and Bytes
for a byte sequence. Both can be converted to each other using Bytes decode
and String encode
methods:
with f <- File("data.bin") openrw()
current = f position
bytes = Bytes readz(f)
str = bytes decode("utf-8", replace: "?")
f position = current
str encode("iso-8859-1") printz(f) -- overwrite
end
naru also provides literals for byte sequences, b"..."
and x"..."
. The former may contain a raw byte like b"abc"
but is limited to letters in ASCII, and the latter may contain hexadecimally encoded bytes like x"61 62 63"
.
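Python's byte-string literal and bytes.fromhex make a close analogue of the two forms:

```python
# b"abc" corresponds to naru's b"abc"; bytes.fromhex accepts the same
# whitespace-separated hex pairs as naru's x"61 62 63".
assert bytes.fromhex("61 62 63") == b"abc"
assert b"abc"[0] == 0x61   # byte sequences index as small integers
```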
The core abstraction behind I/O in naru is the stream, represented by the superclass Stream
. It treats the input source (Reader
) and output sink (Writer
) as a stream of bytes, as the direct interface shows:
with f <- File("input") open()
bytes = f read(512)
if bytes length != 512 or bytes[510..511] != x"55 aa" then
"Not a valid MBR" println()
end
end
with f <- File("output") openwrite()
f position = 0
f write(bytes)
end
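The same byte-oriented access pattern can be exercised in Python with an in-memory stream, io.BytesIO standing in for the file:

```python
import io

# Build a 512-byte sector ending with the MBR boot signature 55 AA, then
# check it through the stream interface, mirroring the naru example above.
sector = bytearray(512)
sector[510:512] = b"\x55\xaa"

f = io.BytesIO(bytes(sector))
data = f.read(512)
assert len(data) == 512 and data[510:512] == b"\x55\xaa"
```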
There is one important fact about the naru stream: unlike C’s fopen
, it doesn’t distinguish the text mode and binary mode. In the other words every stream in naru is implicitly in the binary mode. Parsing and decoding the bytes from the stream is the job of String
and Bytes
, as they exactly know how to treat bytes as their own value.
TODO
TODO
Syntax extensions are declared by compiler directives, statements starting with use
keyword. Compiler directives themselves are further explained in the next chapter.
If you want to add a new operator rather than extending the existing operators (which can be easily done by overloading #+#
or so), then you need to introduce new tokens: either punctuations or textual keywords. This can be achieved by use token
directive:
use token "~=" -- declares a new punctuation.
use token "unless" -- declares a new keyword.
use token "[\\ \\]" -- declares two new punctuations which are pairs to each other.
use token "` `" -- declares one new punctuation which is a pair to itself.
use token "if" -- error: token already exists.
As shown with declarations for [\
and \]
, tokens are given as a string which can contain any Unicode character. It is possible to declare ≤
operator by use token "\u2264"
, for example. If the token string contains two tokens separated by whitespace, it declares two (not necessarily different) tokens that are the opening and closing tokens respectively.
There are some restrictions on available tokens:
_
cannot be (re-)declared, and #
cannot be used.
"
and '
cannot be used, and %
and \
cannot be the first character of the token unless the entire token is %
(which is already declared by the compiler anyway).
Tokens intended for operation-assignments (e.g. +=
) should end with =
.
TODO token remapping (and its use in the collision-safe syntax extension); removal (use no token ...
); local token namespace??? OH NOES
TODO moving towards general syntax augmentation.
Sometimes you need arbitrary literals; a regular expression, a feature naru does not support directly, would be one example. However, naru does not support arbitrary literals, because parsing arbitrary literals requires the capability of specifying lexical syntax, which can be very complex. Therefore, instead of supporting arbitrary literals, naru specifies a set of fixed but extensible generic literal syntaxes.
There are two forms of generic literals in naru:
prefix'...'
, prefix"..."
, prefix'''...'''
or prefix"""..."""
—i.e. a prefix followed by a bare string literal without an additional prefix.
%prefix(...)
, where (...)
is content delimited by matching characters. For example, (...(
does not match and (...)
should be used, but /.../
matches since /
does not have a matching character.
The form and prefix uniquely determine the kind of generic literal in the given scope. The prefix can be a valid identifier, a non-negative integer (where 042
and 42
are regarded as the same) or an empty string. For identifiers the prefix is case-sensitive.
TODO generic literal contents, generic literal definition syntax (use literal ...
maybe?)
Some generic literals are reserved by naru itself. In the following table literals reserved by naru core
module are built-in, and additional modules should be import
ed to use others.
Syntax | Module | Literal Type | Description |
---|---|---|---|
"..." |
naru core |
String |
Principal quoted string. |
r"..." , R"..." |
naru core |
String |
Raw string. No escape sequences are recognized except for \" , which prevents the string from ending but is still included in the string as is. |
b"..." , B"..." |
naru core |
Bytes |
Principal quoted byte sequence. Can only contain ASCII letters and escape sequences. |
br"..." , BR"..." |
naru core |
Bytes |
Raw byte sequence. |
x"..." , X"..." , bx"..." , BX"..." |
naru core |
Bytes |
Hexadecimal-encoded byte sequence. Any non-hexadecimal character is invalid; whitespace is ignored, and there should be an even number of digits. |
2"..." , …, 36"..." |
naru core |
Int |
Extended integer literal. |
%(...) , %q(...) , %Q(...) |
naru core |
String |
Alternative quoted string. |
%w(...) |
naru core |
PureVector{String} |
A vector of strings, delimited by one or more whitespaces. |
%r/.../ |
naru text pattern |
Pattern |
A pattern value. Customarily delimited by / . |
TODO
TODO
naru provides compiler directives that control the behavior of the compiler. Compiler directives are statements starting with use
(which may be followed by no
for negative meaning); the very first statement introduced in this document, use naru
, is an example of compiler directive. They are comparable to #pragma
in C/C++, in the sense that the directives that cannot be comprehended by the compiler are ignored. (Comprehensible but incorrect directives are not ignored though.)
use naru
directive. Every naru program can contain the use naru
statement as its first non-comment, non-empty statement. This is a recommended way to mark a naru program (rather than editor-specific lines, for example, like -- vim: syn=naru
), though it is totally optional. All trailing tokens after use naru
are ignored, as with other unrecognized directives, in order to allow future extension.
Examples:
use naru -- valid
use naru revision 4 -- valid, two tokens ignored
use naru (revision 4) -- valid, four tokens ignored
use naru ]revision 4[ -- syntax error; brackets still should match
use naru for good, not evil -- valid, five tokens ignored (comma is a part of the statement)
use naru """what
the
hell?""" -- valid, one token ignored
Note that use naru
should be the first statement if used; later occurrences are invalid.
use encoding
directive. use encoding
directive alters the current lexer to read the naru source code (a Unicode string) using the given character encoding. The change applies after the statement terminator of the directive (i.e. the first semicolon or newline sequence, where the latter can be up to two characters). The character encoding must be ASCII-compatible, meaning bytes from 0x20 to 0x7E should decode into the characters from U+0020 to U+007E; therefore UTF-16 cannot be used, for example. The companion, use no encoding
, sets the current encoding to the default one.
These directives can only be used in the top-level of the program, and multiple uses of them are strongly discouraged. In fact even a single use is discouraged, but there are some legitimate (albeit unfortunate) reasons to use them.
use encoding "cp949" -- this comment is still encoded in the default encoding
use encoding "shift-jis"; -- this comment is encoded in Shift_JIS
do
use encoding "koi8-r" -- invalid, should appear in the top-level
"blahblah" println()
end
use no encoding
-- now this comment is encoded in the default encoding again
The default encoding is UTF-8 (so the useless UTF-8 BOM will be safely ignored). If you are using a badass editor that does not support UTF-8, you can still prepend the use encoding
directive. Another use case is to avoid non-ASCII characters in international development, using use encoding "ascii"
.
use token
, use syntax
and use literal
directives. TODO
use static
and use dynamic
directives. use static
and use dynamic
directives control how the type inference is performed. In short, when the explicit type is missing in the assignment or binding it is assumed to be the dynamic type (*
) when use dynamic
is enabled, or the placeholder (_
) when use static
is enabled. You can still use the static types when use dynamic
is in effect, or the dynamic type when use static
is in effect though. These do not affect the missing argument types (they are always assumed to be dynamic); only missing return types are inferred.
These directives affect the current block, and should be placed at the front of the corresponding block. The default is use dynamic
. The innermost directive is active, so use dynamic
does have a use.
TODO putting as *
in every subexpression may enable more dynamic behavior, but not sure that it IS really usable; if used, the directive name should be use more dynamic
.
use runtime
directive. TODO
use extern
directive. TODO runs in the function scope:
#+#(lhs:SmallInt, rhs:SmallInt)->Int := fun
lhsv: Int64 = lhs _value >> 1
rhsv: Int64 = rhs _value >> 1
result: Int64 = _
use extern 'narurt.dll,__nr_SIplusSI'(lhsv, rhsv) as result
return SmallInt(result)
end
The memoranda in this section will be eventually merged into the relevant sections.
_
and *
as dual wildcard tokens is intentional, and there is a clear distinction between the two tokens: _
should refer to a single value while *
may refer to multiple values. For example, in a type expression, a placeholder type _
gets assigned only one possible type (the inferred one) but a dynamic type *
can get assigned multiple types at runtime.
\
is read “flattening” in naru, and its uses outside the literal reflect the name. \return
is thus spelled a “flattening return”.
Only issues that affect the whole document are listed; grep for TODO for local issues.
Missing documentation. Should write the following items somewhere…
_
, Unicode identifier, operator-wrapping identifiers etc.)
"""..."""
is yet to be explained?!)
use token
, use literal
, use operator
, more? use syntax
would be very complex if used.)
Alternative import syntax. For consistency the following syntax is more desirable:
-- assuming "package module" contains "symbol1" and "symbol2" symbols...
mod := import package module -- defines: mod symbol1, mod symbol2
But it seemingly defies the natural extension:
* := import package module -- defines: symbol1, symbol2 (?)
_ := import package module -- defines: module symbol1, module symbol2 (?)
sym1, sym2 := import package module (symbol1, symbol2)
-- defines: sym1, sym2 (???)
The last syntax is notably worse than the original import package module (symbol1 as sym1, symbol2 as sym2)
, but it is not clear whether that matters.
Import rules for syntax extension. To avoid conflicts between the same syntax declared in two modules, we should be able to import syntaxes selectively. Example:
import somethingawful "%r" -- the original %r/.../
import naru text pattern "%r" as "%re" -- rename %r/.../ as %re/.../
Or should we have a distinct syntax for syntax declarations? We already have similar ones for operators (#+#
etc.), but we don’t have a good alternative here.
Parametric modules. I hope…
More selective module imports. E.g. import naru core >= 3.1
or import naru core * but String
. Syntax TBD.
Explicit memory layouts. naru ext Struct
should be a base class of all memory-layout-sensitive classes. E.g.
Struct_tm(tm_sec, tm_min, tm_hour, tm_mday, tm_mon, tm_year,
tm_wday, tm_yday, tm_isdst) := class <- Struct
var tm_sec: Int32
var tm_min: Int32
var tm_hour: Int32
var tm_mday: Int32
var tm_mon: Int32
var tm_year: Int32
var tm_wday: Int32
var tm_yday: Int32
var tm_isdst: Int32
end
Therefore the class reflection should preserve the member order. Padding should be specified manually (e.g. _: Int16
), and every non-memory attribute should be specified as an attribute. (The latter would be no problem once getter-only attributes can be easily defined.)