Rationale for naru programming language

This document closely follows “naru programming language in a nutshell”, but explains the rationale behind the syntax and semantics.

Name of the language

The word naru (pronounced NAH-roo) was chosen because it is short, easy to romanize and still easy for most people to pronounce: it does not contain a consonant cluster (which would be represented as a final consonant in Korean, for example), and every syllable contains a sonorant (as opposed to an obstruent like k or t). Beyond these linguistic concerns, the name was chosen almost at random.

The word itself means a ferry in Korean (나루터 narooteo) and corresponds to many verbs like “to establish” (成る) or “to fertilize” (生る) in Japanese. It is also used as a proper name by some Korean and Japanese people. However, these are clearly coincidences, given how short the word is.

naru is never capitalized: none of its four letters has an ascender in most fonts, so the lowercase word looks uniform.

The Basics

use naru directive

The directive is not mandatory, but the proposed extension for naru source files (.n) collides with that of some other languages (e.g. Nemerle). Until naru becomes sufficiently common, this precautionary step remains necessary.

A future version is expected to supply standardized extensions that can be used with this directive.

The end of statement

Using newlines to separate statements is very common outside C-like languages. Because this is so friendly to the programmer, relying on semicolons alone was not an option. The exact behavior, however, differs significantly from language to language:

  • Python: Newlines end the current statement, unless a parenthesis (or a bracket, and so on) is still open at the end of the line.
  • Ruby: Newlines end the current statement, unless a term is still expected at the end of the line. For example, a line ending in 3 + won’t end the statement. There are several other places where newlines are ignored.
  • JavaScript: Newlines end the current statement, unless the first token of the next line can be appended to the current statement to form a prefix of valid code.
  • Haskell: Newlines end the current expression (i.e. behave the same as semicolons) if the enclosing braces are omitted and the indentation of the next line equals the current indentation, that is, the indentation of the first token after the (imaginary) opening brace. Additionally, if the next line is indented less than the current indentation, a closing brace is inserted.

The rules for Ruby and JavaScript are too confusing and unintuitive. Ruby’s rules are themselves inconsistent: def followed by a on the next line is valid (since a is a function name, and a function name may be preceded by newlines), but class followed by A on the next line is invalid (since A is a class name, which has no such special casing). This inconsistency can be attributed to the sheer size of Ruby’s parse.y, which is basically a mess. JavaScript has several edge cases where the general rule is overridden to resolve an ambiguity (e.g. a newline after return ends the statement even when the next line starts with (a + b + c)). Most importantly, neither set of rules would keep working once a postfix operator is introduced.

Haskell’s rule is intuitive, but it is clearly indentation-oriented and therefore not appropriate for naru. Python’s rule is also intuitive and does not depend on the parser (only the tokenizer needs to change). Therefore naru adopts Python’s rule.
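As a rough illustration of why the Python-style rule only needs tokenizer-level state, the sketch below splits source text into statements by tracking bracket depth. It is written in Python rather than naru, the name split_statements is hypothetical, and it assumes the input contains no string literals or comments, so every bracket character really is a bracket.

```python
from typing import List

OPENERS = "([{"
CLOSERS = ")]}"


def split_statements(source: str) -> List[str]:
    """Split source into statements using a Python-style newline rule.

    A newline ends the current statement only when no bracket is open.
    Assumption for this sketch: no string literals or comments.
    """
    statements: List[str] = []
    current: List[str] = []
    depth = 0  # number of currently open brackets

    def flush() -> None:
        # Collapse internal whitespace so continued lines read as one line.
        text = " ".join("".join(current).split())
        if text:
            statements.append(text)
        current.clear()

    for ch in source:
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth = max(depth - 1, 0)
        if ch == "\n" and depth == 0:
            flush()  # newline outside brackets: statement boundary
        else:
            current.append(ch)  # newline inside brackets is just whitespace
    flush()
    return statements


if __name__ == "__main__":
    print(split_statements("a = (1 +\n     2)\nb = 3\n"))
    print(split_statements("3 +\n4"))
```

Note that a trailing operator such as 3 + does not continue the statement under this rule, which is exactly the difference from the Ruby behavior described above.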

Tokenizer rule

naru follows so-called maximal munch tokenization: if the remaining code may start with two or more different tokens, the tokenizer chooses the longest one. TODO
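A minimal sketch of maximal munch, again in Python rather than naru. The operator set below is purely hypothetical and is not taken from naru’s grammar; the point is only that, at each position, the longest matching token wins.

```python
from typing import List

# Hypothetical operator set for illustration; not naru's actual tokens.
OPERATORS = ["<<=", "<<", "<=", "<", "=", "+"]


def is_word_char(ch: str) -> bool:
    return ch.isalnum() or ch == "_"


def tokenize(source: str) -> List[str]:
    """Tokenize identifiers/numbers and operators using maximal munch."""
    tokens: List[str] = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
            continue
        if is_word_char(ch):
            # Consume the longest run of word characters.
            j = i
            while j < len(source) and is_word_char(source[j]):
                j += 1
            tokens.append(source[i:j])
            i = j
            continue
        # Among all operators that match here, pick the longest one.
        candidates = [op for op in OPERATORS if source.startswith(op, i)]
        if not candidates:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
        longest = max(candidates, key=len)
        tokens.append(longest)
        i += len(longest)
    return tokens


if __name__ == "__main__":
    print(tokenize("a<<=b"))
    print(tokenize("x<<<y"))
```

With this rule a<<=b is read as a, <<=, b and never as a, <, <=, b; whitespace is the only way to force the shorter tokens.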

Structuring

Class, Classy, Classic

Do we really need the class?

There are several approaches to polymorphism: examples include subtype polymorphism (inheritance-based OOP), parametric polymorphism (type parameters and static parameters) and ad-hoc polymorphism (simple overloading and type classes). Prototype-based OOP can be regarded as a special, highly dynamic variant of subtype polymorphism.

naru supports all three kinds of polymorphism to a certain extent, but its principal approach is class-based subtype polymorphism. There are several reasons for this decision:

  • Subtype polymorphism directly supports a hierarchy of types (via subtyping), which is convenient in many situations. The closest analogue of this hierarchy would be the algebraic data type, which is really a simple hierarchy and does not always extend to more complex ones. (For example, you cannot write something like data X = Y ... | Z; data Z = ... in Haskell. However, pattern matching can easily be brought into subtype polymorphism.)
  • Class-based subtype polymorphism allows efficient static compilation (using techniques that have been well known for ages). In contrast, other kinds of polymorphism typically require heavier machinery. Correct (i.e. not botched; that is, symmetric) multiple dispatch requires a runtime dispatcher in many cases; type classes are open-ended, so the location of the right methods cannot easily be determined at compile time; and the typical way to implement prototype-based OOP efficiently is a just-in-time compiler with hidden classes (or a tracing JIT, if you prefer). Of course, subtype polymorphism loses this property once these other kinds of polymorphism are combined into the class, but a certain subset of it can still be made very efficient.
  • Structural typing has long been a headache because of accidental type collisions (i.e. what if two types used for different purposes have the same structure?), and many functional languages have moved to nominal typing. Subtype polymorphism is well suited to nominal typing, and in fact the approach of such functional languages is surprisingly similar to the class (records plus nominal types).
  • Many systems are built around the concept of classes (rather unfortunately), and such systems are not limited to a single language; a common example is Microsoft’s COM.
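The accidental type collision mentioned above can be illustrated with a small Python sketch (not naru code). All names here (Drawable, Shape, Circle, Gunslinger) are made up for the example: Drawable is checked structurally, Shape nominally.

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


@runtime_checkable
class Drawable(Protocol):
    """Structural type: anything that has a draw() method qualifies."""

    def draw(self) -> str: ...


class Shape(ABC):
    """Nominal type: only classes declared as subclasses qualify."""

    @abstractmethod
    def draw(self) -> str: ...


class Circle(Shape):
    def draw(self) -> str:
        return "a circle on the canvas"


class Gunslinger:
    # Same structure as a Shape, but "draw" means something unrelated.
    def draw(self) -> str:
        return "a revolver from the holster"


def accepts_structurally(obj: object) -> bool:
    return isinstance(obj, Drawable)


def accepts_nominally(obj: object) -> bool:
    return isinstance(obj, Shape)


if __name__ == "__main__":
    for obj in (Circle(), Gunslinger()):
        print(type(obj).__name__,
              accepts_structurally(obj),
              accepts_nominally(obj))
```

The structural check accepts Gunslinger purely because of its shape, while the nominal check rejects it because it never declared itself a Shape.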

Note that full support for subtype polymorphism does not necessarily mean full support for object-oriented programming. Unlike traditional object-oriented languages, naru has no strict concept of visibility and encapsulation; many of their use cases are handled by the unified scoping rules.

No strict visibility

The importance of visibility is greatly exaggerated in object-oriented languages. It gives no additional advantage over idioms like privatename_. And if you want the best performance, the compiler still needs all private properties of a class to compile code that uses the class; in this regard, visibility is even inferior to the opaque pointer, which is commonly used in C to emulate object orientation. (The problem with the opaque pointer is, of course, that it is not very convenient.)

Therefore naru primarily relies on the idiom for this problem, rather than on a separate language feature. The idiom is sacred, however: identifiers starting with _ can be freely reorganized or moved by the compiler, so it still retains the benefits of the visibility approach.

Polymorphism

Handling Errors

Special Names and Complication

Type Hierarchy

More on Numbers

More on Collections

Another Kind of Types

More on Functions

Modular Programming

O Holy Unicode, plus I/O

Syntax Extensions

Deep Internals

