Esoteric eXecution Environment

Esoteric eXecution Environment (EXE) is my attempt to design a more extensible, complete and future-proof esolang OS interface. Yeah, you can laugh now.

Goal

Specification

WARNING WARNING this specification is in no way complete or usable yet WARNING WARNING

Data Types and Serialization

EXE supports two external protocols: one is EsoAPI-style (starting with a 00 byte, the “binary protocol”) and the other is textual (the “textual protocol”). Why two protocols? Input in some esolangs is restricted to printable characters, and a null byte in particular breaks it.

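So a single function is exposed via two different serialization formats. The specification consists of the following atomic data types: fixed byte, boolean, narrow integer, wide integer, long integer, text string, binary string and international string.

It also defines two protocol sequences: the invocation sequence and the introduction sequence.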

Fixed byte

A fixed byte is denoted as 00, 01, …, fe, ff, and represents a single fixed byte inherent to the function.

In the binary protocol it occupies one byte (octet) with that value.

In the textual protocol it is represented as a textual integer followed by a space (i.e. "0 " to "255 ").

Binary example: 2a
Textual example: "42 "

Boolean

A boolean is denoted as ??, and represents a true or false value. In addition, it can have an error state if the function needs one.

In the binary protocol, true is represented as byte 1 (01), false as byte 0 (00) and error as byte 255 (ff).

In the textual protocol, true is represented as a period ("."), false as a comma (",") and error as a hyphen ("-"). Since the three characters are consecutive in ASCII (in the order comma, hyphen, period), a single comparison is enough to treat error as either true or false.

Binary example: 00, 01, ff
Textual example: ",", ".", "-"
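The boolean encoding above can be sketched in Python as follows. This is only an illustration: the names `encode_boolean` and `textual_is_true` are hypothetical, and representing the error state as `None` is an assumption of this sketch, not part of the specification.

```python
# Tri-state boolean: True, False, or None for the error state (an assumption
# of this sketch; the specification only names the three states).
BINARY = {True: b"\x01", False: b"\x00", None: b"\xff"}
TEXTUAL = {True: ".", False: ",", None: "-"}


def encode_boolean(value, textual=False):
    """Serialize a boolean in the binary or the textual protocol."""
    return TEXTUAL[value] if textual else BINARY[value]


def textual_is_true(ch, error_counts_as_true):
    """Collapse a textual boolean to true/false with one comparison.

    This relies on "," (0x2c) < "-" (0x2d) < "." (0x2e).
    """
    return ch > "," if error_counts_as_true else ch > "-"
```

A client that cannot handle three states only has to choose which side of the hyphen to compare against.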

Narrow integer

A narrow integer is denoted as xx, and represents a number from 0 to 255. A function's specification can impose an additional limit.

In the binary protocol it occupies one byte (octet) with that value.

In the textual protocol it is represented as a textual integer followed by a space. Note that this format is shared with the wide integer, but the narrow integer is useful for languages that only have byte storage.

Binary example: 42
Textual example: "66 "

Wide integer

A wide integer is denoted as xxxx, and represents a number from 0 to 65535. A function's specification can impose an additional limit.

If the extension prefix is not used, it is the same as xx, and overflow (a value larger than 255) results in an error. This behavior is independent of the serialization format. In that case, unless another error reporting mechanism is specified, the value will be 255.

In the binary protocol it normally occupies one byte (octet) with that value. With the extension prefix present, however, it is represented in 16-bit big-endian encoding (i.e. most significant byte first).

In the textual protocol it is represented as a textual integer followed by a space.

Binary example: 12 34 (for value 0x1234)
Textual example: "4660 "
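As a rough sketch of the rules above, the following Python function serializes a wide integer with or without the extension prefix. The name `encode_wide` and its parameters are hypothetical, and clamping to 255 on overflow assumes that no other error reporting mechanism applies.

```python
def encode_wide(value, extended=False, textual=False):
    """Serialize a wide integer (xxxx).

    extended: whether the extension prefix is in effect.
    Without it, values above 255 overflow and are reported as 255.
    """
    if not 0 <= value <= 0xFFFF:
        raise ValueError("wide integer must be in 0..65535")
    if not extended and value > 0xFF:
        value = 255  # overflow without the extension prefix
    if textual:
        # Textual integer followed by a space, regardless of width.
        return f"{value} ".encode("ascii")
    # One byte normally, 16-bit big-endian with the extension prefix.
    return value.to_bytes(2 if extended else 1, "big")
```

For instance, `encode_wide(0x1234, extended=True)` yields the two bytes 12 34, while the same call with `textual=True` yields "4660 ".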

Long integer

A long integer is denoted as xx..xx, and represents an arbitrary-precision integer. It can also represent negative numbers.

Note that implementations can impose a reasonable limit on the input (e.g. a 32-bit or 64-bit signed integer), as long as it is the same as or wider than the wide integer.

The serialization format has the form ?? aa ?? bb ... for the integer 0xaabb…, where each ?? is a boolean that is true before a digit byte and false when it marks the end of the number. Thus it is a big-endian format. A negative number is represented as its absolute value, with the first boolean set to error. Errors should therefore be reported by some other means.

Neither the implementation nor the client program is required to generate the minimal representation, but zero should be represented as one byte: 00 in binary, "," in textual.

Binary example:
01 02 01 62 01 5a 01 00 00 (for 40000000),
ff 2a 00 (for -42),
00 (for 0)

Textual example: ".2 .98 .90 .0 ,", "-42 ,", ","
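A minimal Python sketch of this format, assuming the negative marker is the boolean error value from the Boolean section (ff in binary, "-" in textual). The function names `encode_long` and `decode_long` are hypothetical, and the decoder handles only the binary protocol.

```python
def encode_long(value, textual=False):
    """Serialize a long integer (xx..xx) in minimal form."""
    magnitude = abs(value)
    # Big-endian base-256 digits; zero yields no digits at all.
    digits = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
    if textual:
        parts = []
        for i, d in enumerate(digits):
            marker = "-" if (value < 0 and i == 0) else "."
            parts.append(f"{marker}{d} ")
        return "".join(parts) + ","  # closing false boolean
    out = bytearray()
    for i, d in enumerate(digits):
        marker = 0xFF if (value < 0 and i == 0) else 0x01
        out += bytes([marker, d])
    out.append(0x00)  # closing false boolean
    return bytes(out)


def decode_long(data):
    """Decode a binary long integer; return (value, bytes consumed)."""
    value, negative, pos = 0, False, 0
    while data[pos] != 0x00:
        if pos == 0 and data[pos] == 0xFF:
            negative = True
        value = (value << 8) | data[pos + 1]
        pos += 2
    return (-value if negative else value), pos + 1
```

The decoder accepts non-minimal input (leading zero digits) as well, since the loop simply accumulates digits until the closing false boolean.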

Text string

A text string is denoted as "string", and represents a simple string consisting only of printable characters (i.e. ASCII 0x20 to 0x7e).

In the binary protocol the string, represented as one byte per character, is followed by a null character 00.

In the textual protocol, the separator "#" (ASCII 0x23, see also below) is followed by the actual string, which is then followed by the delimiter "@" (ASCII 0x40). Inside the string, the separator escapes the next character, which is how the delimiter and the separator itself are included. Therefore the separator can mark the end of a message, the start of a text string or the start of an escape sequence, depending on context.

Binary example:
48 65 6c 6c 6f 00 (for “Hello”),
40 5f 23 00 (for “@_#”),
00 (for empty string)

Textual example: "#Hello@", "##@_##@", "#@"
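The escaping rule is easier to see in code. Below is a small Python sketch; the name `encode_text` is hypothetical, and it assumes that only the separator "#" and the delimiter "@" need escaping.

```python
def encode_text(s, textual=False):
    """Serialize a text string of printable ASCII characters."""
    if any(not 0x20 <= ord(c) <= 0x7E for c in s):
        raise ValueError("text strings may only contain ASCII 0x20..0x7e")
    if not textual:
        # One byte per character, terminated by a null byte.
        return s.encode("ascii") + b"\x00"
    # "#" opens the string and also escapes "#" and "@" inside it.
    escaped = "".join("#" + c if c in "#@" else c for c in s)
    return ("#" + escaped + "@").encode("ascii")
```

With this sketch, `encode_text("@_#", textual=True)` produces "##@_##@": the opening separator, an escaped "@", a plain "_", an escaped "#" and the closing delimiter.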

Binary string

A binary string is denoted as |binary|, and represents an arbitrary byte string that may contain unprintable bytes, including the null byte.

The encoding of a binary string is very similar to that of a long integer, treating the string as a base-256 positive integer. Of course it cannot be negative, and the length of the string must be preserved: no null byte may be omitted or prepended.

Binary example:
01 12 01 34 01 56 01 78 01 ff 00 (for binary 12 34 56 78 ff),
01 00 01 00 01 00 00 (for binary 00 00 00),
00 (for empty string)

Textual example: ".18 .52 .86 .120 .255 ,", ".0 .0 .0 ,", ","
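A short Python sketch of the binary string encoding, with the hypothetical name `encode_binary`. Unlike a long integer, every byte is emitted as is, so leading null bytes survive.

```python
def encode_binary(data, textual=False):
    """Serialize a binary string (|binary|), preserving its length."""
    if textual:
        # Each byte: true boolean ".", textual integer, space; then ",".
        return "".join(f".{b} " for b in data) + ","
    out = bytearray()
    for b in data:
        out += bytes([0x01, b])  # true boolean, then the byte itself
    out.append(0x00)  # closing false boolean
    return bytes(out)
```

Note that three null bytes encode to seven bytes in the binary protocol, which is what distinguishes them from the empty string.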

International string

An international string is denoted as u|string|, and represents a UTF-8-encoded Unicode string. Apart from the encoding constraint it is the same as a binary string, hence the notation.

Invocation sequence

When EXE is enabled, a function can be invoked via the following syntax:

$xx yy ... zz #

Here $ with no trailing space denotes the invocation sequence introducer (ISI), and # denotes the end-of-message separator, or simply the separator.

The separator is only used in the textual protocol, and can also be omitted if the message is immediately followed by another message:

$xx yy ... zz $aa bb ... cc #

Each result should be followed by a separator, unless there is no result. The rest of this document omits # for convenience.

In the binary protocol, the ISI is the single byte 00, and the separator is not used. This makes the EXE binary protocol look very similar to other existing systems.

In the textual protocol, the ISI is "<TODO: invoc seq>", and the separator is "#" (ASCII 0x23).

For example, assume we are to open a file named “foo”. In the binary protocol:

Input: 00 ff 03 09 01 66 01 6f 01 6f 00
Output: 12 34

And in the textual protocol:

Input: "<TODO: invoc seq>255 3 9 .102 .111 .111 ,#"
Output: "4660 #"
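The two messages above can be assembled with a few helper functions, sketched here in Python. All function names are hypothetical, and because the textual ISI is still undecided it is passed in as a parameter rather than hard-coded.

```python
def binary_string(data):
    """Binary-protocol encoding of |binary|."""
    out = bytearray()
    for b in data:
        out += bytes([0x01, b])
    return bytes(out) + b"\x00"


def textual_binary_string(data):
    """Textual-protocol encoding of |binary|."""
    return "".join(f".{b} " for b in data) + ","


def invoke_binary(fixed, payload=b""):
    """ISI (00), the fixed/narrow bytes, then already-encoded arguments."""
    return b"\x00" + bytes(fixed) + payload


def invoke_textual(isi, fixed, payload=""):
    """ISI, each byte as "N ", encoded arguments, then the separator."""
    return isi + "".join(f"{b} " for b in fixed) + payload + "#"


# Open "foo" as in the example: extension prefix ff, command 03, mode 09.
request = invoke_binary([0xFF, 0x03, 0x09], binary_string(b"foo"))
# With the extension prefix the reply is a 16-bit big-endian value.
handle = int.from_bytes(bytes.fromhex("1234"), "big")
```

Here `request` matches the binary input shown above byte for byte, and `handle` is 4660, the value the textual protocol reports as "4660 ".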

The separator makes it a lot easier to read an integer via a scanf("%d", ...)-like input routine, as otherwise we could not tell where the message ends when literal spaces follow it.

In this document, a function specification is written as follows: $12 34 xx yyyy "foo" => zz..zz. Here => separates function arguments from results, and if there is no result the => zz..zz part can be omitted altogether.

Introduction sequence

The introduction sequence introduces the usage of EXE. It should be the very first output from the program.

An EXE implementation should not treat any occurrence of the introduction sequence or the invocation sequence specially if there was no introduction sequence at the start of the output. In other words, such output is printed literally.

When the program prints the introduction sequence, the EXE implementation should return a boolean (in the appropriate serialization format) which is true if EXE is now enabled, or false if not. If the boolean is false, an error message (as a text string) follows and no more output will be processed by the implementation. No separator is used in the textual protocol.

In the binary protocol, the introduction sequence is the three bytes 00 20 00. The last byte can be changed when an incompatible change is made to the protocol.

In the textual protocol, the introduction sequence is "<TODO: intro seq>". The last (TODO: last what?) can be changed when an incompatible change is made to the protocol.

Binary example:
<< 00 20 00
>> 01 (success)
>> 00 45 52 52 4f 52 21 00 (error, message is “ERROR!”)

Textual example:
<< "<TODO: intro seq>"
>> "." (success)
>> ",#ERROR!@" (error, message is “ERROR!”)

After EXE is enabled, the introduction sequence (which is in turn equivalent to $20 00 in the binary protocol) is not treated specially. In the binary protocol it will cause an error, as the $20 function is not assigned.
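From the client's side, the reply to the introduction sequence could be parsed along these lines. This is a Python sketch with hypothetical function names; it assumes the whole reply has already been read into memory and that a reply starting with any other value is invalid.

```python
def parse_intro_reply_binary(reply):
    """Return (enabled, error_message) for a binary-protocol reply."""
    if reply[:1] == b"\x01":
        return True, None
    if reply[:1] == b"\x00":
        # False boolean, then a null-terminated text string.
        end = reply.index(b"\x00", 1)
        return False, reply[1:end].decode("ascii")
    raise ValueError("unexpected reply to the introduction sequence")


def parse_intro_reply_textual(reply):
    """Return (enabled, error_message) for a textual-protocol reply."""
    if reply.startswith("."):
        return True, None
    if not reply.startswith(",#"):
        raise ValueError("unexpected reply to the introduction sequence")
    chars, i = [], 2
    while reply[i] != "@":       # unescaped delimiter ends the string
        if reply[i] == "#":      # separator escapes the next character
            i += 1
        chars.append(reply[i])
        i += 1
    return False, "".join(chars)
```

A program would normally stop using EXE commands when the first element of the returned pair is false, since no more output is processed in that case.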

Capability and System Information ($21)

Once EXE is enabled, the program can negotiate for the capabilities it wants. The program can also request system information needed for its execution. These operations are done via the $21 xx sequence.

The standard capabilities are as follows:

The implementation can ask the user for the result of these commands. In other words, the extension writer should try to split up the required capabilities.

Error Checking ($22)

EXE doesn’t require the program to read a success/failure byte; that would be very cumbersome for many system calls. Instead it replaces most such input with a unified error interface.

Note that these functions never fail, even if $22 01 overflows. (It instead returns ERR_OVERFLOW_SELF, and doesn’t set the error code itself.)

The standard error codes are as follows:

I/O Functions ($00 to $06)

Basically, I/O is done via handles, which are analogous to UNIX file descriptors. Handle 0 is defined as standard input and output.

The open mode is a combined value of the following flags:

For example, fopen’s modes can be translated as follows: "r" is 0x09, "w" is 0x06, "a" is 0x11, "r+" is 0x23, "w+" is 0x27 and "a+" is 0x33.

Most I/O functions require the capability CAP_IO to be enabled, with the exception of $00 and $04, which don’t require arbitrary I/O. File I/O ($03) requires the capability CAP_FILEIO to be enabled, and possibly (for writing) CAP_UNSAFE too.

Date and Time ($0b)

$0b commands are used for retrieving the current date and time.

Command-line Argument ($0c)

EXE provides access to command-line arguments via $0c commands. The format of the command-line arguments is the same as C/C++’s argv, thus the first argument contains the name of the program.

Dynamic Region ($80 to $ef)

Commands from $80 to $ef are initially unassigned (and thus cause an error), but can be assigned a meaning via the 00 21 05 command.

Extension Prefix ($ff)

$ff xx ... is a special prefix: its behavior is exactly the same as the $xx ... command, but its arguments and return values (denoted xxxx, not xx) are limited to 65535, not 255. This also applies to the textual protocol, so languages only capable of byte storage can easily select the behavior they need.

$ff ff ... is reserved for 32-bit encoding.

Reserved Commands

The commands $07, $08, $09 and $10 (not $0a) are reserved for compatibility with other layers. Unless such a layer is assumed, using them results in an error.

The commands $0a and $0d xx are guaranteed to do nothing, even in the textual protocol. See the “Force-to-flush Commands” section for more information.

The command $20 is the same as the introduction sequence in the binary protocol, and is thus reserved to avoid confusion.

Commands from $f0 to $fe are reserved for future expansion.

Force-to-flush Commands

There are two sequences which end with a newline sequence and do nothing, in order to flush invocation sequences to the output. The rationale is that several esolang interpreters only flush the output when a newline is printed, so having a command which flushes the output and otherwise does nothing is helpful for them.

The invocation sequence introducer followed by the one-byte sequence 0a (for “\n”), or the two-byte sequence 0d 0a (for “\r\n”), does nothing. These sequences are not fixed byte values, but literal bytes even in the textual protocol. For this purpose, the $0a and $0d xx commands (which overlap these sequences in the binary protocol) are also declared no-ops.
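The difference between a literal newline byte and a fixed byte in the textual protocol can be illustrated with a short Python sketch. Both function names are hypothetical, and the ISI is passed in as a parameter because the textual one is not yet defined.

```python
def flush_sequence(isi, crlf=False):
    """ISI followed by a literal LF (0a) or CR LF (0d 0a)."""
    return isi + (b"\r\n" if crlf else b"\n")


def fixed_byte_textual(value):
    """For contrast: how a fixed byte is normally written in textual form."""
    return f"{value} ".encode("ascii")
```

In the binary protocol `flush_sequence(b"\x00")` is simply 00 0a. In the textual protocol the newline stays a raw byte, whereas a fixed byte 0a would have been written as "10 ".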

Acknowledgement

Thanks to Arvid Norlander and Gregor Richards for valuable suggestions.

