Esoteric eXecution Environment

Estoeric eXecution Environment(EXE) is my attempt to design more extensible, complete and future-proof ?esolang OS interface. Yeah, you can laugh now.

Goal

Complete feature set, including networking and 2D graphics.
~~EsoAPI/PESOIX compatibility.~~
Multiple external protocol.
Capability model.
etc etc.

Specification

WARNING WARNING this specification is no way complete nor usable yet WARNING WARNING

Data Types and Serialization

EXE supports two external protocols, one is EsoAPI-style (starting with 00 byte, “binary protocol”) and other one is textual (“textual protocol”). Why two protocols? Input in some esolangs is restricted to printable characters, and especially null byte kills it.

So single function is exposed via two different serialization format. It consists of the following atomic data types:

Fixed byte (notation: 42)
Boolean (notation: ?)
Narrow integer (notation: xx)
Wide integer (notation: xxxx)
Long integer (notation: xx..xx)
Text string (notation: "string")
Binary string (notation: |binary|)
International string (notation: u|binary|)

And it also defines two protocol sequences:

Introduction sequence, used to enable EXE
Invocation sequence, used to invoke an EXE function

Fixed byte

Fixed byte is denoted as 00, 01, …, fe, ff, and represents single fixed byte inherent to the function.

In binary protocol it occupies one byte (octet) with that value.

In textual protocol it is represented as a textual integer followed by space. (i.e. "0 " to "255 ")

Binary example: 2a
Textual example: "42 "

Boolean

Boolean is denoted as ??, and represents true or false value. In addition it can have error state if the function needs.

In binary protocol, true is represented as byte 1 (01), false is represented as byte 0 (00) and error is represented as byte 255 (ff).

In textual protocol, true is represented as period ("."), false is represented as comma (",") and error is represented as hyphen ("-"). Since three characters are consecutive in ASCII (in the order of comma, hyphen and period) it is easy to treat error as true or false.

Binary example: 00, 01, ff
Textual example: ",", ".", "-"

Narrow integer

Narrow integer is denoted as xx, and represents the number from 0 to 255. Additional limit can be imposed with the additional specification.

In binary protocol it occupies one byte (octet) with that value.

In textual protocol it is represented as a textual integer followed by space. Note that this format is shared with wide integer, but narrow integer is useful for languages only with byte storage.

Binary example: 42
Textual example: "66 "

Wide integer

Wide integer is denoted as xxxx, and represents the number from 0 to 65535. Additional limit can be imposed with the additional specification.

If extension prefix is not used, it is same with xx and overflow (value larger than 255) results in the error. This behavior is independent to the serialization format. In that case, unless other error reporting mechanism is specified, the value will be 255.

In binary protocol it occupies one byte (octet) with that value normally. But with the presence of extension prefix, it is represented as 16-bit big endian encoding. (i.e. most significant byte first)

In textual protocol it is represented as a textual integer followed by space.

Binary example: 12 34 (for value 0x1234)
Textual example: "4660 "

Long integer

Long integer is denoted as xx..xx, and represents arbitrary-precision integer. It is also capable for negative number.

Note that implementations can impose the reasonable limit (i.e. 32-bit or 64-bit signed integer) in the input, as long as it is same or wider than wide integer.

The serialization format is a form of ?? aa ?? bb ... for integer 0xaabb…, where ?? is true unless it ends the end of number. Thus it is a big endian format. Negative number is represented as absolute value, with the first boolean value is error. Thus error should be reported by some other means.

Both implementation and client program is not required to generate the minimal representation, but zero should be represented as one byte: 00 in binary, "," in textual.

Binary example:
01 02 01 62 01 5a 01 00 00 (for 40000000),
02 2a 00 (for -42),
00 (for 0)

Textual example: ".2 .98 .90 .0 ,", "-42 ,", ","

Text string

Text string is denoted as "string", and represents a simple string only consists of printable characters (i.e. ASCII 0x20 to 0x7e).

In binary protocol the string, represented one byte per character, is followed by null character 00.

In textual protocol, separator "#" (ASCII 0x7c, see also below) is followed by the actual string, which is then followed by delimiter "@" (ASCII 0x40). If the string contains the separator it escapes the next character, including delimiter and separator itself. Therefore separator could mark the end of message, the start of text string and the start of escape sequence depending on context.

Binary example:
48 65 6c 6c 6f 00 (for “Hello”),
40 5f 7c 00 (for “@_#”),
00 (for empty string)

Textual example: "#Hello@", "##@_##@", "#@"

Binary string

Binary string is denoted as |binary|, and represents arbitrary byte string which could contain unprintable bytes, including null byte.

An encoding of binary string is very similar to long integer, treating the string as base-256 positive integer. Of course it cannot be negative, and the length of string should be preserved: no null byte should be omitted or prepended.

Binary example:
01 12 01 34 01 56 01 78 01 ff 00 (for binary 12 34 56 78 ff),
01 00 01 00 01 00 00 (for binary 00 00 00),
00 (for empty string)

Textual example: ".18 .52 .86 .120 .255 ,", ".0 .0 .0 ,", ","

International string

International string is denoted as u|string|, and represents UTF-8-encoded Unicode string. Besides from the encoding constraint it is same to binary string, hence the notation.

Invocation sequence

When the EXE is enabled, the function can be invoked via the following syntax:

$xx yy ... zz #

Here $ with no trailing spaces denotes invocation sequence introducer(ISI), and # denotes end-of-message separator or simply separator.

Separator is only used in the textual protocol, and can be also ignored if the message is immediately followed by other message:

$xx yy ... zz $aa bb ... cc #

Corresponding results should be followed by separator, unless there is no result. The remaining part of this document omits # for convenience.

In binary protocol, ISI is one byte 00, and separator is ignored. This makes EXE binary protocol very similar to other existing systems in appearance.

In textual protocol, ISI is "<TODO: invoc seq>", and separator is "#" (ASCII 0x23).

For example, assume we are to open some file named “foo”. In the binary protocol: > Input: 00 ff 03 09 01 66 01 6f 01 6f 00
Output: 12 34

And In the textual protocol: > Input: "<TODO: invoc seq>255 3 9 .102 .111 .111 ,#"
Output: "4660 #"

Separator makes a lot easier to read an integer via scanf("%d", ...)-like input routine, as otherwise we cannot recognize the message followed literal spaces with it.

In this document, the function specification is written as follows: $12 34 xx yyyy "foo" => zz..zz. Here => separates function arguments and results, and if there is no result -> zz..zz part can be omitted altogether.

Introduction sequence

Introduction sequence introduces the usage of EXE. It should be the very first output from the program.

EXE implementation should not treat any occurrence of introduction sequence or invocation sequence special if there was no introduction sequence at the first of output. In the other words, it will print out literally.

When the program prints introduction sequence, EXE implementation should return a boolean (in the appropriate serialization format) which is true if EXE is now enabled, or false if not. If the boolean was false, the error message (as a text string) will follow and no more output will be processed by the implementation. No separator is used in the textual protocol.

In binary protocol, introduction sequence is three bytes 00 20 00. The last byte can be changed when incompatible change is made into the protocol.

In textual protocol, introduction sequence is "<TODO: intro seq>". The last (TODO: last what?) can be changed when incompatible change is made into the protocol.

Binary example: > << 00 20 00
> >> 01 (success)
> >> 00 45 52 52 4f 52 21 00 (error, message is “ERROR!”)

Textual example: > << "<TODO: intro seq>"
> >> "." (success)
> >> ",#ERROR!@" (error, message is “ERROR!”)

After EXE is enabled, introduction sequence (which is in turn equivalent to $20 00 in the binary protocol) is not treated specially. In the binary protocol it will cause an error, as $20 function is not assigned.

Capability and System Information (`$21`)

Once EXE is enabled, the program can negotiate for desired capability. The program can also require system information for its execution. These operations are done via $21 xx sequence.

$21 01 xxxx returns ?? which is true only if given capability is available.
$21 02 xxxx tries to enable given capability. On failure it does nothing.
$21 03 xxxx tries to disable given capability. On failure it does nothing.
$21 04 "namespace" returns ?? which is true only if given namespace is available.
$21 05 xx "namespace" maps $xx commands (xx ranging 80 to ef) to the given namespace. On failure it does nothing, even doesn’t unassign the pre-existing command. If the namespace string is empty, it unassigns the corresponding command.
$21 10 returns "string", which identifies the EXE implementation: interpreters or EXE layer provider.

The standard capabilities are as follows:

CAP_UNSAFE(0): Unsafe functions are enabled in general.
CAP_IO(1): General I/O is enabled.
CAP_FILEIO(2): File I/O is enabled.

The implementation can ask user for result of these commands. In the other words, the extension writer should attempt to split required capabilities.

Error Checking (`$22`)

EXE doesn’t require the program to input the success/failed byte; it will be very cumbersome for many system calls. Instead it replaces most such input with unified error interface.

$22 01 returns xxxx, which is an error code.
$22 02 returns |command|, which is the command string which caused the last error. This command string includes the invocation sequence and extension prefix, and differs for the protocol. If error code is zero, it returns empty string.
$22 03 returns "message", which is the detailed error message. If error code is zero, it returns empty string.
$22 04 clears the last error.

Note that these functions never fail, even if $22 01 overflows. (It instead returns ERR_OVERFLOW_SELF, and doesn’t set the error code itself.)

Standard error code is as follows:

ERR_NONE(0): No error is ocurred, or reseted before.
ERR_UNASSIGNED(1): Unassigned command is called.
ERR_INVALID_ARG(2): Has invalid argument in general.
ERR_INVALID_FORMAT(3): Unserialization has been failed. (TODO resynchronization)
ERR_OVERFLOW(4): Wide integer has been overflown.
ERR_NO_CAP(5): Capabilities needed for this function are not available.
ERR_CAP_DENIED(6): Requested some capability, but denied by the user.
ERR_NO_CAP_DISABLE(7): Cannot disable capability, due to implementation choice or other causes.
…
ERR_INVALID_HANDLE(16)
ERR_NOT_APPLICABLE(17)
ERR_FILE_EXISTS(18)
ERR_NO_FILE_EXISTS(19)
ERR_WOULD_BLOCK(20)
…
ERR_OVERFLOW_SELF(255): Wide integer has been overflown while returning the error code. This is not a true error code; if this error occurs one can retry with extension prefix to get an actual error code.

I/O Functions (`$00` to `$06`)

Basically, I/O is done via handles; it is analogous to UNIX’s file descriptor. Handle 0 is defined as standard input and output.

$00 xx prints the byte in verbatim. Thus to print null character in the binary protocol, one should print 00 00 00 instead.
$01 xxxx changes current input handle. On failure it does nothing. (Tip: if xxxx is 0, it will revert to the standard input.)
$02 xxxx changes current output handle. On failure it does nothing. (Tip: if xxxx is 0, it will revert to the standard output.)
$03 xx u|path| opens the file handle and returns xxxx which is that handle. Path should be not empty, and on failure it will return handle 0. The first byte is mode, which is explained later.
$04 flushes current output handle. On failure it does nothing.
$05 xxxx yyyy ... does handle-specific thing depending on yyyy. For the standard file, input and output, the following functions are available:
- $05 xxxx 01 returns ??, which is true only if no more input is expected from the handle. It tests for EOF; peeking is supported by $05 xxxx 08 etc. On failure it returns error.
- $05 xxxx 02 returns xx..xx, which is the offset from the beginning of file. On failure it returns -1.
- $05 xxxx 03 yy..yy sets the file pointer, relative to the beginning of file. On failure it does nothing.
- $05 xxxx 04 yy..yy sets the file pointer, relative to the current pointer. On failure it does nothing.
- $05 xxxx 05 yy..yy sets the file pointer, relative to the end of file. On failure it does nothing.
- $05 xxxx 06 sets the handle non-blocking. On failure it does nothing.
- $05 xxxx 07 sets the handle blocking. On failure it does nothing.
- $05 xxxx 08 returns ??, which is true only if the handle is blocking. On failure it returns error.
- $05 xxxx 09 returns ??, which is true only if the handle is available for reading. On failure it returns error.
- $05 xxxx 0a is same to $05 xxxx 09, but it’s for writing.
$06 xxxx closes given handle. If it’s current input or output handle, it will silently revert to the standard input or output. On failure it does nothing. Handle 0 cannot be closed.

The open mode is combined value of the following flags:

first two bits: read or write?
- 0x01: open the file for read only.
- 0x02: open the file for write only.
- 0x03: open the file for both read and write.
next two bits: behavior for file existence.
- 0x00: open the file. (not applicable for read-only)
- 0x04: open the file, and truncates. (not applicable for read-only)
- 0x08: open the file only if the file exists.
- 0x0c: open the file only if the file doesn’t exist.
next one bit: initial position.
- 0x00: initially the pointer is at the beginning of file.
- 0x10: initially the pointer is at the end of file.
next one bit: seekable?
- 0x00: the file is not seekable.
- 0x20: the file is seekable.

For example, fopen’s mode can be translated as follows: "r" is 0x09, "w" is 0x06, "a" is 0x11, "r+" is 0x23, "w+" is 0x27 and "a+" is 0x33.

Most I/O functions require capability CAP_IO to be enabled, with the exception of $00 and $04 which don’t require arbitrary I/O. File I/O ($03) require capability CAP_FILEIO to be enabled, and possibly (for writing) CAP_UNSAFE too.

Date and Time (`$0b`)

$0b commands are used for retrieving current date and time.

$0b 01 returns xx which is seconds after last minute. (00 to 3d)
$0b 02 returns xx which is minutes after last hour. (00 to 3b)
$0b 03 returns xx which is hours after last day. (00 to 17)
$0b 04 returns xx which is a day of the month. (01 to 1f)
$0b 05 returns xx which is a month of the year. (01 to 0c)
$0b 06 returns xxxx which is the number of years after 1900, except for 255 which means “overflow” if the extension prefix is not used and the current year is greater than 2154.
$0b 07 is same to $0b 01 $0b 02 $0b 03. (returns three bytes represent current time)
$0b 08 is same to $0b 04 $0b 05 $0b 06. (returns three or four bytes represent current date)
$0b 09 is same to $0b 07 $0b 08. (returns six or seven bytes represent current date and time)
$0b 0a returns xxxx which runs from 0 to maximum value (255 or 65535) during the whole second. For example, if current fraction of second is 0.384 then $0b 0a will return 62(~ 0.384 * 256) and $ff 0b 0a will return 62 4e(~ 0.384 * 65536).
$0a 0b returns ?? hh mm which represents local timezone. ?? is true only if UTC offset is negative (UTC+0 is considered non-negative); hh and mm is absolute offset in hours and minutes.
$0b 11 to $0b 19 is same as $0b 01 to $0b 09, but uses UTC instead of local time.

Command-line Argument (`$0c`)

EXE provides the access to command-line arguments via $0c commands. Format of command-line arguments is same to C/C++’s argv, thus the first argument contains the name of program.

$0c xxxx has two functions:
- If the argument is zero, it returns xxxx, which is the number of command-line arguments.
- Otherwise it returns |string| which is the argument at given index: 1 is the name of program, 2 is the first argument and so on. The argument string is truncated at null byte, even if the argument contains null character. If the index is out of bound or doesn’t exist, it should return empty string.

Dynamic Region (`$80` to `$ef`)

Commands from $80 to $ef are initially unassigned (thus causes an error), but can be assigned the meaning via 00 21 05 command.

Extension Prefix (`$ff`)

$ff xx ... is special prefix: its behavior is exactly same to $xx ... command, but its argument and return value (denoted xxxx, not xx) will be limited to 65535, not 255. This also applies to the textual protocol, thus languages only capable for byte storage can easily select the needed behavior.

$ff ff ... is reserved for 32-bit encoding.

Reserved Commands

The command $07, $08, $09 and $10(not $0a) are reserved for compatibility of other layers. Without any assumption, using them results in the error.

The command $0a and $0d xx is guaranteed to do nothing, even in the textual protocol. See “force-to-flush commands” section for more information.

The command $20 is same to introduction sequence in the binary protocol, and thus reserved to avoid the confusion.

Commands from $f0 to $fe are reserved for future expansion.

Force-to-flush Commands

There are two sequences which end with newline sequence and do nothing, in order to flush invocation sequences to the output. The rationale is that several esolang interpreters only flush the output when newline is printed out, so having command which flushes the output and otherwise does nothing is helpful for them.

Invocation sequence followed by one byte sequence 0a(for “”), or two byte sequences 0d 0a(for “”) does nothing. These sequences are not fixed byte values, but literal bytes even in the textual protocol. For this purpose, $0a and $0d xx commands (which overlaps these sequences in the binary protocol) are also declared no-op.

Acknowledgement

Thanks to Arvid Norlander and Gregor Richards for valuable suggestions.