Estoeric eXecution Environment(EXE) is my attempt to design more extensible, complete and future-proof. Yeah, you can laugh now.
WARNING WARNING this specification is no way complete nor usable yet WARNING WARNING
EXE supports two external protocols, one is EsoAPI-style (starting with
00 byte, “binary protocol”) and other one is textual (“textual protocol”). Why two protocols? Input in some esolangs is restricted to printable characters, and especially null byte kills it.
So single function is exposed via two different serialization format. It consists of the following atomic data types:
And it also defines two protocol sequences:
Fixed byte is denoted as
ff, and represents single fixed byte inherent to the function.
In binary protocol it occupies one byte (octet) with that value.
In textual protocol it is represented as a textual integer followed by space. (i.e.
"0 " to
Boolean is denoted as
??, and represents true or false value. In addition it can have error state if the function needs.
In binary protocol, true is represented as byte 1 (
01), false is represented as byte 0 (
00) and error is represented as byte 255 (
In textual protocol, true is represented as period (
"."), false is represented as comma (
",") and error is represented as hyphen (
"-"). Since three characters are consecutive in ASCII (in the order of comma, hyphen and period) it is easy to treat error as true or false.
Narrow integer is denoted as
xx, and represents the number from 0 to 255. Additional limit can be imposed with the additional specification.
In binary protocol it occupies one byte (octet) with that value.
In textual protocol it is represented as a textual integer followed by space. Note that this format is shared with wide integer, but narrow integer is useful for languages only with byte storage.
Wide integer is denoted as
xxxx, and represents the number from 0 to 65535. Additional limit can be imposed with the additional specification.
If extension prefix is not used, it is same with
xx and overflow (value larger than 255) results in the error. This behavior is independent to the serialization format. In that case, unless other error reporting mechanism is specified, the value will be 255.
In binary protocol it occupies one byte (octet) with that value normally. But with the presence of extension prefix, it is represented as 16-bit big endian encoding. (i.e. most significant byte first)
In textual protocol it is represented as a textual integer followed by space.
12 34(for value 0x1234)
Long integer is denoted as
xx..xx, and represents arbitrary-precision integer. It is also capable for negative number.
Note that implementations can impose the reasonable limit (i.e. 32-bit or 64-bit signed integer) in the input, as long as it is same or wider than wide integer.
The serialization format is a form of
?? aa ?? bb ... for integer 0xaabb…, where
?? is true unless it ends the end of number. Thus it is a big endian format. Negative number is represented as absolute value, with the first boolean value is error. Thus error should be reported by some other means.
Both implementation and client program is not required to generate the minimal representation, but zero should be represented as one byte:
00 in binary,
"," in textual.
01 02 01 62 01 5a 01 00 00(for 40000000),
02 2a 00(for -42),
".2 .98 .90 .0 ,",
Text string is denoted as
"string", and represents a simple string only consists of printable characters (i.e. ASCII 0x20 to 0x7e).
In binary protocol the string, represented one byte per character, is followed by null character
In textual protocol, separator
"#" (ASCII 0x7c, see also below) is followed by the actual string, which is then followed by delimiter
"@" (ASCII 0x40). If the string contains the separator it escapes the next character, including delimiter and separator itself. Therefore separator could mark the end of message, the start of text string and the start of escape sequence depending on context.
48 65 6c 6c 6f 00(for “Hello”),
40 5f 7c 00(for “@_#”),
00(for empty string)
Binary string is denoted as
|binary|, and represents arbitrary byte string which could contain unprintable bytes, including null byte.
An encoding of binary string is very similar to long integer, treating the string as base-256 positive integer. Of course it cannot be negative, and the length of string should be preserved: no null byte should be omitted or prepended.
01 12 01 34 01 56 01 78 01 ff 00(for binary
12 34 56 78 ff),
01 00 01 00 01 00 00(for binary
00 00 00),
00(for empty string)
".18 .52 .86 .120 .255 ,",
".0 .0 .0 ,",
International string is denoted as
u|string|, and represents UTF-8-encoded Unicode string. Besides from the encoding constraint it is same to binary string, hence the notation.
When the EXE is enabled, the function can be invoked via the following syntax:
$xx yy ... zz #
$ with no trailing spaces denotes invocation sequence introducer(ISI), and
# denotes end-of-message separator or simply separator.
Separator is only used in the textual protocol, and can be also ignored if the message is immediately followed by other message:
$xx yy ... zz $aa bb ... cc #
Corresponding results should be followed by separator, unless there is no result. The remaining part of this document omits
# for convenience.
In binary protocol, ISI is one byte
00, and separator is ignored. This makes EXE binary protocol very similar to other existing systems in appearance.
In textual protocol, ISI is
"<TODO: invoc seq>", and separator is
"#" (ASCII 0x23).
For example, assume we are to open some file named “foo”. In the binary protocol: > Input:
00 ff 03 09 01 66 01 6f 01 6f 00
And In the textual protocol: > Input:
"<TODO: invoc seq>255 3 9 .102 .111 .111 ,#"
Separator makes a lot easier to read an integer via
scanf("%d", ...)-like input routine, as otherwise we cannot recognize the message followed literal spaces with it.
In this document, the function specification is written as follows:
$12 34 xx yyyy "foo" => zz..zz. Here
=> separates function arguments and results, and if there is no result
-> zz..zz part can be omitted altogether.
Introduction sequence introduces the usage of EXE. It should be the very first output from the program.
EXE implementation should not treat any occurrence of introduction sequence or invocation sequence special if there was no introduction sequence at the first of output. In the other words, it will print out literally.
When the program prints introduction sequence, EXE implementation should return a boolean (in the appropriate serialization format) which is true if EXE is now enabled, or false if not. If the boolean was false, the error message (as a text string) will follow and no more output will be processed by the implementation. No separator is used in the textual protocol.
In binary protocol, introduction sequence is three bytes
00 20 00. The last byte can be changed when incompatible change is made into the protocol.
In textual protocol, introduction sequence is
"<TODO: intro seq>". The last (TODO: last what?) can be changed when incompatible change is made into the protocol.
Binary example: >
<< 00 20 00
>> 00 45 52 52 4f 52 21 00(error, message is “ERROR!”)
Textual example: >
<< "<TODO: intro seq>"
>> ",#ERROR!@"(error, message is “ERROR!”)
After EXE is enabled, introduction sequence (which is in turn equivalent to
$20 00 in the binary protocol) is not treated specially. In the binary protocol it will cause an error, as
$20 function is not assigned.
Once EXE is enabled, the program can negotiate for desired capability. The program can also require system information for its execution. These operations are done via
$21 xx sequence.
$21 01 xxxxreturns
??which is true only if given capability is available.
$21 02 xxxxtries to enable given capability. On failure it does nothing.
$21 03 xxxxtries to disable given capability. On failure it does nothing.
$21 04 "namespace"returns
??which is true only if given namespace is available.
$21 05 xx "namespace"maps
ef) to the given namespace. On failure it does nothing, even doesn’t unassign the pre-existing command. If the namespace string is empty, it unassigns the corresponding command.
"string", which identifies the EXE implementation: interpreters or EXE layer provider.
The standard capabilities are as follows:
CAP_UNSAFE(0): Unsafe functions are enabled in general.
CAP_IO(1): General I/O is enabled.
CAP_FILEIO(2): File I/O is enabled.
The implementation can ask user for result of these commands. In the other words, the extension writer should attempt to split required capabilities.
EXE doesn’t require the program to input the success/failed byte; it will be very cumbersome for many system calls. Instead it replaces most such input with unified error interface.
xxxx, which is an error code.
|command|, which is the command string which caused the last error. This command string includes the invocation sequence and extension prefix, and differs for the protocol. If error code is zero, it returns empty string.
"message", which is the detailed error message. If error code is zero, it returns empty string.
$22 04clears the last error.
Note that these functions never fail, even if
$22 01 overflows. (It instead returns
ERR_OVERFLOW_SELF, and doesn’t set the error code itself.)
Standard error code is as follows:
ERR_NONE(0): No error is ocurred, or reseted before.
ERR_UNASSIGNED(1): Unassigned command is called.
ERR_INVALID_ARG(2): Has invalid argument in general.
ERR_INVALID_FORMAT(3): Unserialization has been failed. (TODO resynchronization)
ERR_OVERFLOW(4): Wide integer has been overflown.
ERR_NO_CAP(5): Capabilities needed for this function are not available.
ERR_CAP_DENIED(6): Requested some capability, but denied by the user.
ERR_NO_CAP_DISABLE(7): Cannot disable capability, due to implementation choice or other causes.
ERR_OVERFLOW_SELF(255): Wide integer has been overflown while returning the error code. This is not a true error code; if this error occurs one can retry with extension prefix to get an actual error code.
Basically, I/O is done via handles; it is analogous to UNIX’s file descriptor. Handle 0 is defined as standard input and output.
$00 xxprints the byte in verbatim. Thus to print null character in the binary protocol, one should print
00 00 00instead.
$01 xxxxchanges current input handle. On failure it does nothing. (Tip: if
xxxxis 0, it will revert to the standard input.)
$02 xxxxchanges current output handle. On failure it does nothing. (Tip: if
xxxxis 0, it will revert to the standard output.)
$03 xx u|path|opens the file handle and returns
xxxxwhich is that handle. Path should be not empty, and on failure it will return handle 0. The first byte is mode, which is explained later.
$04flushes current output handle. On failure it does nothing.
$05 xxxx yyyy ...does handle-specific thing depending on
yyyy. For the standard file, input and output, the following functions are available:
$05 xxxx 01returns
??, which is true only if no more input is expected from the handle. It tests for EOF; peeking is supported by
$05 xxxx 08etc. On failure it returns error.
$05 xxxx 02returns
xx..xx, which is the offset from the beginning of file. On failure it returns -1.
$05 xxxx 03 yy..yysets the file pointer, relative to the beginning of file. On failure it does nothing.
$05 xxxx 04 yy..yysets the file pointer, relative to the current pointer. On failure it does nothing.
$05 xxxx 05 yy..yysets the file pointer, relative to the end of file. On failure it does nothing.
$05 xxxx 06sets the handle non-blocking. On failure it does nothing.
$05 xxxx 07sets the handle blocking. On failure it does nothing.
$05 xxxx 08returns
??, which is true only if the handle is blocking. On failure it returns error.
$05 xxxx 09returns
??, which is true only if the handle is available for reading. On failure it returns error.
$05 xxxx 0ais same to
$05 xxxx 09, but it’s for writing.
$06 xxxxcloses given handle. If it’s current input or output handle, it will silently revert to the standard input or output. On failure it does nothing. Handle 0 cannot be closed.
The open mode is combined value of the following flags:
fopen’s mode can be translated as follows:
"r" is 0x09,
"w" is 0x06,
"a" is 0x11,
"r+" is 0x23,
"w+" is 0x27 and
"a+" is 0x33.
Most I/O functions require capability
CAP_IO to be enabled, with the exception of
$04 which don’t require arbitrary I/O. File I/O (
$03) require capability
CAP_FILEIO to be enabled, and possibly (for writing)
$0b commands are used for retrieving current date and time.
xxwhich is seconds after last minute. (
xxwhich is minutes after last hour. (
xxwhich is hours after last day. (
xxwhich is a day of the month. (
xxwhich is a month of the year. (
xxxxwhich is the number of years after 1900, except for 255 which means “overflow” if the extension prefix is not used and the current year is greater than 2154.
$0b 07is same to
$0b 01 $0b 02 $0b 03. (returns three bytes represent current time)
$0b 08is same to
$0b 04 $0b 05 $0b 06. (returns three or four bytes represent current date)
$0b 09is same to
$0b 07 $0b 08. (returns six or seven bytes represent current date and time)
xxxxwhich runs from 0 to maximum value (255 or 65535) during the whole second. For example, if current fraction of second is 0.384 then
$0b 0awill return
62(~ 0.384 * 256) and
$ff 0b 0awill return
62 4e(~ 0.384 * 65536).
?? hh mmwhich represents local timezone.
??is true only if UTC offset is negative (UTC+0 is considered non-negative);
mmis absolute offset in hours and minutes.
$0b 19is same as
$0b 09, but uses UTC instead of local time.
EXE provides the access to command-line arguments via
$0c commands. Format of command-line arguments is same to C/C++’s
argv, thus the first argument contains the name of program.
$0c xxxxhas two functions:
xxxx, which is the number of command-line arguments.
|string|which is the argument at given index: 1 is the name of program, 2 is the first argument and so on. The argument string is truncated at null byte, even if the argument contains null character. If the index is out of bound or doesn’t exist, it should return empty string.
$ef are initially unassigned (thus causes an error), but can be assigned the meaning via
00 21 05 command.
$ff xx ... is special prefix: its behavior is exactly same to
$xx ... command, but its argument and return value (denoted
xx) will be limited to 65535, not 255. This also applies to the textual protocol, thus languages only capable for byte storage can easily select the needed behavior.
$ff ff ... is reserved for 32-bit encoding.
$0a) are reserved for compatibility of other layers. Without any assumption, using them results in the error.
$0d xx is guaranteed to do nothing, even in the textual protocol. See “force-to-flush commands” section for more information.
$20 is same to introduction sequence in the binary protocol, and thus reserved to avoid the confusion.
$fe are reserved for future expansion.
There are two sequences which end with newline sequence and do nothing, in order to flush invocation sequences to the output. The rationale is that several esolang interpreters only flush the output when newline is printed out, so having command which flushes the output and otherwise does nothing is helpful for them.
Invocation sequence followed by one byte sequence
0a(for “”), or two byte sequences
0d 0a(for “”) does nothing. These sequences are not fixed byte values, but literal bytes even in the textual protocol. For this purpose,
$0d xx commands (which overlaps these sequences in the binary protocol) are also declared no-op.
Thanks to Arvid Norlander and Gregor Richards for valuable suggestions.