XUMUL

XUMUL (pronounced /sɯmul/ or so, which means twenty in Korean) is yet another XML-based esoteric programming language in which, uh, the number twenty has an important role (hence the name).

Specification

XUMUL code is essentially a standalone, valid XML document. For example, the following is valid XUMUL code that does nothing.

<?xml version="1.0" encoding="utf-8" ?>
<p><x/></p>

Commands

There are two constructs in XUMUL code. One is a singleton element, which is just a normal command; the other is a non-singleton element (including one that contains only text nodes), called a container. A container is like a subroutine, which is called by a singleton element with the same name. For example,

<subr><f/><g/></subr>
<subr/>
<subr/>

defines one container subr, which executes f and g sequentially, and executes it twice.
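
The split between containers and commands can be sketched in Python with the standard xml.dom.minidom module. This is only an illustration, not an interpreter: the function name classify and the wrapping prog root element are made up for the example, and since minidom cannot tell <a/> from <a></a>, an element with no child nodes at all is assumed to be a singleton.

```python
from xml.dom import minidom

def classify(source):
    """Split the root's child elements into container definitions and commands."""
    root = minidom.parseString(source).documentElement
    containers, commands = [], []
    for node in root.childNodes:
        if node.nodeType != node.ELEMENT_NODE:
            continue  # text and comments are ignored
        # Assumption: an element without any child node is a singleton;
        # an element holding only text still counts as a container.
        if node.hasChildNodes():
            containers.append(node.tagName)
        else:
            commands.append(node.tagName)
    return containers, commands

# The "subr" example from above, wrapped in a hypothetical root element.
print(classify("<prog><subr><f/><g/></subr><subr/><subr/></prog>"))
```

Here the snippet reports one container named subr and two commands that call it.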

Every XUMUL program contains exactly one top-level container, whose name can be anything (but the start tag and end tag should match).

There can be two or more containers sharing the same name: in that case, each definition affects the commands before the next definition. There can also be nested containers, which only affect commands in the inner container. For example,

<prog>
    <a><f/><g/></a>
    <b>
        <a/> -- This calls "a" container outside.
        <a><g/><f/></a>
        <a/> -- Now this calls newly defined "a" container.
    </b>
    <b/>
    <a/> -- This calls first "a" container, since container
            definition is local to parent container.
    <b/> -- The definition is also local to each call,
            so "b" behaves same as prior.
</prog>

Every command resolves to an actual definition and gets executed. In this sense XUMUL’s scope resolution is not lexical, since it depends on the symbols of the caller. Note that a container is also in its own symbol table.

<prog>
    <a><b/></a>
    <b>
        <c>
            <a/> -- This calls "a" container outside and
                    will call first "b" container recursively.
            <b><g/></b>
            <a/> -- Now "a" container calls second "b"
                    container.
        </c>
    </b>
    <z><z/></z> -- This is just a simple infinite recursion.
</prog>
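
The resolution rules above can be sketched in Python as follows. This is a rough model under several assumptions, not a reference implementation: the names run and trace_program are invented here, nop counting and memory are left out, unresolved commands are simply recorded in a list, and a hypothetical max_depth limit cuts off infinite recursion so that the sketch always returns.

```python
from xml.dom import minidom

def run(container, scopes, trace, depth=0, max_depth=50):
    """Execute one call of `container`, recording unresolved commands in `trace`."""
    local = {}                  # definitions seen so far during this call
    chain = [local] + scopes    # this call's scope first, then the callers'
    for node in container.childNodes:
        if node.nodeType != node.ELEMENT_NODE:
            continue            # text and comments are ignored
        name = node.tagName
        if node.hasChildNodes():
            local[name] = node  # a definition shadows earlier ones from here on
            continue
        target = next((scope[name] for scope in chain if name in scope), None)
        if target is None:
            trace.append(name)  # unresolved command: a nop
        elif depth < max_depth: # assumption: give up instead of recursing forever
            run(target, chain, trace, depth + 1, max_depth)
    return trace

def trace_program(source):
    root = minidom.parseString(source).documentElement
    # The top-level container is in its own symbol table as well.
    return run(root, [{root.tagName: root}], [])

SOURCE = """<prog>
    <a><f/><g/></a>
    <b><a/><a><g/><f/></a><a/></b>
    <b/><a/><b/>
</prog>"""
print("".join(trace_program(SOURCE)))
```

The local dictionary is created anew for every call, which is what makes a definition local to each call, and the lookup walks the callers' scopes rather than the enclosing source text, which is what makes the resolution non-lexical.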

Text and XML comments are ignored in the source code, so you can use them as comments. For example, the following program also does nothing.

<p>foo<!--bar-->quux</p>

Program memory

There is one shared, infinite memory for all containers. The memory consists of 20.087463-bit-long integers (“cells”), i.e. Unicode codepoints ranging from 0 to 1114111.
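
The fractional cell width is just the base-2 logarithm of the number of distinct cell values. A quick check in Python (the constant name CELL_VALUES is only for illustration):

```python
import math

CELL_VALUES = 0x10FFFF + 1      # codepoints 0..1114111, i.e. 1114112 values
bits = math.log2(CELL_VALUES)   # 16 + log2(17)
print(f"{bits:.6f}")
```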

There is one memory pointer (MP) per container call. At first the MP points to offset 0. When a child container is executed, it starts with the caller’s MP, but any change to the new MP does not affect the caller’s MP.

<prog>
    <p>
        -- Suppose that MP is changed here.
        <p/> -- It will start with new MP.
        -- MP remains the same here, whatever the inner "p"
           does: MP is local to a call, not a container.
           Of course, this assumes the program flow
           eventually gets here, which is not true in XUMUL.
    </p>
</prog>

Of course there are some special commands which change the caller’s MP, since the MP cannot be changed without them.

One important feature (or bug?) of XUMUL is that every cell in the memory is mapped to the XML document. Here a character means a Unicode character, with the exception of newline sequences, which are replaced with LF (U+000A) when the program is loaded. Thus the first cell maps to the first character (generally <), the second cell maps to the second character, and so on. The XML document is assumed to have infinite trailing whitespace (which is ignored by the XML processor, of course), so an out-of-bounds MP also works. A negative MP works as well: in that case the cell before the first cell maps to the last (infinitieth) character, the cell before that maps to the second-to-last character, and so on.

For example, when the document is <foo/> and the 10th cell (offset 9) is set to X, the document becomes <foo/>___X where _ is whitespace. And if the cell at offset -3 is then set to Y, the document becomes <foo/>___X, followed by infinite whitespace and Y__. Any amount of infinite whitespace is fine with XML, but you should consider whether the new document is valid, or the program will terminate.
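
One way to picture this mapping is a pair of growable lists, one for non-negative offsets and one for negative offsets, with everything in between treated as whitespace. The Python sketch below is an assumption-laden illustration: the class name Memory and its methods are invented, cells hold characters rather than integers for readability, and a plain space stands in for "whitespace".

```python
class Memory:
    """Document-backed cells; untouched cells read as a space (an assumption)."""

    def __init__(self, document):
        # Newline sequences are normalized to LF when the program is loaded.
        text = document.replace("\r\n", "\n").replace("\r", "\n")
        self.head = list(text)  # offsets 0, 1, 2, ...
        self.tail = []          # offsets -1, -2, ... (tail[0] is offset -1)

    def _slot(self, offset):
        if offset >= 0:
            return self.head, offset
        return self.tail, -offset - 1

    def get(self, offset):
        cells, index = self._slot(offset)
        return cells[index] if index < len(cells) else " "

    def set(self, offset, char):
        cells, index = self._slot(offset)
        cells.extend(" " * (index + 1 - len(cells)))  # pad with whitespace
        cells[index] = char

    def render(self):
        """Return (start of document, end of document); infinite whitespace lies between."""
        return "".join(self.head), "".join(reversed(self.tail))

memory = Memory("<foo/>")
memory.set(9, "X")
memory.set(-3, "Y")
print(memory.render())
```

The two returned strings correspond to the <foo/>___X and Y__ parts of the example above.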

Command semantics

I mentioned there are some special commands which change the caller’s MP, but actually there are none. At least initially.

Every command which doesn’t resolve to any definition does nothing, and is called a nop. But if nops are executed in a container, they make the caller execute a certain function before the container returns:

<p><a/><b/></p>
<p/> -- It will do nothing, but it will do something
        reserved for two-nop container.
<a><c/></a>
<p/> -- It will execute "a" container, and it will do
        something reserved for one-nop container.
        Obviously it does not count callees' nop.

It also means a certain “special” command can change its meaning depending on the outer symbol table. But who cares. Anyway, the meanings of such containers are as follows:

0-nop: Does nothing. Note that a command declared as a container with no child elements (e.g. foo containing only text) is not a nop.
1-nop: Increments MP.
2-nop: Decrements MP.
3-nop: Increments the cell at MP. If the cell contains 1114111, it wraps to 0.
4-nop: Decrements the cell at MP. If the cell contains 0, it wraps to 1114111.
5-nop and more: An N-nop does the same thing as an (N-4)-nop, but does it twice. For example, a 5-nop increments MP twice, a 9-nop increments MP four times, a 21-nop increments MP 32 times, and so on.
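
The table above boils down to a base action chosen by the nop count modulo four and a repeat count that doubles with every four extra nops. A small Python sketch of that arithmetic, with made-up names (decode, apply, and the action labels) and with the caller's state reduced to an MP and the value of the cell at MP:

```python
CELL_VALUES = 0x10FFFF + 1  # cells hold 0..1114111

def decode(nops):
    """Map a nop count to (action, repeat); action is None for a 0-nop."""
    if nops == 0:
        return None, 0
    action = ("mp+", "mp-", "cell+", "cell-")[(nops - 1) % 4]
    repeat = 2 ** ((nops - 1) // 4)  # every extra four nops doubles the effect
    return action, repeat

def apply(nops, mp, cell):
    """Return the caller's (mp, cell at mp) after an N-nop container returns."""
    action, repeat = decode(nops)
    if action == "mp+":
        mp += repeat
    elif action == "mp-":
        mp -= repeat
    elif action == "cell+":
        cell = (cell + repeat) % CELL_VALUES  # wraps 1114111 -> 0
    elif action == "cell-":
        cell = (cell - repeat) % CELL_VALUES  # wraps 0 -> 1114111
    return mp, cell

print(decode(21))
print(apply(3, 0, 1114111))
```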

Note that a program can only execute commands sequentially or recurse infinitely, so it must use self-modification to terminate the recursion.

Input and output

A CDATA section performs an I/O operation. Depending on the number of prior nops, it either dumps its contents to the standard output or transmutes its contents into characters read from the standard input. Of course, a transmuted CDATA section can be split or even invalid.

Prior nops are the nops executed in the current container prior to the CDATA section. For example, assuming “p” is not declared, <p/><![CDATA[..]]> has one prior nop and <p/><q/><![CDATA[..]]> has two prior nops. If “p” is defined as <p><q/><q/></p>, <p/><q/><![CDATA[..]]> has one prior nop, because the only nop executed directly in the container is <q/>.

If the number of prior nops is even (including zero), the CDATA section does input; otherwise it does output. Note that prior nops are just nops, so they also affect what is executed after the container returns.

For example, the following program reads one character and writes it.

<prog>
    <io><_flag/><![CDATA[?]]></io>
    "io" outputs the shared buffer, unless "_flag"
    is defined.

    <input><_flag>nop</_flag><io/></input>
    "input" declares "_flag" to do nothing (but not nop) 
    so "io" does input the shared buffer.

    <output><io/></output>
    "output" needs to be declared since "io" changes
    MP of the caller. Here changed MP is local to "output".

    <input/><output/>
</prog>

All newline sequences (CR, CR+LF, LF) are translated to LF in the input. At EOF the contents of the CDATA section are not changed any further. (If two or more characters are to be read, only the leading characters that were actually read before EOF are overwritten.)
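
The I/O rules can be summarized in a small Python sketch. It is a guess at a reasonable reading rather than a definitive implementation: the function name cdata_io and its return shape are invented, the standard input is modelled as a string, and it assumes that a CDATA section reads exactly as many characters as it currently contains.

```python
def cdata_io(content, prior_nops, stdin_text):
    """Return (new CDATA content, text written to stdout, unread input)."""
    if prior_nops % 2 == 1:
        # Odd number of prior nops: output, the content itself is unchanged.
        return content, content, stdin_text
    # Even number (including zero): input, with newlines normalized to LF.
    normalized = stdin_text.replace("\r\n", "\n").replace("\r", "\n")
    read = normalized[:len(content)]      # may be shorter than content at EOF
    new_content = read + content[len(read):]
    return new_content, "", normalized[len(read):]

print(cdata_io("?", 0, "abc"))   # input: "?" becomes the first character read
print(cdata_io("??", 0, "a"))    # EOF after one character: only the first "?" changes
print(cdata_io("?", 1, "abc"))   # output: "?" is written, input is untouched
```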

XML modification

When the program modifies the XML document, the modified document is not used until the corresponding container returns. For example,

-- This container will modify XML document.
<inc><x/><x/><x/></inc>

<p>
    <q><inc/></q>
    <q/> -- When "q" returns it updates current XML document.
    <inc/> -- This modifies XML document, but does not
              update current XML document yet.
    <inc/> -- Ditto.
    -- When "p" returns it updates current XML document.
</p>
<p/> -- It updates current XML document twice.
<p/> -- Ditto.

The interpreter tracks the current command and the call stack as character offsets. For example, consider the following XML:

<p>
    <q>
        <r>
            <s/>
            -- Offset to current command
        </r>
        <r/>
        -- Offset to next command after returning from "r"
    </q>
    <q/>
    -- Offset to next command after returning from "q"
</p>
<p/>
-- Offset to next command after returning from "p"

It continues to work even if the current command swaps the names of “q” and “r”. But if the current command replaces the enclosing <r> and </r> with whitespace and updates the current XML, the document becomes like this:

<p>
    <q>
            <s/>
            -- Offset to current command
        <r/>
        -- Offset to next command after returning from "r"
    </q>
    <q/>
    -- Offset to next command after returning from "q"
</p>
<p/>
-- Offset to next command after returning from "p"

So execution proceeds to <r/>, then returns from “q” (which now encloses the current command), and the current offset becomes the character after <r/>.

One can use this (somewhat strange, but quite required) behavior to get infinite recursion without explicit one:

<prog>
    <p>
        -- Removes &lt;p&gt; and &lt;/p&gt;.
    </p>
    <q>
        -- Restores &lt;p&gt; and &lt;/p&gt; at original position.
    </q>
    <r><q/><p/></r>
    <r/>
</prog>

<r/> will call <p/>, which removes its enclosing tags so that <r/> is executed in “p”, resulting in infinite recursion.

If the modified XML document (modified with a 3-nop or 4-nop container, for example) is not valid XML, the program terminates silently; this is a convenient way to exit the program.
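
A hypothetical interpreter could decide whether to keep running by simply re-parsing the document each time it is updated. The Python sketch below uses the standard expat bindings; the function name still_runs is invented, and it assumes that "valid" here means well-formed, which is all a non-validating parser checks.

```python
import xml.parsers.expat

def still_runs(document):
    """Return True if the updated document parses, False if the program should exit."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)  # True marks the end of the input
    except xml.parsers.expat.ExpatError:
        return False
    return True

print(still_runs("<p><x/></p>"))   # an untouched program
print(still_runs("<p><x/></q>"))   # one cell of the end tag was changed
print(still_runs("<foo/>   X"))    # a non-whitespace cell after the root element
```

Under this reading, writing a single non-whitespace character past the end of the document is already enough to terminate the program.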

If the program modifies the XML encoding declaration, the program is converted to a byte string and converted back to a Unicode string with the specified encoding. If this fails, the program terminates silently.

Example

Hello, world!

The following program prints “Hello, world!”.

<HelloWorld>
    <nop/>
    <![CDATA[Hello, world!
]]>
</HelloWorld>

Cat program

The following program copies its input to the output.

<cat>
    <out><_flag/><![CDATA[_]]></out>
    <in><_flag>x</_flag><out/></in>
    <loop><in/><out/><loop/></loop>
    <loop/>
</cat>

Conditional loop

The following program writes 0 to 9. It shows an example of a conditional loop.

<p>
    <x>
        <nop/>
        <![CDATA[0
]]>
        <v><![CDATA[]]5</v></x><x><v><![CDATA[]]></v>
        <i/>
        <r/><r/><r/><r/><r/><r/><r/><r/><r/><r/>
        <r/><r/><r/><r/><r/><r/><r/><r/><r/><r/>
        <r/><r/><r/><r/><r/><r/><r/><r/>
        <i/>
        <l/><l/><l/><l/><l/><l/><l/><l/><l/><l/>
        <l/><l/><l/><l/><l/><l/><l/><l/><l/><l/>
        <l/><l/><l/><l/><l/><l/><l/><l/>
        <x/>
    </x>
    <r><nop/></r>
    <l><nop/><nop/></l>
    <i><nop/><nop/><nop/></i>
    <r/><r/><r/><r/><r/><r/><r/><r/><r/><r/>
    <r/><r/><r/><r/><r/><r/><r/><r/><r/><r/>
    <r/><r/><r/><r/><r/><r/><r/><r/><r/><r/>
    <r/><r/><r/><r/><r/><r/><r/><r/><r/><r/>
    <r/><r/><r/><r/>
    <x/>
</p>

Implementation

For now there is no known implementation of XUMUL.