I don't want to go to Chel-C
We are constantly reading stories with one main theme: people turn to seemingly "simple" systems in order to manage complexity, except that these systems manage to produce complex problems of their own, without having acquired historical cruft, and without suffering from any of the usual issues that produce non-essential complexity. There is a trend in programming language design, and a particular language we will focus on, both of which seem to be gaining popularity today for their minimalism and supposed simplicity. However, neither is awfully simple in practise; it appears there are good reasons why, and we will give some at the end of this article.
Barbarism begins at $HOME
In a DevGAMM presentation, Jon Blow managed to convince himself and other game developers that high level languages are going to cause the "collapse of civilisation" due to a loss of some sort of "capability" to do low-level programming, and that abstraction will make people forget how to do things. We shall start off with a description of the alternative by Charles Antony Richard Hoare, and then work our way through some arguments we have heard against that line of thought:
We asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to - they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law.1
We note with fear and horror that even in 2022, language designers and users have not learned this lesson.
While Jon argues that safety features are useless to "professional programmers"2 or "programmers that know what they're doing", and that programmers who want them are "afraid of pointers" or some such, we derive our fear from how even the most talented computer scientists we know make mistakes. Indeed, the CWE team claims that "out-of-bounds writes" and "out-of-bounds reads" are the first and third most dangerous software weaknesses, respectively, and "use after free" is the seventh most dangerous.3 Jon must have found other company, who manage to never write buggy software, and do it without any techniques for systematically avoiding bugs. He mentions the design of highly reliable software systems, but merely writing perfect software is not enough, at the level of reliability he discusses.
Expanding redundancy
Good physical engineering crucially relies on redundancy: having multiple ways to ensure that something doesn't go wrong. Such redundancy protects against expected and unexpected forms of failure. Software engineering also relies on introducing redundancy,4 and redundancy may be achieved with static analyses such as static type checking and theorem proving, runtime checks such as array bounds checks5 and dynamic type checks, and system design with fallback and replicated subsystems.
Highly reliable software systems are possible, but they require those forms of redundancy that Jon argues are unnecessary for "real programmers". In particular, such systems require fault isolation, which can only be achieved by pervasive runtime checks, as not checking allows faults to propagate uncontrollably. While Jon insists that defects should never exist, Joe Armstrong insists in his infamous thesis, with some humility, that "from the outset, we know that faults will occur".6 Even if all programmers were perfect, the latter mindset also allows for minimising the effects of hardware faults; while Jon is aware that physics influences what sort of programs can be run fast, ignoring hardware faults is ignoring that physics dictates whether a program runs at all. Programs are run on real electronic circuits, which can have faults, can lose connections and power, and so on. Even formal reasoning about programs cannot (easily) model random bit-flips in defective memory chips, but fault isolation and redundancy can attempt to work around them. These sorts of issues, while occurring with low probability, do appear in redundant systems which are around long enough to encounter many issues;7 thus highly reliable systems require more than simply not making mistakes while programming.
However, some claim there are reasons to avoid such forms of redundancy. We have heard that runtime checks are prohibitively slow, but modern processors with branch prediction can easily predict that errors will not occur. Furthermore, the cost of the system running into an unexpected state is drastically worse than the loss of efficiency in many situations (especially when reliability is such a great concern), and thus detecting an error is invaluable in comparison. We have similarly heard that safe languages impair writing some algorithms, but we never heard just which algorithms those were, and we never encountered any algorithms which required using unsafe code ourselves. Furthermore, any programs using such algorithms would rarely be valid programs in those languages either, as unsafe code typically invokes some sort of undefined behaviour in the language. Not invoking UB does not necessarily produce a correct or reliable program, but a program invoking UB is never correct, as it has no defined behaviour to consider "correct".
There is also the argument that these issues only occur with "complex" code, and that "simple" code can be shown to not have out-of-bounds accesses. However, the reasoning applied is, on average, incredibly sloppy, and should not be confused with any rigorous formal analysis of the code, which would be substantially more difficult. For example, the specification framework for the C language by Andronick et al8 does not support "references to local variables, goto statements, expressions with uncontrolled side-effects, switch statements using fall-through, unions, floating point arithmetic, or calls to function pointers". If someone does not want to use a language which provides for bounds checks, due to a lack of some sort of "capability", then we cannot imagine that they will want to use any subset of C that can be formally modelled!
A notion of code complexity is also influenced by the choice of language for that code; the C language does not provide for bounds or pointer checks, so introducing explicit checks will necessarily result in more code and more control flow paths, and code without these checks will necessarily appear simpler.9 Thus the idea of "simple" C code not requiring redundancy rests on circular reasoning: the only simple code that can be written in C does not have any redundancy. Though, indeed, the idea was false to start with. The combination of formal reasoning being difficult for even a subset of C, and the burden being put on the programmer to explicitly insert runtime checks, leads to "simple" C code ironically being very undesirable.
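To make the trade-off concrete, here is a minimal sketch (our own illustration, not code from any project under discussion) of the redundancy that C forces the programmer to write by hand. The check adds a branch and a failure path — exactly the extra code and control flow paths just described — yet the branch is trivially predicted "not taken" on every successful access:

#include <stdio.h>
#include <stdlib.h>

/* a checked read: the redundant branch isolates the fault ... */
unsigned int checked_read(const unsigned int *xs, size_t length, size_t i) {
    if (i >= length) {
        fprintf(stderr, "out-of-bounds read at index %zu\n", i);
        abort();          /* ... rather than letting it propagate */
    }
    return xs[i];
}

/* the "simple" version: one fewer control flow path,
 * and one more way to silently corrupt memory */
unsigned int unchecked_read(const unsigned int *xs, size_t i) {
    return xs[i];
}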
Abstract reasoning about abstractions
Using abstractions extensively actively encourages study into how to implement them efficiently, contrary to Blow's claim that their use will lead to few understanding their implementation. The implementation of higher level languages rather provides a context in which better study of how to optimise for a given computer is possible. A programmer could indeed break abstractions, and write low-level code themselves, to generate the best-performing code that they can think of. The programmer would then be effectively running the perfect optimising compiler in their head. But if a programmer is instead motivated to stick to these higher level constructs, they have to express their mental model precisely, in the form of code for the compiler to use. The latter requires deeper thought, as the programmer has to formalise and "explain" their knowledge, in a way similar to "learning by teaching". Hence abstractions encourage deeper thought into how to get consistent performance out of high-level code, rather than one-off low-level trickery.
One practical example of optimising such abstractions is the implementation of dynamic dispatch in object-oriented programming languages. The C++ programming language offers dynamic dispatch, but it is almost always implemented with a "virtual table" lookup. If the lookup is too slow, a programmer may avoid dynamic dispatch and instead use regular functions to regain performance. However, this is not an option for implementations of the Smalltalk programming language, where every method call requires dynamic dispatch. Thus high-performance Smalltalk systems make use of inline caching.10 The Self programming language took the messaging paradigm further: Self solely uses method dispatch to access "instance" and "local" variables, as they are only exposed via reader and writer methods, and Self uses fewer primitives and performance hacks than Smalltalk-80. The combination required faster dispatch, using polymorphic inline caching11 to speed up method lookup, alongside splitting12 and type feedback13 to eliminate method calls entirely. The general trend is that if an abstraction offers appealing principles, attempting to follow the principles consistently requires investigating how to make the abstraction convenient, as inconvenience will cause someone to abandon the abstraction. The performance of an abstraction can be a crucial part of its convenience to a programmer.
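For the reader unfamiliar with inline caching, a minimal sketch in C follows; the structure is our own illustration, not code from any Smalltalk or Self implementation. Each call site remembers the last receiver class it saw and the method that lookup produced, so a repeated send to the same class costs one comparison and an indirect call:

#include <stddef.h>
#include <string.h>

typedef void (*method_fn)(void *receiver);

struct method { const char *selector; method_fn fn; };
struct class  { const struct method *methods; size_t count; };

/* the slow path: a linear search standing in for a real lookup */
static method_fn slow_lookup(const struct class *c, const char *selector) {
    for (size_t i = 0; i < c->count; i++)
        if (strcmp(c->methods[i].selector, selector) == 0)
            return c->methods[i].fn;
    return NULL;    /* doesNotUnderstand, in a real system */
}

/* one inline cache per call site: remember the last class seen there */
struct call_site {
    const struct class *cached_class;
    method_fn cached_method;
};

static void send(struct call_site *site, const struct class *c,
                 const char *selector, void *receiver) {
    if (site->cached_class != c) {    /* miss: do the full lookup */
        site->cached_class  = c;
        site->cached_method = slow_lookup(c, selector);
    }
    site->cached_method(receiver);    /* hit: a compare and a call */
}

A polymorphic inline cache extends this scheme by remembering several (class, method) pairs per call site, rather than just the most recent one.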
Another property of abstractions is that improving the performance of abstractions improves performance for all users of the abstraction. Having such abstractions reduces the amount of code that needs to be written, producing simpler code, and allows for the implementor to fine-tune implementations for maximum performance on every machine they target, rather than every user doing it themselves for every program. As Robert Bernecky states on his time implementing APL systems:
In the late 1970’s, I was manager of the APL development department at I.P. Sharp Associates Limited. A number of users of our system were concerned about the performance of the ∨.∧ inner product [for transitive closures] on large Boolean arrays in graph computations. I realized that a permuted loop order would permit vectorization of the Boolean calculations, even on a non-vector machine. David Allen implemented the algorithm and obtained a thousand-fold speedup factor on the problem. This made all Boolean matrix products immediately practical in APL, and our user (and many others) went away very happy. What made things even better was that the work had benefit for all inner products, not just the Boolean ones. The standard +.× [matrix multiplication] now ran 2.5—3 times faster than Fortran. […] So, rather than merely speeding up one library subroutine, we sped up a whole family of hundreds of such routines (even those that had never been used yet!), with no more effort than would have been required for one.14
Bernecky's optimisation crucially relies on the inner-product abstraction; in particular, the implementation can use the best loop order for any inner-product operation, and the user merely has to use the inner-product operation to benefit, regardless of what the inner-product is being used to compute. The higher level of abstraction of having such an operation also makes reordering easier, compared to the user writing the specific loops to use, and the compiler re-arranging loops to produce the best iteration order.
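We can sketch the principle in C (an illustration of the idea, not Allen's implementation): pack each Boolean row into 64-bit words, and use an i-k-j loop order, so the innermost loop ORs whole words of a row of B into a row of C, handling 64 of the ∨.∧ terms per operation. The caller never changes; only the implementation behind the abstraction does.

#include <stdint.h>
#include <stddef.h>

/* C = A v.^ B for n*n Boolean matrices, each row packed into n/64
 * 64-bit words (n is assumed to be a multiple of 64, to keep the
 * sketch short) */
void boolean_inner_product(const uint64_t *a, const uint64_t *b,
                           uint64_t *c, size_t n) {
    size_t words = n / 64;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            if ((a[i * words + k / 64] >> (k % 64)) & 1)  /* A[i][k]? */
                for (size_t j = 0; j < words; j++)
                    c[i * words + j] |= b[k * words + j]; /* 64 ORs */
}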
Conclusion
You're quite right. C is memory unsafe, and the large quantity of C code in IoT devices is, as you rightly say, a disaster - only it has already happened; no waiting is involved.15
The supposed collapse of civilisation due to bad software is already here, and Mr. Blow doesn't want you to know why. Even great programmers still use redundancy measures to allow them to make mistakes without causing harm; we'd even say that the greatest programmers are so because of how they produce redundancy. The persistent use of redundancy and abstraction encourages research into how to make it efficient, which is quite the opposite of forgetting how any low-level trickery is performed.
As Henry Baker reminded us, there are many contexts "in which bad programming style can kill people".9 We can only hope that people figure out what to blame, before the idiotic view of producing high-quality software that Blow et al promote eventually does kill people.
Pulling rabbits out of a hat, and how to make a 9000 MIPS desktop run like a 100MHz Pentium 1
The same sort of false sense of "simplicity" has affected other projects far detached from the work of Blow and colleagues. The uxn machine is a stack machine with byte-addressed memory, programmed in uxntal, an assembly language for the machine. It is claimed that uxntal is like Forth, but it is not interactive, nor does it have the ability to define new immediate words.
While there is the recreational programming aspect of the machine, some argue that the design lends itself to software and hardware preservation. More generally the virtual machine is associated with permacomputing. The page on permacomputing by the designer of uxn describes frugal computing and salvage computing as principles of permacomputing, defining them as "utilizing computational resources as finite and precious, to be utilised only when necessary, and as effectively as possible", and "utilizing only already available computational resources, to be limited by that which is already produced." The author is part of a collective that wanted to replace all the "bloated" software they used, due to having little energy storage on their sailboat.
Using software design techniques to reduce power usage, and to allow continued use of old computers, is a good idea,16 but the uxn machine has quite the opposite effect, due to inefficient implementations and a poorly designed virtual machine, which does not lend itself to an efficient implementation. We believe that such an implementation would be more difficult to write than a similarly efficient implementation of a more conventional programming language. Either the implementor must handle more complexity, or the user must accept a large loss of efficiency, making the use of less powerful computers less feasible, and wasting most of the performance of the hardware in more powerful computers.
Our remarks are quite similar to those by Michal Jirků and Stanislav Datskovskiy on the Urbit platform.17, 18 As well as both platforms being inherently difficult to implement efficiently, both are posited as "clean-slate computing stacks", though each is hosted on some other stack; typically an implementation written in C on a Unix-like system. The uxn platform has been ported to other, non-Unix-like systems, but it is still not self-hosting; self-hosting is routinely ignored as a part of bootstrapping. Thus uxn does not really escape mainstream computing stacks, much like the Urbit VM.
Implementation strategy
The most significant performance issue is that all uxn implementations we have found use naïve interpretation. A typical estimate is that an interpreter is one to two orders of magnitude slower than a good compiler. We instead propose the use of dynamic translation to generate native code, which should eliminate the overhead associated with interpreting instructions.
The use of dynamic translation to improve performance is not new; its use for hardware emulation is at least as old as emulation of the HP 300019 and emulation of the VAX on Alpha machines,20 and its use for dynamic languages (for which static approaches alone do not improve much) is at least as old as the APL\3000 implementation of APL.21 A translator can be as sophisticated as an optimising compiler (and indeed can use an optimising compiler backend, such as LLVM), but a simple translator can also suffice in avoiding overhead due to interpretation.
To demonstrate, we will compare a C program compiled to x86-64 code, a translation of the program running under a uxn interpreter, and the program translated naïvely to x86-64 assembler. It may seem odd to compare to a C program, after stating our thoughts on the C language, but the C and uxn languages both use fixed-size integers and are both unsafe (as we will discuss later), so the performance should only differ due to implementation strategy.
We test with a recursive function that computes the 32nd Fibonacci number.
/* gcc 11.2.0
 * Compile with `gcc -O2 fib.c -o fib` */

unsigned int fib(unsigned int n) {
    if (n < 2)
        return n;
    return fib(n - 1) + fib(n - 2);
}

int main() {
    unsigned int sum = 0;
    for (int x = 0; x < 100; x++)
        /* Use the result somehow, so the call isn't removed.
         * The time taken to do this addition is negligible, compared
         * to computing fib(0x20). */
        sum += fib(0x20);
    return sum;
}
( https://git.sr.ht/~rabbits/uxn at commit 05d8b44
  Compile with `uxntal fib.uxn fib.rom`
  Run with `uxnemu fib.rom` )

%RET { JMP2r }
%DONE { #01 #0f DEO BRK }

|0100

#0020 ;fib JSR2 DONE

@fib ( N -- fib[N] )
  ( not[n < 2] equivalent to n > 1 )
  DUP2 #0001 GTH2 ,&inductive-case JCN
  RET
  &inductive-case
  DUP2 #0001 SUB2 ;fib JSR2 ( stack now N fib[N-1] )
  SWP2 #0002 SUB2 ;fib JSR2 ( stack now fib[N-1] fib[N-2] )
  ADD2 RET
Our translation is done solely with the macro feature of the nasm assembler. Each uxn instruction is translated to a sequence of x86-64 instructions, with no optimisation between instructions performed. The translation (including macros) consists of 91 lines of assembly code, so we have put it in another file. Note that the build script for the uxn implementation tested defaults to building with the compilation flag -Os, which optimizes for size, so we replaced it with -O2 to optimize for speed, which reduces the time taken to run our program by about 30%. It is unlikely that any applications benefit from optimizing for size, as the interpreter loop only uses 6,526 bytes when compiled with -O2, and should easily fit in the L1 instruction cache.
We observed that the C program takes 5.8ms to call fib(0x20), the uxn virtual machine 537ms, and the translation 82ms. (As per the title, the program executed at about 9 billion instructions per second.) In other words, the uxn virtual machine introduces two orders of magnitude of overhead, and a simple translator is successful at improving throughput by about 6.5 times. However, the translated code is still 14 times as slow as the output of an optimising compiler.
A more sophisticated translator is necessary to reduce the remaining overhead, but the simplest translator still produces a considerable speedup. Nonetheless, there are some optimisations which are not too hard to write. For example, mapping some part of the data stack into registers, for which the arrangement is known at compile-time, merely requires the translator to maintain a "virtual stack".22 A colleague also told us that they considered peephole optimisations "fair game" for a simple translator, as they consist of replacing a particular sequence of instructions with another sequence, without performing any sophisticated analysis. The use of peepholing reduces runtime to 34ms, which is only 5.9 times slower than an optimising compiler.
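To give a flavour of such a peephole, here is a sketch in C. The opcode values follow the uxn instruction encoding as we understand it, and the emit_ functions are hypothetical backends of a translator, not part of any existing one: a LIT2 immediately followed by an ADD2 can be fused into a single add-immediate, rather than a push and an add.

#include <stdint.h>
#include <stddef.h>

enum { ADD2 = 0x38, LIT2 = 0xa0 };    /* uxn opcode bytes */

void emit_generic(uint8_t op);        /* expand one uxn instruction */
void emit_add16_imm(uint16_t k);      /* add $k to the top 16-bit cell */

/* translate one (possibly fused) instruction; returns bytes consumed */
size_t translate_one(const uint8_t *code, size_t pc) {
    if (code[pc] == LIT2 && code[pc + 3] == ADD2) {
        /* fuse "push constant; add" into one add-immediate */
        emit_add16_imm((uint16_t)((code[pc + 1] << 8) | code[pc + 2]));
        return 4;
    }
    emit_generic(code[pc]);  /* the usual one-to-many expansion
                              * (operand-carrying instructions elided) */
    return 1;
}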
Word sizes
The careful reader will note that the uxn program, and thus our translation, use 16-bit arithmetic, and the result overflows an unsigned 16-bit integer. The uxn virtual machine only exposes 8-bit and 16-bit instructions, so we cannot perform 32-bit operations "natively". Writing a decent big-integer implementation ourselves requires some design work which we would rather avoid, if possible. Instead, we can use a library for simulated 32-bit arithmetic. We can modify our definition of the fib function to use it:
@fib ( N -- fib_N )
  ( not[n < 2] equivalent to n > 1 )
  DUP2 #0001 GTH2 ,&inductive-case JCN
  #0000 SWP RET
  &inductive-case
  DUP2 #0001 SUB2 ;fib JSR2 ( stack now N fib[N-1]x2 )
  ROT2 #0002 SUB2 ;fib JSR2 ( stack now fib[N-1]x2 fib[N-2]x2 )
  ;add32 JSR2 RET
The resulting program takes 1.95 seconds to run in the "official" uxn implementation, or 390 times slower than native code! This amount of overhead greatly restricts what sort of computations can be done at an acceptable speed on modern computers; and using older computers would be out of the question.23
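The overhead is easier to appreciate if one writes out what a library like add32 must do with only 16-bit operations. A sketch in C of the carry propagation (our illustration, not the library's code):

#include <stdint.h>

/* 32-bit addition from 16-bit halves: add the low halves, detect the
 * carry via the wrap-around, and fold it into the high halves */
void add32(uint16_t ahi, uint16_t alo, uint16_t bhi, uint16_t blo,
           uint16_t *rhi, uint16_t *rlo) {
    uint16_t lo = (uint16_t)(alo + blo);
    uint16_t carry = lo < alo;    /* wrapped iff sum < an operand */
    *rlo = lo;
    *rhi = (uint16_t)(ahi + bhi + carry);
}

Each native 32-bit add thus becomes several uxn instructions, plus calls and stack shuffling, and every one of those instructions pays the full interpretation overhead.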
It could be argued that the use of small operations frees 16-bit machines from simulating larger arithmetic. But there are few machines with small arithmetic units; we estimate that most desktop computers since 1990 can perform most 32-bit operations no slower than 16-bit operations, and most since 2010 can perform most 64-bit operations no slower than 32-bit operations.24 There is also no end to the number of 32-bit microcontrollers being manufactured; thus there are not many computers that cannot perform 32-bit arithmetic without loss of performance.
Of course, the size of the arithmetic unit is not the only factor affecting performance. Larger integers require more bits to store, thus reducing the effectiveness of cache memory and vectorisation. But it could be worthwhile to use integers even smaller than 8 bits to conserve cache, and clever algorithms can effectively manipulate "packs" of such integers.25 Vectorisation is also out of the question, because there are no compilers for uxn code, let alone vectorising compilers. We can only conclude that the provided instruction sizes are arbitrary, and not optimised for performance or portability, yet they are not suitable for many applications either.
While we believe dynamic translation is the best approach to improving the performance of uxn programs, because we cannot change the instruction set, we note that one way to reduce interpretation overhead in a purely interpreted system is to have instructions perform more work, so that fewer instructions and instruction dispatches are required. If someone did find a computer for which arithmetic on integers larger than 16 bits would be slow, emulating arithmetic in the host language, rather than in uxn, would still be faster, by avoiding interpretation overhead. Such a computer would be old and slow enough that improving performance would be very important, in order to have a usable computer. (We believe that finding such a computer would be very difficult, compared to the myriad of machines that have been produced more recently, and so computers that old are not worth the effort of targeting. But other people have different ideas of what is worth what effort, so we make the point regardless of our opinion there.)
As an example of reducing overhead in this manner, an APL implementation may perform element-wise addition of two large matrices after dispatching on just one +, resulting in the time spent in operator dispatch being negligible. The implementor of such operators can use whatever algorithms they know are suited for the hardware; the Dyalog APL implementation has many of these.26 However, emulating larger integers with smaller integers in uxn has the opposite effect; what could be a single instruction is instead dozens of instructions, and the resulting interpretation overhead is very large.
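A sketch of the APL implementor's side, in C: one dispatch reaches a loop over whole arrays, so the dispatch cost is paid once and amortised over every element.

#include <stdint.h>
#include <stddef.h>

/* what a single dispatch on + can reach: n additions, one dispatch */
void add_arrays(const int32_t *a, const int32_t *b,
                int32_t *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}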
Is performance an actual issue?
It may appear that performance is irrelevant, as uxn programs are often not compute-intensive, and so most time will be spent managing graphics and other I/O in the host language and in native code. But programmers who find themselves "off the beaten path" while still doing some kind of graphical work cannot benefit from any acceleration provided by the virtual machine; this has been demonstrated at least in windowing systems and raycasting, which are both noticeably slow on somewhat slower hardware, through no fault of the hardware itself. This kind of acceleration also adds to both implementation and interface complexity, as any code which needs to run quickly is proposed to be added as a primitive "device" in the implementation. Thus we believe the appropriate approach is to achieve more uniform performance by allowing for higher performance implementations.
The low performance may also seem acceptable, as the machine and language could be considered "domain specific", but what uxn implementations can run efficiently is a smaller and more rigid category than what might be considered "graphical" programs or "video games". Having many virtual machines and languages hurts portability in another dimension, as fewer programs are written targeting each machine; or, equivalently, more machines need to be ported to run some number of programs. The model of exposing specialised functionality as devices reduces the modifications that need to be made to the virtual machine, but proposed devices such as polygon drawing remain non-trivial to implement. Such devices really should be implemented in uxn, as the algorithms used would be good targets for preservation. The bytecode form of a program may be more space-efficient, and either bytecode or source forms of a program will be more precise than natural language definitions of the algorithms.27 Distributing algorithms in such a form also makes them much more portable, as the implementor does not need to do anything special to use such algorithms in their implementation; but that desire puts us back where we started, with slow execution due to interpretation of the program. To maintain efficiency and simplicity, we thus need to be able to perform some kind of compilation of the program.
Redundancy, security and implementation
It is also argued that the uxn virtual machine can be simulated with pen and paper. Following from our discussion of word sizes, humans are terrible at reasoning about modular arithmetic, which manifests in the occurrence of integer overflow bugs. The Von Neumann model is also tricky to work with on paper; the program state includes 64 kibibytes of memory that the virtual machine may access arbitrarily. A human computer would have to simulate "paging", and try to keep only the memory that has been used in the computation on paper.
We didn't use 8-bit instructions in our program, but the uxn virtual machine also allows for using 8-bit instructions and data. Incorrectly "typed" instructions can lead to confusing behaviour, producing an error somewhere in the program far detached from where the programming error is. Combining 8-bit and 16-bit instructions which operate on the stack leads to further confusion, as the individual bytes of a 16-bit integer can be manipulated separately; i.e. a 16-bit integer is not an "atomic" object, and the only atomic objects are 8-bit integers. Rearranging the stack with a mixture of 8-bit and 16-bit data is also more difficult, and requires thinking about a 16-bit value as two 8-bit values. Thus even the abstraction of 16-bit integers can be defeated, and the reader must continue reasoning at the level of 8-bit integers for their reasoning to be sound.
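The point is easy to demonstrate with a sketch of the working stack as uxn defines it — a plain byte array (our model of the semantics, not code from any implementation). A 16-bit push is two byte pushes, high byte first, so an 8-bit pop afterwards peels the low byte off the "atomic" short:

#include <stdint.h>
#include <stdio.h>

static uint8_t stack[256];
static uint8_t sp;

static void push8(uint8_t v)   { stack[sp++] = v; }
static void push16(uint16_t v) { push8(v >> 8); push8(v & 0xff); }
static uint8_t pop8(void)      { return stack[--sp]; }

int main(void) {
    push16(0x1234);              /* one 16-bit value on the stack ... */
    printf("%02x\n", pop8());    /* ... an 8-bit pop prints 34 */
    printf("%02x\n", pop8());    /* and another prints 12 */
    return 0;
}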
The lack of redundancy also makes life more complicated for a translating implementation of uxn, as an implementation must preserve the semantics of every action performed by the virtual machine, including those which would typically trip type and bounds checks in other programming languages. Some have argued that a lack of redundancy (which is somewhat incorrectly called "untyped" in the article) allows for a simpler implementation, and further that a dynamic translator for such a machine would be simpler, but the latter claim is far from reality. One example is that, with a Von Neumann machine such as uxn, where data and code reside in the same memory, any memory-write instruction may modify code, which would require the implementation to invalidate any machine code that was translated from the bytecode. It could be possible to ignore effects to machine code, but uxn programmers do appear to rely on self-modifying code for some purposes, so a uxn implementation does have to preserve the specified semantics to be useful.
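For instance, a translating implementation ends up guarding every store with something like the following sketch (the page granularity and bookkeeping are our own assumptions):

#include <stdint.h>
#include <stdbool.h>

static uint8_t memory[65536];      /* the uxn address space */
static bool    translated[256];    /* one flag per 256-byte page */

static void invalidate_page(unsigned page) {
    /* discard the native code translated from this page; its next
     * execution falls back to the translator (or an interpreter) */
    translated[page] = false;
}

/* every store must check whether it clobbered translated code */
static void store8(uint16_t addr, uint8_t value) {
    memory[addr] = value;
    if (translated[addr >> 8])
        invalidate_page(addr >> 8);
}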
Modern compilers use alias analysis to determine which references can reference the same memory; often this analysis is helped by redundancy in the language, such as type information and array bounds, to prove that two references cannot alias. We could use an alias analysis to prove that writes to memory cannot affect bytecode that has been executed and translated. However, the uxn language has no types, nor any sort of notion of a reference which isn't just an index into memory, so alias analysis can only be performed using low-level information and more complex reasoning. The aforementioned 32-bit arithmetic library writes into memory to store its arguments, though the locations are constant and so alias analysis is not difficult.28 Writing to any location that is not constant would require very general and very precise analysis. For example, even writing to an element of an array at a constant location might require determining a tight range for the index written to; whereas a language with bounds checks can assume that such a write cannot affect any other memory, as out-of-bounds access instead signals an error.29 The problem of alias analysis, and thus the problem of analysing if code can or can't self-modify, is much more difficult without such redundancy, despite the lack of redundancy being simpler for an implementation which does not attempt to translate bytecode.
It appears that only a subset of uxn is useful for simulation by humans, which is no different from any other low-level language. The uxn platform also offers no safety mechanisms, including bounds checks inside the virtual machine. The same lack of redundancy exposed to the virtual machine may make for a simpler naive interpreter, but an efficient implementation of uxn has to either be prepared to de-optimise almost anywhere, or to use unnecessarily complicated and precise analyses to prove that optimisations still preserve the semantics of the machine.
Systems
The low-level view of the world also raises questions as to how programs might interact with the world.
The essentially untyped30 instruction set and memory are also an issue for security, and hamper reasoning about programs with erroneous inputs. For example, C programs compiled to WebAssembly may have even fewer safety checks than C programs compiled to native code.31 However, security flaws in programs compiled to WebAssembly may not be as disastrous as, say, security flaws in programs compiled to native code, because the WebAssembly implementation acts as a sandbox, and no information outside the virtual memory of the implementation can be compromised. The uxn system could also be a sandbox, and prevent damage outside of its memory, but a filesystem device is specified for the platform, which could cause damage.32 The device is specified to not allow an emulator to escape its working directory, but a user must take care in how they organise their filesystem in order to benefit from such sandboxing. If a user is unorganised and does not give each application its own directory, the user may face data loss due to a misbehaving application.
Another issue is the reliance on a monolithic process/application model, where only one program resides in an emulator. This may well be the only kind of model that works for such a low-level system, as the address space might be too small to contain many programs. Programs use absolute addresses to refer to memory, so a virtual machine-level concept of virtual memory might be necessary to allow each program to use its address space however it sees fit. Some degree of cooperation from the host operating system is then required in order to compose programs; a Unix-esque pipeline requires the operating system to run multiple emulators concurrently, or to use temporary files and run multiple emulators sequentially. In either case, the virtual machine is not self-sufficient, and requires non-portable behaviour from the host operating system to implement even these simple forms of program composition.
The use of a bytecode as distribution format also loses symbolic information which may be useful while trying to understand a program. Someone who only has the binary image of a program cannot view symbolic names or comments used by the programmer, as these do not exist in the bytecode form of the program. While the bytecode format may be very compact, "programs must be written for people to read, and only incidentally for machines to execute" and the only action made easy by preserving bytecode is to execute the program. Trying to study or modify such a program requires substantial reverse engineering effort.
More advanced and flexible sorts of composition and introspection, leading to a programming system rather than just a language and machine, are also hindered by symbolic information not being readily available at runtime. This is further compounded by the language not having any concept of data structures, or any higher-level concepts than raw memory, so even providing all the symbolic information related to a uxn program would not allow for evolving the state of memory.
Conclusion
The design of uxn makes it unsuitable for personal computing, be it on new or old hardware, or in pen-and-paper simulation. Some people think these constraints are fun. We can't argue with personal preferences, but we can still argue that these constraints coincide with weaknesses of both humans and computers. The machine model is hard for humans to simulate, and programs using uxn are as unsafe and insecure as programs written in mainstream contemporary low-level languages. The only simple implementations of uxn are quite slow, which is undesirable, especially on older machines which are slow already; a performant implementation of uxn requires much of the complexity of modern optimising compilers. Yet even a simple implementation must be compiled using a C compiler on a mainstream computing platform, so we can only conclude that either implementation simplicity (of the toolchain producing the virtual machine) is not important, or we have not meaningfully escaped mainstream computing platforms.
These issues relate to optimising the virtual machine design to make the simplest possible implementation very simple, rather than making an "appropriate" implementation relatively simple. Having better performance and security measures would allow the uxn machine to be used for more tasks, and to be useful more generally in a personal computing system. There are techniques to produce reasonably performant and reasonably simple virtual machine implementations, such as the dynamic translation we proposed, and these sorts of implementations should not have to suffer due to design choices that benefit simple interpreters. The low-level Von Neumann architecture does not interact well with human reasoning, but that sort of architecture is more fundamental to the virtual machine design, so we cannot offer a simple solution.
At the very least, the reader should be reminded that with the claims to environmentalism and permacomputing and all, the pragmatics are more important than the aesthetics. The energy company doesn't pollute any more or less if one wastes cycles on conventional "bloated" software, or on an inefficient implementation of someone's aesthetic visions. While it may seem that these issues are obvious for such low-level architectures, and so we should expect no better, the choice of architecture was an intentional one, and not at all incidental or necessary.
Conclusion
How did this all happen, then? A quick detour is necessary for context. While discussing predictions of the future, Murray Bookchin names a concept of futurism, in which things just seem to get larger, while staying the same otherwise.
What is futurism? Futurism is the present as it exists today, projected, one hundred years from now. That’s what futurism is. If you have a population of X billions of people, how are you going to have food, how are you going to do this… nothing has changed. All they do is they make everything either bigger, or they change the size—you’ll live in thirty story buildings, you’ll live in sixty-story buildings. Frank Lloyd Wright was going to build an office building that was one mile high. That was futurism.33
This sort of futurism is characterised by growing structures, without considering how, and if, they can be grown. The proportions of things are merely made larger, regardless of whether they can be made larger without some other sort of failure. Alan Kay brings up a similar issue with software design, but doesn't name it as such:
Now, somebody could come along and look at a dog house and say, "Wow! If we could just expand that by a factor of a hundred we could make ourselves a cathedral." It's about three feet high. That would give us something thirty stories high, and that would be really impressive. We could get a lot of people in there. [… However,] when you blow something up [by] a factor of a hundred, it gets a factor of hundred weaker in its ability, and in fact, […] it would just collapse into a pile of rubble.34
But whereas futurism is mindless growth, the stories we told involve mindless degrowth. We doubt we can make a physical metaphor as elegant as building a cathedral by attempting to build a large dog house, without any concern for physics, but we can try: this sort of degrowth is akin to taking only one arbitrary machine from a complicated factory, and hoping to do much with it. A single machine plainly does not achieve much when it is not part of a larger production chain. Instead, a drastically different design is necessary to have smaller and decentralised means of production.35
Software suffers the same fate as physical production: writing "simple" software that does not achieve much on its own does not lead to the user benefiting from simplicity. Simplicity needs to be pervasive throughout the system, and further in the context it resides in, for one to benefit from it. Invoking the idea that "simple code is easier to reason about" to defend unsafe programming languages rings hollow, when the language used produces many edge cases that either require more complex code to check for, or require all users of the code to carefully avoid those cases. It similarly does not help to use a programming platform which is simple to write a slow implementation for, where there are no forms of redundancy to help detect bugs, where writing secure code is difficult, and where executing programs efficiently is no easier (if not harder) than with mainstream platforms.
Minimalist computing is theoretically about "more with less", but rather than being provided with "more", we are instead being guilt-tripped and told that any "more" is sinful: that it is going to cause the collapse of civilisation, that it is going to ruin the environment, that it increases the labour required by programmers, and so on. Yet it is precisely those minimalist devices which are committing these sins right now; the hypocritical Church of Minimalism calls us the sinners, while it harbours its most sinful priests, and gives them a promotion every so often.
Such is life, we suppose.
Acknowledgements
We would like to thank Gilbert Baumann, Paweł Lasek and Robert Strandh for helping us with our computer history homework. We would also like to thank a friend who dances in the shallows of a river, Nyx Land and Jan Moringen for reminding us of how to write English.
Footnotes:
Charles Antony Richard Hoare, The emperor's old clothes https://dl.acm.org/doi/10.1145/358549.358561
"Dynamic [languages] and GC [garbage collection] are actually features […] I think it's fine for people who are not professional programmers […] who are not making system software, and probably not interactive software." – Jon Blow, known for using an interactive editor programmed in a dynamic language with garbage collection, in a Twitch stream https://clips.twitch.tv/ResoluteKawaiiShingleDancingBanana--dErqidMyG6ToNQi
2021 CWE Top 25 Most Dangerous Software Weaknesses http://cwe.mitre.org/top25/archive/2021/2021_cwe_top25.html
Henry G. Baker Jr, Factoring Redundancy https://plover.com/~mjd/misc/hbaker-archive/letters/CACM-FactoringRedundancy.html
The "dynamic" checks in bounds checking can be reworded into a "static" assertion of program behaviour of sorts; a program with mandatory bounds checks cannot produce out-of-bounds access by construction, as it is impossible to write a program that uses unchecked accesses. But static analysis is usually used to prove the stronger assertion that bounds checks will never fail.
Joe Armstrong, Making reliable distributed systems in the presence of software errors https://erlang.org/download/armstrong_thesis_2003.pdf
Peter H. Hochschild, Paul Turner, Jeffrey C. Mogul, Rama Govindaraju, Parthasarathy Ranganathan, David E. Culler, Amin Vahdat, Cores that don't count https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s01-hochschild.pdf
June Andronick, David Greenaway, Gerwin Klein and Japheth Lim, Don't Sweat the Small Stuff: Formal Verification of C Code Without The Pain https://trustworthy.systems/publications/nicta_full_text/7629.pdf
Henry G. Baker Jr, Dubious Achievement https://plover.com/~mjd/misc/hbaker-archive/letters/CACM-DubiousAchievement.html
L Peter Deutsch and Allan M. Schiffman, Efficient Implementation of the Smalltalk-80 System http://web.cs.ucla.edu/~palsberg/course/cs232/papers/DeutschSchiffman-popl84.pdf
Craig Chambers, Urs Hölzle and David Ungar, Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches https://bibliography.selflanguage.org/_static/pics.pdf
Craig Chambers and David Ungar, Iterative Type Analysis and Extended Message Splitting: Optimizing Dynamically-Typed Object-Oriented Programs https://bibliography.selflanguage.org/_static/iterative-type-analysis.pdf
Urs Hölzle and David Ungar, Optimizing Dynamically-Dispatched Calls with Run-Time Type Feedback https://bibliography.selflanguage.org/type-feedback.html
Robert Bernecky, The Role of APL and J in High-performance Computation https://snakeisland.com/aplhiperf.pdf
Joe Armstrong on the erlang-questions mailing list https://erlang.org/pipermail/erlang-questions/2017-September/093532.html
The performance-per-watt of a sufficiently old computer can be too low to justify its use. However, the embodied energy used to produce a computer is quite large. We don't know of a good model for where the "crossover point" is, at which the energy wasted by inefficiency exceeds the embodied energy of a replacement; but the conventional wisdom from people we talk with is that the crossover point is around 10 to 15 years.
Michal Jirků, Urbit: the good, the bad, and the insane https://wejn.org/2021/02/urbit-good-bad-insane/
Stanislav Datskovskiy, Of Decaying Urbits http://www.loper-os.org/?p=1390
Arndt B. Bergh, Keith Keilman, Daniel J. Magenheimer, and James A. Miller, HP 3000 Emulation on HP Precision Architecture Computers https://www.hpl.hp.com/hpjournal/pdfs/IssuePDFs/1987-12.pdf pages 87–89. You had best get a cup of tea while waiting for it to download; the file is a fairly high quality scan of the full journal, with all 96 pages, and downloading is slow.
Richard L. Sites, Anton Chernoff, Matthew B. Kirk, Maurice P. Marks, and Scott G. Robinson, Binary Translation https://dl.acm.org/doi/10.1145/151220.151227
Ronald L. Johnston, The Dynamic Incremental Compiler of APL\3000 http://www.softwarepreservation.org/projects/apl/Papers/DYNAMICINCREMENTAL
The documentation for GNU libjit describes how to produce a data-flow graph from a simple stack machine.
It is possible that modern computers are relatively worse at running interpreters, as a processor has to branch many times when an interpreter interprets otherwise "branch-free" code, and this behaviour performs worse on processors with pipelined and superscalar execution. But we believe the slowdown due to interpretation is still quite bad, even on older processors.
Specifically, we refer to addition, subtraction, bitwise operations, and comparisons. Multiplication and division do tend to be slower for larger register sizes, but are less common, and thus they affect performance less. Documentation such as Agner Fog's instruction tables for x86 processors provides the specific latencies of these operations; the former set of instructions has latencies independent of the word size.
Henry G. Baker Jr, Efficient Implementation of Bit-vector Operations in Common Lisp https://plover.com/~mjd/misc/hbaker-archive/Bitvectors.html
Marshall Lochbaum, Moving Bits Faster in Dyalog 16.0, Sub-nanosecond Searches Using Vector Instructions, and Implementing Reduction. We couldn't pick a favourite presentation, so we picked three.
Henry G. Baker Jr, Metacircular Semantics for Common Lisp Special Forms https://plover.com/%7Emjd/misc/hbaker-archive/MetaCircular.html
Though we cannot eliminate the memory write and keep the temporary values in registers, as we cannot prove that the location will never be read again. Also note that our hand-translation of the 16-bit code is still sound, as the program translated does not touch primary memory at all.
C compilers can still make some assumptions, as writing out of bounds is undefined behaviour, so an implementation does not have to act as if it really wrote out of bounds. But the behaviour of C is also unfortunate, for previously described reasons.
It is common for type theorists to call dynamically typed languages "untyped", but we mean specifically that there is no means of detecting ill-typed instructions whatsoever, whether it be through static analysis or runtime checks.
Quentin Stiévenart, Coen De Roover, and Mohammad Ghafari, Security Risks of Porting C Programs to WebAssembly https://arxiv.org/abs/2112.11745
The specification was changed since we first published this article in June; the filesystem device at that time could manipulate any file, but this is no longer possible. The uxn32 implementation could also disallow filesystem access, but this kind of sandboxing needs to be standardised to be effective; moreso as ease of implementation is an espoused benefit of the uxn design, so it is not reasonable to expect implementation-specific behaviour to be common.
Murray Bookchin, Why doing the impossible is the most rational thing we can do http://unevenearth.org/2019/10/bookchin_doing_the_impossible/
Alan Kay, The computer revolution hasn't happened yet https://www.youtube.com/watch?v=oKg1hTOQXoY
Kevin Carson, The Homebrew Industrial Revolution: A Low-Overhead Manifesto http://apw.org.nz/wp-content/uploads/2015/06/HomeBrewRevolution_Carson.pdf