Some make the claim that "all languages depend on C";1 not only is this statement ill-typed, as implementations perhaps depend on other languages, and languages rarely do, but the statement would still be wrong if we correct it. Some will retort by writing parts of an implementation in another "system language"; more recently a Lisp-like system language has been proposed (again). But it is very possible to create a working implementation without a systems language, and we think it is better than the alternative.
We intend to write an implementation of the Utena programming system, which is based on the Newspeak object model and Lisp evaluation semantics, using the proposed techniques. The language also inherits an object-capability model from Newspeak, which we believe is very beneficial for a language for writing language implementations; a "systems" language, as it stands, is a collection of things that don't fit into a modular language. There shouldn't be one.
Why it would be desirable
We believe an implementation written entirely in the language it implements has some engineering benefits. The first is that a programmer does not need to know any other language, in order to modify any part of their implementation.
Implementations which are written in another language and use an interpreter also have a very large difference in performance between code written in the language used for the implementation, and the language being implemented. A common approach to improving the performance of the implementation is to write more modules in the host language; for example, dictionaries and sets in CPython are written in C, rather than Python. Even if a programmer has identified a data structure with better performance in theory2, finding a performance improvement may be very difficult, because the performance of individual operations is worse, regardless of the complexity of particular algorithms.
This form of optimisation can also lead to security bugs: errors in the implemented language are checked and handled, whereas the implementation language may be unsafe by default. The forementioned performance hacks would require more code to be written in the implementation language, exacerbating the problem and leading to more ways the code can fail, and less of the implementation being approachable to a programmer who is not familiar with the host language.
When it is necessary to use unsafe operations, we may introduce them to a high-level and safe language in a gradual manner, and all other operations can remain safe. The Jikes RVM, a Java virtual machine implemented in Java, includes "magic" operations for this purpose. Newer "system" languages, such as Rust, also allow for marking code as "unsafe", with any other code being safe, but we can provide very fine-grained control over magic using the object-capability model.
The object-capability model also allows for replacing magic operations with simulations. If an implementation is implemented in two languages (say, Lisp and C), some information invariably gets duplicated between Lisp code and C code. The genesis stage of bootstrapping SBCL manages this duplication brilliantly; some Lisp code run during bootstrapping generates C headers describing constants and structures, so that constants and object layouts only need to be defined once in Lisp. But more complex algorithms cannot be generated in the same way. As SBCL is designed to be bootstrapped from any Common Lisp implementation, genesis is able to dump out a "native" image from just Common Lisp, which requires a Lisp implementation of the heap allocator to be used during bootstrapping, whereas the allocator in the C runtime is used after bootstrapping. The allocator implemented in genesis does differ to the C allocator, as the former is designed to pack more objects into pages than the latter, but essentially the same algorithm is implemented twice.
As all modules are implicitly parameterised and all dependencies can be replaced transparently, making the allocator manipulate a simulated heap rather than a real heap just requires instantiating the allocator with a module that affects a simulated memory, rather than a module that affects raw memory. Other parts of the runtime could be tested in a similar way, by having operations on raw memory instead modify a simulated memory, and these parts could be debugged from the comfort of the usual debugging tools used by the user, rather than a separate debugger like GDB, or a "low-level" debugger like ldb in SBCL.
This technique is used in the Jikes RVM: the memory management can be "rehosted" to target a simulated heap, by using a different implementation of the memory interface. However, the technique is used for debugging rather than bootstrapping (which, granted, isn't too important) and replacing modules can be performed transparently with the module system inherited from Newspeak, whereas it cannot with the Java module system. The implementation of Squeak can also be debugged inside a Smalltalk environment, because the implementation is written in a subset of Smalltalk (which can "conveniently" also be translated to C with little effort), and thus is a perfectly fine Smalltalk program.
How to achieve it
There are at least three problems that could cause such implementation endeavours to go wrong. Fortunately, there are solutions that are not too complex, which are interesting in that they are already nice to have for improving the performance of user code, and don't require too much implementation complexity.
The first problem is that, obviously, we must have a compiler, as we cannot produce a self-hosted implementation with only interpreters. A processor has to run someone's machine code. But the compiler does not have to be too sophisticated; it just has to emit something for the processor to interpret. The compiler can be similar to the compiler we proposed in I don't want to go to Chel-C, which merely replaces each bytecode instruction with a set of native instructions.
Implementing method lookup
A more interesting problem is that, if almost every construct in the language is a message send, and all of the implementation is written in the language, handling a message send may cause infinite recursion. The Art of the Metaobject Protocol addresses several similar issues in the Common Lisp Object System and the proposed meta-object protocol, in the appendix "Living with Circularity": for example, attempting to access a slot of an object requires looking up the appropriate slot definition in the class of the object, so that the location of the slot in the object may be determined. The issue is that a slot definition is also an object, and information in slot definitions is also stored in slots, thus the same lookup is required in order to retrieve information from the slot definition, causing infinite recursion. The appendix suggests hard-coding tests for base cases to break the infinite recursion, which will preserve the semantics of the object system, but is somewhat displeasing to read; and more importantly, it cannot be implemented in a language where all control flow is also comprised of message sends.
The compiler could inline enough message sends to allow the base case to be checked without performing another message send, but this would require a fairly complex compiler. Robert Strandh proposed the technique of satiation to allow the implementation to automatically compute the base case. The technique involves installing the methods which are required to break infinite recursion into caches, so that no further computation needs to be done while calling the associated generic functions. The use of inline caches can improve the performance of object-oriented languages, by caching the method which should be used when a specific class of the receiver is observed. Similar to Strandh's technique, we could install the methods for various base cases into inline caches, avoiding infinite recursion.
Writing a garbage collector
A similar infinite regress problem is implementing a garbage collector in a language which typically requires a garbage collector to run. The implementation could similarly use escape analysis to avoid allocating to the heap, but the analysis could be complicated, and would likely not achieve much without sufficient inlining or inter-method analysis.
Another solution is to use the lazy allocation model proposed by Henry Baker. The implementation would attempt to allocate all objects on a stack, but moves objects to the heap when an assignment is performed, which would cause older stack frames to reference newer stack frames, or the heap to reference the stack. As all objects in the heap are older than stack frames used by the garbage collector, it should be possible to implement a garbage collector which does not cause any heap allocation, and any garbage generated during collection can be collected by stack allocation. This is most important when control flow is expressed with message sends and closures; closures would be garbage, but the amount of garbage is constant if the callee frame on the stack used for allocation is popped just before performing another loop iteration. Resetting the allocation stack after every loop iteration has been proposed for Java, but as all loops are implemented using tail-recursion3 in Utena, it suffices to pop the stack just before performing a tail-call.
As mentioned prior, a subsequent problem is that it may be useful to prove that the garbage collector cannot ever cause heap allocation by accident. This may be ensured by using another language without garbage collection, but it may also be ensured using a separate static analysis tool for the same language. The latter approach is used in the Go runtime, which has a garbage collector written in Go. The compiler can be made to log when code for heap allocation is generated, potentially proving the collector does not perform heap allocation. We could implement this analysis in a separate program, similar to Gilad Bracha's conception of pluggable types, but this analysis would not be different to an escape analysis pass in a compiler, reducing the appeal of lazy allocation as an implementation technique. But the proof is not necessary to bootstrap a working implementation, still,4 and lazy allocation can produce less conservative results than escape analysis, as the former reacts to what actually happens at runtime, whereas the latter has to consider everything that could happen ahead of time.
We have proposed some reasons and techniques for implementing a high-level language in itself. It is interesting that these techniques are not new, and would generally appear to be good ideas for performance, as well as being useful for bootstrapping.
More recently the mysticism of what a "systems" language is has been demarcated to a definition that it should be "low-level enough to bootstrap a runtime". The conceptual question "What is a systems language?" has then changed into the question "What systems?" With "systems language" one looks for the concept, with "systems" there is no longer any question at all; the question answers itself.
An operating system, or some other "bare metal" code could be produced using similar techniques, with a high-level language. Thus "obvious" statements like "an operating system cannot be written in Python" are much less obvious.
There is already the sense that the "theoretical" big-O complexity of algorithms may be less relevant, when the "constant factors" of performance of operations vary substantially, and the size of data is bounded to some extent. But we mean specifically that interpretation substantially slows down every operation, violating even the model the programmer has of constant factors associated with operations, be it of latency or processor instructions or something else.
(defun %while (test body) (cond ((funcall test) (funcall body) (%while test body)) (t nil))) (defmacro while (test &body body) `(%while (lambda () ,test) (lambda () ,@body)))