But who is compiling the compiler?
A compiler is a piece of software that takes in a program written in one high-level language and translates it into another language (usually machine code). A compiler itself is also usually written in a high-level language (due to its complexity), so who is compiling the compiler?
If you look at some popular compiled languages, you will find that their compilers are usually written in the language they compile (e.g., gcc was written in C for most of its history and has since moved to C++, rustc is written in Rust, etc.). This is called self-hosting, and it is often regarded as a hallmark of a mature programming language.
That sounds like a chicken-or-the-egg situation. To untangle it, remember that software evolves and has versions. More precisely, a self-hosted compiler is compiled by a previous version of itself: \(V_n\) is compiled by \(V_m\) where \(m < n\) (usually \(m = n - 1\)). This has some interesting implications:
- The newer version can only use older features to implement new features.
- If there is a bug in the older version, the fix has to be written in the newer version's source without triggering the old bug, because the older version still has to compile that source. (Can we introduce a bug into a compiler that is impossible to fix without using the buggy feature?)
- The compiler can be backdoored in an intricate way (see the Reflections on Trusting Trust paper in the further reading).
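The first implication can be sketched as a simple set check. This is only a toy model under stated assumptions: the `CompilerVersion` class, the `can_build` function, and the feature names are all hypothetical and do not describe how any real compiler tracks language features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompilerVersion:
    """Toy model of one compiler release (hypothetical, for illustration only)."""
    name: str
    supports: frozenset  # language features this version can compile
    uses: frozenset      # language features its own source code is written with


def can_build(builder: CompilerVersion, target: CompilerVersion) -> bool:
    """The builder can compile the target only if it understands
    every feature that the target's source code uses."""
    return target.uses <= builder.supports


# V1 only knows about functions.
v1 = CompilerVersion("v1", frozenset({"functions"}), frozenset({"functions"}))

# V2 adds generics to the language, but its own source avoids them.
v2 = CompilerVersion(
    "v2", frozenset({"functions", "generics"}), frozenset({"functions"})
)

# A V2 whose source already relies on generics cannot be built by V1.
v2_too_eager = CompilerVersion(
    "v2", frozenset({"functions", "generics"}), frozenset({"functions", "generics"})
)

# V3 may finally use generics in its own source, because V2 supports them.
v3 = CompilerVersion(
    "v3", frozenset({"functions", "generics"}), frozenset({"functions", "generics"})
)

print(can_build(v1, v2))            # True
print(can_build(v1, v2_too_eager))  # False
print(can_build(v2, v3))            # True
```

In this model a new feature becomes usable inside the compiler's own source one release after it is introduced, which is the sense in which the newer version is limited to older features.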
But there is still one caveat: what about \(V_0\), the very first version of the compiler? That is when we have to seek help from another language; the initial version of rustc was written in OCaml, for example. Following the chain all the way back, this also implies that the very first compilers had to be written in assembly, and the very first assemblers in raw 0s and 1s.
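The chain back to \(V_0\) can be traced mechanically if we record which compiler built which. The sketch below assumes a hypothetical `built_by` mapping with made-up version names; it is not taken from any real project's build history.

```python
def bootstrap_chain(target: str, built_by: dict) -> list:
    """Walk back from `target` to the compiler that nothing in the
    mapping built, and return the chain oldest-first."""
    chain = [target]
    seen = {target}
    while chain[-1] in built_by:
        parent = built_by[chain[-1]]
        if parent in seen:
            # A version can never be (transitively) built by itself.
            raise ValueError(f"cycle in bootstrap chain at {parent!r}")
        chain.append(parent)
        seen.add(parent)
    return chain[::-1]


# Hypothetical history: V0 is built by a compiler for another language.
built_by = {
    "V2": "V1",
    "V1": "V0",
    "V0": "seed compiler (other language)",
}

print(bootstrap_chain("V2", built_by))
# ['seed compiler (other language)', 'V0', 'V1', 'V2']
```

The chain always ends at something that is not self-hosted, which is exactly the role the other language plays for \(V_0\).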
Further reading:
- An in-depth introduction on bootstrapping in rustc
- Backdoor a compiler: Reflections on Trusting Trust