But who is compiling the compiler?

A compiler is a piece of software that takes a program written in one language (usually a high-level one) and translates it into another language, most often machine code. Because compilers are complex, they are themselves usually written in a high-level language. So who is compiling the compiler?

If you look at popular compiled languages, you will find that their compilers are usually written in the language they compile: GCC was originally written in C and is now largely written in C++, and rustc is written in Rust. This is called self-hosting, and it is often regarded as a hallmark of a mature programming language.

That sounds like a chicken-and-egg problem. The way out is to remember that software evolves and has versions. Precisely, a self-hosted compiler is compiled by a previous version of itself: \(V_n\) is compiled by \(V_m\) where \(m < n\) (usually \(m = n - 1\)). This has some interesting implications:

  • The source of the newer version can only use features that the older version already supports, so every new feature has to be implemented with old ones.
  • If the older version has a bug, the fix in the newer version must be written so that it does not trigger that bug, because the buggy older version is the one compiling the fix. (Could someone introduce a bug that is impossible to fix without using the very feature it breaks?)
  • The compiler can be backdoored in an intricate way (see Ken Thompson's paper Reflections on Trusting Trust in the further reading).
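
The first two implications can be made concrete with a toy model. This is a hypothetical illustration, not how any real compiler tracks its features: each release is reduced to three sets of made-up feature names (what it supports, what it miscompiles, and what its own source uses), and a release can only be built by its predecessor if its source avoids both unsupported and miscompiled features. The `Release` type and the helper functions are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """Hypothetical compiler release, reduced to three sets of feature names."""
    name: str
    supports: frozenset[str]  # features this release can compile
    buggy: frozenset[str]     # supported features it miscompiles
    uses: frozenset[str]      # features its own source code relies on


def build_problems(new: Release, old: Release) -> set[str]:
    """Features in `new`'s source that `old` cannot compile correctly."""
    unsupported = new.uses - old.supports  # implication 1: not implemented yet
    miscompiled = new.uses & old.buggy     # implication 2: would trigger the old bug
    return set(unsupported | miscompiled)


def check_chain(releases: list[Release]) -> dict[str, set[str]]:
    """Assume each release is built by the one just before it (m = n - 1)."""
    return {new.name: build_problems(new, old)
            for old, new in zip(releases, releases[1:])}


# v1 (built by some earlier release not modeled here) miscompiles loops,
# so v2's source avoids loops while v2 itself fixes that bug.
v1 = Release("v1", frozenset({"fn", "loop"}), frozenset({"loop"}), frozenset({"fn", "loop"}))
v2 = Release("v2", frozenset({"fn", "loop", "generics"}), frozenset(), frozenset({"fn"}))
# v3 may now use loops and generics, because v2 compiles both correctly.
v3 = Release("v3", frozenset({"fn", "loop", "generics", "closures"}), frozenset(),
             frozenset({"fn", "loop", "generics"}))
# A v3 written with closures cannot be built by v2, which lacks them.
bad_v3 = Release("v3", v3.supports, frozenset(), frozenset({"fn", "closures"}))

print(check_chain([v1, v2, v3]))      # {'v2': set(), 'v3': set()}
print(check_chain([v1, v2, bad_v3]))  # {'v2': set(), 'v3': {'closures'}}
```

Sets fit here because the question is purely one of containment: the features used by \(V_n\)'s source must lie inside what \(V_{n-1}\) compiles correctly.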

There is still one caveat: what about \(V_0\), the very first version of the compiler? That is where we need help from another language; the original rustc, for example, was written in OCaml. Following the chain back to its end, the earliest compilers had to be written in assembly (or directly in machine code), and the earliest assemblers were written in raw machine code, as 0s and 1s.
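
Part of why the first assemblers could be written by hand is that assembling is close to a table lookup: each mnemonic maps to a fixed opcode and each operand to a number. The sketch below assumes a made-up 8-bit instruction set (the opcodes in `OPCODES` are hypothetical, not any real machine's) and translates one instruction per line into bytes, roughly the work someone would otherwise do with pencil and paper.

```python
# Hypothetical opcode table for a made-up 8-bit machine (not a real ISA).
OPCODES = {
    "LOAD": 0x01,   # LOAD addr  : acc <- mem[addr]
    "ADD": 0x02,    # ADD addr   : acc <- acc + mem[addr]
    "STORE": 0x03,  # STORE addr : mem[addr] <- acc
    "HALT": 0xFF,   # HALT       : stop (no operand)
}


def assemble(source: str) -> bytes:
    """Translate one instruction per line into bytes: opcode, then operands.

    Operand counts are not checked; this only illustrates the lookup step.
    """
    out = bytearray()
    for lineno, line in enumerate(source.splitlines(), start=1):
        line = line.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        mnemonic, *operands = line.split()
        opcode = OPCODES.get(mnemonic.upper())
        if opcode is None:
            raise ValueError(f"line {lineno}: unknown mnemonic {mnemonic!r}")
        out.append(opcode)
        for op in operands:
            value = int(op, 0)  # accepts decimal or 0x-prefixed hex
            if not 0 <= value <= 0xFF:
                raise ValueError(f"line {lineno}: operand {op!r} out of range")
            out.append(value)
    return bytes(out)


program = """
LOAD 0x10   ; acc <- mem[0x10]
ADD  0x11   ; acc <- acc + mem[0x11]
STORE 0x12  ; mem[0x12] <- acc
HALT
"""
print(assemble(program).hex(" "))  # 01 10 02 11 03 12 ff
```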

Further reading: