There is no first principle

有始也者,有未始有始也者,有未始有夫未始有始也者。

"There is a beginning. There is a not yet beginning to be a beginning. There is a not yet beginning to be a not yet beginning to be a beginning."

—— Zhuangzi

First-principles thinking has become a popular concept recently, but it is actually a very old one. Euclidean geometry is a perfect example of first-principles thinking: all of its theorems are derived from five axioms. That is how first-principles thinking works. We start with undeniable facts and build on top of them to derive new truths.

Why are those five axioms undeniable? Because they are so fundamental and obvious that we cannot even imagine a world without them. But can’t we?

In the nineteenth century, mathematicians began to seriously question that assumption. At first they tried to use only four axioms and derive the fifth, but failed. Then they began to ask: what if the fifth postulate, the parallel postulate, were not self-evident after all? They went on to invent many other geometries that start from different assumptions, one of which, Riemannian geometry, is the foundation of general relativity.

Actually, as human beings, we have never really known the first principle. We keep asking: could there be a principle before the "first principle"? Is there an alternative "first principle"?

So here is what it really means to think from first principles:

  • We start with a set of assumed fundamental principles and build on top of them. We choose them because, to the best of our knowledge, they are the most fundamental and undeniable facts, and they work best in the current context.
  • At the same time, we stay aware that the principles we assumed may change, and we keep thinking about how we might change them to fit our context better.

The most important lessons I learned from computer science are abstraction and types. Basically, you can treat everything as a function, a box that takes in some inputs and produces some outputs. Since multiple items can be grouped into a single tuple, the most concise way to represent the idea is:

\[f(Input: T_1) \rightarrow Output: T_2\]

Here \(T_1\) and \(T_2\) are the types of the input and output, respectively. At a high level, the whole computer takes in one bit stream (\(T_1\)) and produces another bit stream (\(T_2\)). Why is that all there is to it? Because we can assign meanings to the bit streams (i.e., 0s and 1s). We can use bits to encode numbers, and once we have numbers everything else becomes easy (e.g., pictures, audio, languages, etc.). For anything to be processed by a computer, we just need to come up with an encoding function \(encode(Input: whatever\_you\_want) \rightarrow Output: numbers\), and since we already have an encoder from numbers to bit streams, the computer is able to process it. An important idea here is that we can also encode the instructions themselves as data (i.e., numbers), and that is the basic idea of the Universal Turing Machine.
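The encoding chain described above can be sketched in a few lines of Python. The function names here (`encode_text`, `to_bits`) are illustrative, not any standard API; the point is just the pipeline from "whatever you want" to numbers to a bit stream:

```python
def encode_text(text: str) -> list[int]:
    """Encode arbitrary text as numbers (here: Unicode code points)."""
    return [ord(ch) for ch in text]

def to_bits(numbers: list[int], width: int = 8) -> str:
    """Encode numbers as a bit stream (fixed-width binary strings)."""
    return "".join(format(n, f"0{width}b") for n in numbers)

# anything -> numbers -> bit stream
bits = to_bits(encode_text("hi"))
print(bits)  # 0110100001101001
```

Note that a Python program is itself text, so the same two functions would turn *instructions* into a bit stream too, which is exactly the instructions-as-data idea behind the Universal Turing Machine.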

Enough about computers; let's get back to first principles. Abstraction provides a good framework for thinking from first principles. Given our goal of transforming \(T_1\) into \(T_2\), we can abstract away everything beyond that as a black box (i.e., we do not care where \(T_1\) comes from or where \(T_2\) goes); those are simply given. We only need to care about the white box that transforms \(T_1\) into \(T_2\), and make it work correctly and efficiently. Everyone works at a different level of abstraction, and the boundary of that abstraction is essentially the first principle they are given. As we move between different tasks, the boundary keeps changing, and we should be aware of that and adjust our abstraction model accordingly. Hoogle is an interesting search engine that lets you search for functions by their type signatures (i.e., abstraction boundaries).

One last question: why are LLMs so powerful? Because they are the closest thing we have ever built to something that handles \(ANY \rightarrow ANY\).