I have a program for you. This is its source code:
It consists of a multi-line string that also seems to contain code, and a really long line of code on the bottom.
If your squint your eyes a bit, you can see that the multi-line string reads “Life Imitates Art”.
Also, when looking closely, some parts of the line of code at the bottom reappear in the string.
So, what could this program do? Let us compile it and find out.
Woah, the output looks like source code, really similar to the one we just compiled!
The string now displays “Art Imitates Life”.
Ok, let us feed the output of our first program to compiler, and see what kind of output it generates:
Hey, I have seen this one! It’s the source code our first program again. Is it magic?
No, it is a special type of self-reproducing computer program called a Quine.
The relation can be visualized with above (adapted) graphic of the ancient Ouroboros, the self-consuming snake.
In our case, we have two snakes, and eating (compiling and running) the others tail (output) yields an endless loop.
What are Quines?
I first came across the topic of Quines (the programs, not the philosopher) in Douglas R. Hofstadter’s Book: “Gödel, Escher, Bach”.
In it, he introduces it as a concept where a sentence is preceeded by itself inside quotation marks:
“yields falsehood when preceded by its quotation” yields falsehood when preceded by its quotation
In this case, a sentence refers to itself and becomes paradoxical.
Part of the sentence is treated operatively, that is we are reading it as a statement, but the part in quotes is treated as an object the statement is about.
This is really similar to the program above: we also have the same source code twice, once with quotation marks and once without.
This is necessary, because when a program normally runs, it does not know anything about its source code. It exists as a sequence of actions carried out by the instructions
in the source code, but it does not know about them. In the case of compiled languages like C, it might even be impossible to recover the source code from a compiled program.
We can work around that by including the source twice: once as data used for self-reproduction, and once as
a list of instructions. If you are into biological analogies, one could relate the string l to the DNA of an organism and the instructions to its proteins.
The structure of the proteins is encoded in the DNA, and certain proteins, such as the enzyme polymerase,
read from it in order to reproduce the information.
They both have to exist at the same time, otherwise our organism is not viable: without the DNA, the polymerase has nothing to read from, and without polymerase nothing
is there to actually read and replicate the DNA. Similarily, our program needs both the data to output and the instructions to output the data. This is the reason for
the apparant duplication in the initial program: they encode the same information in different ways.
If we take a look at the non-minified version of our program, we have a function that prints our source code string l and wraps it in quotation marks,
that is it passes the data on to the next program. Then, another part reads the string l again, this time with out encoding it:
It looks a bit cryptic through all the pointer arithmetic, but essentially it just outputs our string l and undoes any encoding we previously did.
But what about the ASCII art? Well, it is defined in the array life, which contains a run-length encoding generated by a Haskell program.
Two consecutive numbers x and y specify how many characters are followed by how many spaces respectively.
Since we can’t arbitrarily place spaces in a C-program, we encode all spaces of the source code with a pound sign, which will be decoded into a space as we have seen above.
The function printEntry then spits out the source code characters interleaved with spaces exactly as specified. We also can specify an offset to reuse the spacing data for
the two slogans (Life Imitates Art, Art Imitates Life).
The putchar function which outputs a single character is used quite a lot. It is convenient to pass the ASCII characters as hexadecimal numbers, because if we were to use
quotes, we would need to escape those in the source code string which makes the whole endeavour more complicated.
In order to write a 2-Quine that has two states and reproduces itself every other execution, we define a flag at the beginning whose value is inverted each execution.
This is the code that outputs the flag code:
A wrapper script then strips the code of any unnecessary whitespace and comments and concatenates it into the source-code-string-source-code format. Then we “feed the snake to itself” by compiling and running a few times and check whether the output remains the same:
And that’s basically it, sadly (gladly) there is no magic involved, just a bunch of hacks.
A program was presented that can reproduce its own source code in a two-step process while entertaining the user with some ASCII-Art.
Does this have any practical uses? No, not really. In reality, when a computer program wants to duplicate, it forks itself
with help of the operating system, although this is a rather different kind of reproduction.
This was more about the brain teaser of writing such a program, and let me tell you: most of the time I was as confused when writing the code as you probably are when reading it.
The Wikipedia article on quines is a good starting point.
Dylan Beattie gave a really enticing talk about the The Art of Code that talks about Quines at around the 00:27:00 mark, among other absolutely
mind blowing things.
I would not be surprised if someone has already done a similar 2-Quine, if you know about that please let me know!
You can find the complete source code on Github and as ZIP.