Challenge: Write a computer program that, given an input string, produces a short Brainfuck program that prints this string. The shorter the Brainfuck programs are on average for some total set of strings, the better.
Brainfuck is the esoteric programming language with only 8 instructions that remarks itself by having a very small compiler, and by resembling a classic, mechanical interpretation of the Turing machine, like Mike Davey’s A Turing Machine (2010):
The Kolmogorov complexity of a piece of information is measured by the length of the shortest computer program (in a predetermined language) that produces the information as its output. Brainfuck is not particularly good at compressing short strings. But due to the the Invariance Theorem, there is at most a constant overhead between this and the best language.
Brainfuck isn’t particularly useful, and it is difficult to write useful programs in. One thing it lends itself well to is to write programs that print out short pieces of text, like names. The following is a program that prints “Simon”:
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++[<]>[.>]
It does so by constructing the memory array below, and then it loops back to the start and prints each character until the first memory cell is empty. Here,
[<] loops back and
[.>] loops forward, printing each character one at a time.
|0||83 (S)||105 (i)||109 (m)||111 (o)||110 (n)||0||0||0||…|
It is not strictly necessary to have the entire name in memory in order to print each letter individually, but for the sake of the challenge, let us assume that both of these goals must be met.
There are many other programs that only just print “Simon”. That is, many other programs that behave equivalently to the one above, even though they may look different. For example, instead of writing 83 plusses to store the number 83 in the second cell, one could use the loop construct and save a number of bytes:
+++++++++[>+++++++++<-]>++. This says: Nine times, move one register ahead and increment by nine, then go back and decrement by one. At the end, increment by two more. If the same technique is used for all letters, a program that prints “Simon” might also look like:
+++++++++[>+++++++++<-]>++< cell 2: nine times nine plus two or 'S' ++++++++++[>>++++++++++<<-]>>+++++<< cell 3: ten times ten plus five or 'i' ++++++++++[>>>+++++++++++<<<-]>>>-<<< cell 4: ten times eleven minus one or 'm' ++++++++++[>>>+++++++++++<<<-]>>>+<<< cell 5: ten times eleven plus one or 'o' ++++++++++[>>>+++++++++++<<<-] cell 6: ten times eleven or 'n' >[.>]
This is a somewhat smaller program, but it can be improved even further. For example, ‘m’, ‘o’ and ‘n’ are almost identically constructed, yet we waste space on constructing them individually. Let’s fix that:
+++++++++[>+++++++++<-]>++> cell 2: nine times nine plus two or 'S' ++++++++++[<<+++++++++++>>-] cell 1: ten times eleven [>>+>+>+>+<<<<-] cell 1: zero and cells 3 4 5 6: ten times eleven >>----->->+ cells 3 4 5 6: 105 109 111 110 [<]>[.>]
To give an idea of what happens here, the register values are shown updated once for each line:
From the experiments above, a strategy can be formed:
Here is how this strategy could look like in Haskell:
module Main where import Data.Char import Data.List import System.Environment -- Compile using: ghc BFKolmogorov -o bfk -- Run using: ./bfk Simon main :: IO () main = do args <- getArgs case args of [str] -> putStrLn $ makeBF str _ -> putStrLn $ "Use ./bfk <str>" makeBF :: String -> String makeBF str = charCode ++ replicateCode ++ adjustCode where charCode = makeChar (chr avg) replicateCode = "[" ++ (">+" *** len) ++ "[<]>-]" adjustCode = concatMap (\n -> ">" ++ (diff avg n)) ints ++ "[<]>" len = length str ints = map ord str avg = roundedAvg ints -- Creates the BF code for generating character code `c` in the register ahead. -- Assumes that the current register is empty so it can be used for looping. If -- it's not, `makechar` will produce a too large character code. makeChar :: Char -> String makeChar c = incr ++ "[>" ++ incr ++ "<-]>" ++ rest where n = ord c sq = truncate (sqrt $ fromIntegral n) incr = "+" *** sq rest = "+" *** (n - sq*sq) -- Calculates the average of a list of Ints, rounded down roundedAvg :: [Int] -> Int roundedAvg ns = if length ns == 0 then 0 else sum ns `div` length ns -- `"foo" *** 3 == "foofoofoo" (***) :: String -> Int -> String s *** n = concat (replicate n s) diff :: Int -> Int -> String diff from to = replicate (abs n) c where n = from - to c = if n > 0 then '-' else '+'
Running this program, it can be verified that its output is comparable to the strategy above:
$ ./bfk Simon ++++++++++[>++++++++++<-]>+++[>+>+>+>+>+[<]>-]>-------------------->++>++++++>++++++++>+++++++[<]>