Brainfuck and Kolmogorov Complexity

Challenge: Write a computer program that, given an input string, produces a short Brainfuck program that prints this string. The shorter the Brainfuck programs are on average for some total set of strings, the better.

Brainfuck is the esoteric programming language with only 8 instructions that remarks itself by having a very small compiler, and by resembling a classic, mechanical interpretation of the Turing machine, like Mike Davey’s A Turing Machine (2010):

The Kolmogorov complexity of a piece of information is measured by the length of the shortest computer program (in a predetermined language) that produces the information as its output. Brainfuck is not particularly good at compressing short strings. But due to the the Invariance Theorem, there is at most a constant overhead between this and the best language.

Brainfuck isn’t particularly useful, and it is difficult to write useful programs in. One thing it lends itself well to is to write programs that print out short pieces of text, like names. The following is a program that prints “Simon”:

>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++>++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>+++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++>+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++[<]>[.>]

It does so by constructing the memory array below, and then it loops back to the start and prints each character until the first memory cell is empty. Here, [<] loops back and [.>] loops forward, printing each character one at a time.

0 83 (S) 105 (i) 109 (m) 111 (o) 110 (n) 0 0 0

It is not strictly necessary to have the entire name in memory in order to print each letter individually, but for the sake of the challenge, let us assume that both of these goals must be met.

There are many other programs that only just print “Simon”. That is, many other programs that behave equivalently to the one above, even though they may look different. For example, instead of writing 83 plusses to store the number 83 in the second cell, one could use the loop construct and save a number of bytes: +++++++++[>+++++++++<-]>++. This says: Nine times, move one register ahead and increment by nine, then go back and decrement by one. At the end, increment by two more. If the same technique is used for all letters, a program that prints “Simon” might also look like:

+++++++++[>+++++++++<-]>++<              cell 2: nine times nine plus two     or 'S'
++++++++++[>>++++++++++<<-]>>+++++<<     cell 3: ten times ten plus five      or 'i'
++++++++++[>>>+++++++++++<<<-]>>>-<<<    cell 4: ten times eleven minus one   or 'm'
++++++++++[>>>+++++++++++<<<-]>>>+<<<    cell 5: ten times eleven plus one    or 'o'
++++++++++[>>>+++++++++++<<<-]           cell 6: ten times eleven             or 'n'
>[.>]

This is a somewhat smaller program, but it can be improved even further. For example, ‘m’, ‘o’ and ‘n’ are almost identically constructed, yet we waste space on constructing them individually. Let’s fix that:

+++++++++[>+++++++++<-]>++>              cell 2: nine times nine plus two     or 'S'
++++++++++[<<+++++++++++>>-]             cell 1: ten times eleven
[>>+>+>+>+<<<<-]                         cell 1: zero and cells 3 4 5 6: ten times eleven
>>----->->+                              cells 3 4 5 6: 105 109 111 110
[<]>[.>]

To give an idea of what happens here, the register values are shown updated once for each line:

Line 1: 0 83 0 0 0 0
Line 2: 110 83 0 0 0 0
Line 3: 0 83 110 110 110 110
Line 4: 0 83 105 109 111 110

A strategy: Replicate average value

From the experiments above, a strategy can be formed:

  1. Find the average numeric value of all letters in the input name
  2. Generate that value using some multiplication
  3. Replicate the value across as many cells as necessary
  4. Adjust each value a little bit up or down to fit each letter exactly
  5. Print this

Here is how this strategy could look like in Haskell:

module Main where

import Data.Char
import Data.List
import System.Environment

-- Compile using: ghc BFKolmogorov -o bfk
-- Run using: ./bfk Simon
main :: IO ()
main = do
  args <- getArgs
  case args of
    [str] -> putStrLn $ makeBF str
    _     -> putStrLn $ "Use ./bfk <str>"

makeBF :: String -> String
makeBF str =
  charCode ++ replicateCode ++ adjustCode
  where
    charCode = makeChar (chr avg)
    replicateCode = "[" ++ (">+" *** len) ++ "[<]>-]"
    adjustCode = concatMap (\n -> ">" ++ (diff avg n)) ints ++ "[<]>"
    len = length str
    ints = map ord str
    avg = roundedAvg ints

-- Creates the BF code for generating character code `c` in the register ahead.
-- Assumes that the current register is empty so it can be used for looping.  If
-- it's not, `makechar` will produce a too large character code.
makeChar :: Char -> String
makeChar c =
  incr ++ "[>" ++ incr ++ "<-]>" ++ rest
  where
    n = ord c
    sq = truncate (sqrt $ fromIntegral n)
    incr = "+" *** sq
    rest = "+" *** (n - sq*sq)

-- Calculates the average of a list of Ints, rounded down
roundedAvg :: [Int] -> Int
roundedAvg ns =
  if length ns == 0 then 0 else sum ns `div` length ns

-- `"foo" *** 3 == "foofoofoo"
(***) :: String -> Int -> String
s *** n = concat (replicate n s)

diff :: Int -> Int -> String
diff from to = replicate (abs n) c
  where
    n = from - to
    c = if n > 0 then '-' else '+'

Running this program, it can be verified that its output is comparable to the strategy above:

$ ./bfk Simon
++++++++++[>++++++++++<-]>+++[>+>+>+>+>+[<]>-]>-------------------->++>++++++>++++++++>+++++++[<]>