'Parse, Don't validate' using ViewPatterns
September 14, 2020
In 2019, Alexis King wrote the Parse, don't validate article that catches some of the essences of type-driven software development and design. Several books were released on this matter, including Alexander Granin's Functional Design and Architecture, Sandy Maguire's Thinking with Types: Type-Level Programming in Haskell, or even Edwin Brady's Type-Driven Development with Idris for a book about the future.
This blog post is about where Alexis' three words got me today.
I refactor a Haskell library that handles Ethereum VM opcodes, and in particular, the part of the library that lets you convert a
Word8 into an
readOp :: Word8 -> ByteString -> Op readOp x _ | x >= 0x80 && x <= 0x8f = OpDup (x - 0x80 + 1) readOp x _ | x >= 0x90 && x <= 0x9f = OpSwap (x - 0x90 + 1) readOp x _ | x >= 0xa0 && x <= 0xa4 = OpLog (x - 0xa0) readOp x xs | x >= 0x60 && x <= 0x7f = let n = x - 0x60 + 1 xs' = BS.take (num n) xs in OpPush (word xs') readOp x _ = case x of 0x00 -> OpStop ... _ -> OpUnknown x
I would like for that function to live in evm-opcodes, but it needs cleaning up:
x >= 0x80 && x <= 0x8fstuff is hard to read, but it also sort of ties together the hex codes and the
OpDupconstructor. If the hex codes were only mentioned here, that'd be fine, but they are littered throughout the codebase, making the interpreter harder to read.
Instead of inventing an
OpUnknownerror case, use
Nothing. One may argue that
OpUnknown xis better represented as
Left x, but the caller already has
The first rewrite looks like this:
isDUP, isSWAP, isLOG, isPUSH :: Word8 -> Bool isDUP b = b >= 0x80 && b <= 0x8f isSWAP b = b >= 0x90 && b <= 0x9f isLOG b = b >= 0xa0 && b <= 0xa4 isPUSH b = b >= 0x60 && b <= 0x7f -- | Parse an 'Opcode' from a 'Word8'. In case of 'PUSH' instructions, read the -- constant being pushed from a subsequent 'ByteString'. readOp :: Word8 -> ByteString -> Maybe Opcode readOp b bs | isDUP b = pure . DUP . fromWord8 $ b - 0x80 | isSWAP b = pure . SWAP . fromWord8 $ b - 0x90 | isLOG b = pure . LOG . fromWord8 $ b - 0xa0 | isPUSH b = let n = fromIntegral (b - 0x60 + 1) in PUSH <$> word256 (BS.take n bs) | otherwise = readOp' b readOp' :: Word8 -> Maybe Opcode readOp' word = case word of 0x00 -> pure STOP ... _ -> Nothing
This fixes some things and makes other things worse. While
isDUP can be used elsewhere, removing many magical constants in many places, it preserves some magical constants in
readOp that are now no longer on the same line. This means they may drift apart.
In the case of
readOp, a better programming pattern is needed. It is time to parse, rather than validate. Omitting a few imports and helper functions,
readDUP, readSWAP, readLOG :: Word8 -> Maybe Opcode readDUP b = do guard (b >= 0x80 && b <= 0x8f) pure (DUP (fromWord8 (b - 0x80))) readSWAP b = do guard (b >= 0x90 && b <= 0x9f) pure (SWAP (fromWord8 (b - 0x90))) readLOG b = do guard (b >= 0xa0 && b <= 0xa4) pure (LOG (fromWord8 (b - 0xa0))) readPUSH :: Word8 -> ByteString -> Maybe Opcode readPUSH b bs = do guard (b >= 0x60 && b <= 0x7f) let n = fromIntegral (b - 0x60 + 1) PUSH <$> word256 (BS.take n bs)
is... functions may come handy since their definitions repeat throughout another codebase. They can be written without duplicating the hex ranges:
isDUP, isSWAP, isLOG :: Word8 -> Bool isDUP = isJust . readDUP isSWAP = isJust . readSWAP isLOG = isJust . readLOG isPUSH :: Word8 -> ByteString -> Bool isPUSH b bs = isJust (readPUSH b bs)
As for using these
read... parsers instead of their
is... predecessors, GHC's extension called ViewPatterns enables the following syntax:
readOp :: Word8 -> ByteString -> Maybe Opcode readOp (readDUP -> Just dup) _ = Just dup readOp (readSWAP -> Just swap) _ = Just swap readOp (readLOG -> Just log) _ = Just log readOp b (readPUSH b -> Just push) = Just push readOp b _bs = case b of 0x00 -> pure STOP ... _ -> Nothing
At this point, a hierarchy of
read... helper functions are composed to form
readOp in a way that does not leak hex codes and does not duplicate those hex codes. The
readOp (read...) and
readOp ... Just ... = Just ... repetitions make it seem like there could be a better way.
But I'm happy for now.