Json-builder is capable of serializing aeson’s mandatory data structure at almost exactly the same speed as aeson itself. This is no accident, but rather it was a conscious implementation goal; in fact I managed to very substantially improve on aeson’s performance in a few minor corner cases (such as generating character escapes) and then shared what I discovered with Bryan O’Sullivan, who then improved aeson in turn.
One of the more interesting tricks in json-builder is the Monoid instances for Array and Object types. Those instances have been through 4 major revisions, with an interesting pattern of mistakes.
Now, a value of each type is conceptually represented by a Json string without the opening and closing delimiters. So for example, the row "x" 3
is represented by the string "x":3
. The outer braces are added by ‘toJson’ function when it converts the Object to a value of the ‘Json’ type.
The delimiters are left off because a client can both append and prepend additional rows to the object, and we cannot efficiently remove the opening and closing brace from a Builder. It may help to pretend that an adversary is trying to break and abuse your interface. However, once the object is converted to a Json value, no such adversary can append and prepend rows; after all, a Json value isn’t necessarily even an Object.
However, when two objects are appended, we need to insert a comma between them. One might be tempted to write:
newtype Object = Object Builder instance Monoid Object where mempty = Object mempty mappend a b = Object (a ++ "," ++ b)
However, this is incorrect; in fact, it isn’t even a monoid. An adversary could produce syntactically incorrect Json simply by appending or prepending a mempty value. For example, here’s some sample expressions and the syntactically incorrect fragments they would generate:
toJson (row "x" 3 ++ mempty) {"x":3,} mconcat [ row "x" 3, mempty, row "y" 4 ] "x":3,,"y":4
Now, this adversary wouldn’t need to be very sophisticated, in fact these problems would almost certainly be discovered by a casual programmer. And a friendly programmer would find it excruciatingly painful to work around this broken implementation. But thinking like an adversary tends to lead to the best code; within reason, you want to minimize an adversary’s best move.
So, we need a way treat the empty Object specially. Also, we cannot simply append or prepend a comma to every singleton object, which would leave an superfluous comma either at the beginning or end of the object.
My first solution was to use the strict state monad to track whether or not we’ve emitted a singleton. Then row
knows whether it needs to prepend a comma or not. In essence, my implementation was:
newtype Object = Object (State Bool Builder) instance Monoid Object where mempty = Object (return mempty) mappend = Object (liftM2 mappend) row k v = Object $ do first <- get put False let str = toBuilder k ++ ":" ++ toBuilder v return $ if first then str else "," ++ str instance Value Object where toJson (Object m) = Json ( "{" ++ evalState m True ++ "}" )
But some months later, as I was fleshing out other aspects of json-builder, it occurred to me that the first solution was overly strict; I couldn’t incrementally serialize something like toJson [1..10^9]
for example. So, my second solution was conceptually the following code, albeit the actual code was special-cased and simplified slightly:
newtype Object = Object (ContT Builder (State Bool) ()) instance Monoid Object where mempty = Object (return ()) mappend = Object (>>) row k v = Object $ do first <- get put False if first then return () else write "," write (toBuilder k ++ ":" ++ toBuilder v) where write b = mapContT (b ++) (return ()) instance Value Object where toJson (Object m) = Json ( "{" ++ run m ++ "}" ) where run m = runState (runContT m (\_ -> return mempty)) True
But then I started benchmarking json-builder against aeson. Although I was in the right ballpark, I discovered a noticeable performance gap when serializing objects and vectors. So I started thinking about how to improve the Monoid implementation, and soon realized I really didn’t need a full blown monad. My third implementation was much prettier and closed the performance gap:
data Object = Empty | Comma Builder instance Monoid Object where mempty = Empty mappend Empty y = y mappend x Empty = x mappend (Comma a) (Comma b) = Comma (a ++ "," ++ b) row k v = Comma (toBuilder k ++ ":" ++ toBuilder v) instance Value Object where toJson Empty = Json "{}" toJson (Comma x) = Json ( "{" ++ x ++ "}" )
But, I soon realized that I had replicated the same mistake from the first solution, namely that this implementation was insufficiently lazy for incremental serialization. Fixing that was a simple matter of applying a standard and well-known code transformation to mappend:
instance Monoid Object where mempty = Empty mappend Empty x = x mappend (Comma a) x = Comma ( a ++ case x of Empty -> mempty Comma b -> "," ++ b )
However, this transformation reintroduced a performance gap, which was closed by adding a strictness annotation (<– worth reading) on the data constructor:
data Object = Empty | Comma !Builder
And that my friends, is the story of how the CommaBuilder got its spots.