Melding Monads

2012 February 24

Implementing json-builder (or, how the CommaBuilder got its spots)

Filed under: Uncategorized — lpsmith @ 6:42 am

Json-builder is capable of serializing aeson’s mandatory data structure at almost exactly the same speed as aeson itself. This is no accident, but rather it was a conscious implementation goal; in fact I managed to very substantially improve on aeson’s performance in a few minor corner cases (such as generating character escapes) and then shared what I discovered with Bryan O’Sullivan, who then improved aeson in turn.

One of the more interesting tricks in json-builder is the Monoid instances for Array and Object types. Those instances have been through 4 major revisions, with an interesting pattern of mistakes.

Now, a value of each type is conceptually represented by a Json string without the opening and closing delimiters. So for example, the row "x" 3 is represented by the string "x":3. The outer braces are added by ‘toJson’ function when it converts the Object to a value of the ‘Json’ type.

The delimiters are left off because a client can both append and prepend additional rows to the object, and we cannot efficiently remove the opening and closing brace from a Builder. It may help to pretend that an adversary is trying to break and abuse your interface. However, once the object is converted to a Json value, no such adversary can append and prepend rows; after all, a Json value isn’t necessarily even an Object.

However, when two objects are appended, we need to insert a comma between them. One might be tempted to write:

newtype Object = Object Builder

instance Monoid Object where
    mempty      = Object mempty
    mappend a b = Object (a ++ "," ++ b)

However, this is incorrect; in fact, it isn’t even a monoid. An adversary could produce syntactically incorrect Json simply by appending or prepending a mempty value. For example, here’s some sample expressions and the syntactically incorrect fragments they would generate:

   toJson  (row "x" 3 ++ mempty)               {"x":3,}
   mconcat [ row "x" 3, mempty, row "y" 4 ]     "x":3,,"y":4

Now, this adversary wouldn’t need to be very sophisticated, in fact these problems would almost certainly be discovered by a casual programmer. And a friendly programmer would find it excruciatingly painful to work around this broken implementation. But thinking like an adversary tends to lead to the best code; within reason, you want to minimize an adversary’s best move.

So, we need a way treat the empty Object specially. Also, we cannot simply append or prepend a comma to every singleton object, which would leave an superfluous comma either at the beginning or end of the object.

My first solution was to use the strict state monad to track whether or not we’ve emitted a singleton. Then row knows whether it needs to prepend a comma or not. In essence, my implementation was:

newtype Object = Object (State Bool Builder)

instance Monoid Object where
    mempty  = Object (return mempty)
    mappend = Object (liftM2 mappend)

row k v = Object $ do
   first <- get
   put False
   let str = toBuilder k ++ ":" ++ toBuilder v
   return $ if first then str else "," ++ str

instance Value Object where
   toJson (Object m) = Json ( "{" ++ evalState m True ++ "}" )

But some months later, as I was fleshing out other aspects of json-builder, it occurred to me that the first solution was overly strict; I couldn’t incrementally serialize something like toJson [1..10^9] for example. So, my second solution was conceptually the following code, albeit the actual code was special-cased and simplified slightly:

newtype Object = Object (ContT Builder (State Bool) ())

instance Monoid Object where
    mempty  = Object (return ())
    mappend = Object (>>)

row k v = Object $ do
    first <- get
    put False
    if first
      then return ()
      else write ","
    write (toBuilder k ++ ":" ++ toBuilder v)
  where
    write b = mapContT (b ++) (return ())

instance Value Object where
    toJson (Object m) = Json ( "{" ++ run m ++ "}" )
      where run m = runState (runContT m (\_ -> return mempty)) True

But then I started benchmarking json-builder against aeson. Although I was in the right ballpark, I discovered a noticeable performance gap when serializing objects and vectors. So I started thinking about how to improve the Monoid implementation, and soon realized I really didn’t need a full blown monad. My third implementation was much prettier and closed the performance gap:

data Object = Empty | Comma Builder

instance Monoid Object where
    mempty = Empty
    mappend Empty y = y
    mappend x Empty = x
    mappend (Comma a) (Comma b) = Comma (a ++ "," ++ b)

row k v = Comma (toBuilder k ++ ":" ++ toBuilder v)

instance Value Object where
    toJson Empty     = Json "{}"
    toJson (Comma x) = Json ( "{" ++ x ++ "}" )

But, I soon realized that I had replicated the same mistake from the first solution, namely that this implementation was insufficiently lazy for incremental serialization. Fixing that was a simple matter of applying a standard and well-known code transformation to mappend:

instance Monoid Object where
    mempty = Empty
    mappend Empty     x = x
    mappend (Comma a) x = Comma ( a ++ case x of
                                         Empty   -> mempty
                                         Comma b -> "," ++ b )

However, this transformation reintroduced a performance gap, which was closed by adding a strictness annotation (<– worth reading) on the data constructor:

data Object = Empty | Comma !Builder

And that my friends, is the story of how the CommaBuilder got its spots.

About these ads

1 Comment »

  1. I read a lot of interesting articles here. Probably you spend
    a lot of time writing, i know how to save you a lot of work,
    there is an online tool that creates unique, google friendly articles
    in seconds, just search in google – laranitas free content source

    Comment by Allan — 2014 August 27 @ 11:34 am


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Shocking Blue Green Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: