Melding Monads

2014 March 27

Announcing postgresql-simple-0.4.2

Filed under: Uncategorized — lpsmith @ 7:46 pm

With 19 new releases, postgresql-simple has seen substantial development since my announcement of version 0.3.1 nearly a year ago. Compared to my last update, the changes are almost exclusively bugfixes and new features. Little code should break as a result of these changes, and most if not all of the code that does break should fail at compile time.

As it stands today, postgresql-simple certainly has the best support for postgres-specific functionality of anything on hackage. So, for a few highlights:

Parametrized Identifiers

Thanks to contributions from Tobias Florek, perhaps the most exciting change for many is that postgresql-simple now properly supports parametrized identifiers, including column, table, and type names. These are properly escaped via libpq. So for example, you can now write:

              (Only ("schema.table" :: QualifiedIdentifier))

Of course, this particular example is still potentially very insecure, but it shouldn’t be possible to create a SQL injection vulnerability this way.

The downside of this change is that postgresql-simple now requires libpq version 9.0 or later. However, community support for 8.4 is coming to and end this July. Also, it is possible to use newer versions of libpq to connect to older versions of postgres, so you don’t have to upgrade your server. (In fact, in one particular situation I’m still using postgresql-simple to connect to postgresql 8.1, although some features such as non-builtin types don’t work.)

Copy In and Copy Out support

While the postgresql-libpq binding has supported COPY FROM STDIN and COPY TO STDOUT for some time, postgresql-simple now supports these directly without having to muck around with postgresql-libpq calls via the Internal module.

If you are interested in streaming data to and from postgres, you may also be interested in higher-level COPY bindings for pipes and io-streams. This is available in Oliver Charles’s pipes-postgresql-simple, which also supports cursors, and my own unreleased postgresql-simple-streams, which also supports cursors and large objects.

Out-of-box support for JSON, UUID, and Scientific

Thanks to contributions from Bas van Dijk, postgresql-simple now provides FromField and ToField instances for aeson‘s value type, Antoine Latter’s uuid, as well as scientific.

Savepoints and nestable folds

Thanks to contributions from Joey Adams, postgresql-simple now has higher-level support for savepoints in the Transaction module, as well as the ability to nest the fold operator. Previously, postgresql-simple had assigned a static name to the cursor that underlies fold, and now every connection has a counter used to generate temporary names.

Parametrized VALUES expressions

While executeMany and returning already support common use cases of this new feature, the new Values type allows you to parameterize more than just a single VALUES table literal.

For example, these situations commonly arise when dealing with writable common table expressions. Let’s say we have a table of things with an associated table of attributes:

   name TEXT NOT NULL,

CREATE TABLE attributes (
   id    INT NOT NULL REFERENCES things,
   key   TEXT   NOT NULL,

Then we can populate both a thing and its attributes with a single query, returning the id generated by the database:

query conn [sql|
    WITH thing AS (
        INSERT INTO things (name) VALUES (?) RETURNING id
    ), newattrs AS (
        INSERT INTO attributes
            SELECT, a.*
            FROM thing JOIN ? a
      SELECT id FROM thing;
  |]  ("bob", Values   [ "text" ,"int8" ]
                     [ ( "foo"  , 42    )
                     , ( "bar"  , 60    ) ])

The empty case is also dealt with correctly; see the documentation for details.

Improved support for outer joins

A long standing challenge with postgresql-simple is dealing with outer joins in a sane way. For example, with the schema above, let’s say we want to fetch all the things along with their attributes, whether or not any given thing has any attributes at all. Previously, we could write:

getAllTheThings :: Connection -> IO [(Text, Maybe Text, Maybe Int64)]
getAllTheThings conn = do
    query conn [sql|
        SELECT name, key, value
          FROM things LEFT OUTER JOIN attributes
            ON =

Now, the columns from the attributes table are not nullable, so normally we could avoid the Maybe constructors, however the outer join changes that. Since both of these columns are not nullable, they are always both null or both not null, which is an invariant not captured by the type. And a separate Maybe for each column gets awkward to deal with, especially when more columns are involved.

What we would really like to do is change the type signature to:

getAllTheThings :: Connection -> IO [Only Text :. Maybe (Text, Int64)]

And now we can, and it will just work! Well, almost. The caveat is that there is a separate instance FromRow (Maybe ...) for most of the provided FromRow instances. This won’t work with your own FromRow instances unless you also declare a second instance. What’s really desired is a generic instance:

instance FromRow a => FromRow (Maybe a)

This instance would return Nothing if all the columns that would normally be consumed are null, and attempt a full conversion otherwise. This would reduce code bloat and repetition, and improve polymorphism and compositionality.

But alas, it’s not possible to define such an instance without changing the FromRow interface, and quite probably breaking everybody’s FromRow instances. Which I’m totally willing to do, once somebody comes up with a way to do it.

2013 April 27

Announcing postgresql-simple 0.3.1

Filed under: Uncategorized — lpsmith @ 10:49 am

Array types

Postgresql-simple has been progressing since my last announcement of version 0.1 nearly a year ago. Since then there has been many changes by myself and contributors, some of which will break your code with or without compilation errors. So this post will attempt to highlight some of the bigger changes. Probably the most exciting recent change is the new support for PostgreSQL’s array types, largely due to work from Jason Dusek, Bas van Dijk, and myself. Here’s two examples of a table and query that makes use of this functionality:

   ( id   INT NOT NULL
   , link INT NOT NULL

   ( id     INT NOT NULL
   , matrix FLOAT8[][]
import qualified Data.Vector as V
import           Database.PostgreSQL.Simple
import           Data.Int(Int64)

getAssocs :: Connection -> IO [(Int,V.Vector Int)]
getAssocs conn = do
    query_ conn "SELECT id, ARRAY_AGG(link) FROM assocs GROUP BY id"

insertMatrix :: Connection -> Int -> V.Vector (V.Vector Double) -> IO Int64
insertMatrix conn id matrix = do
    execute conn "INSERT INTO matrices (id, matrix) VALUES (?,?)" (id, matrix)

TypeInfo Changes

In order to properly support the FromField a => FromField (Vector a) instance, the TypeInfo system was overhauled. Previously, the only information it tracked was a mapping of type OIDs to type names, first by consulting a static table and then using a per-connection cache, finally querying the pg_type metatable. This was useful for writing FromField instances for postgresql types that do not have a stable OID, such as those provided by an extension, like the period type from Temporal Postgres or the hstore type from the contributed modules bundled with PostgreSQL. However, proper array support required more information, especially the type of the array’s elements. This information is now available in an easily extended data structure, available in the new Database.PostgreSQL.Simple.TypeInfo module. This was introduced in 0.3, 0.3.1 added support for range and composite types; however there is not yet any FromField instance that makes use of this information or deals with these types.

IO at Conversion Time

Version 0.3 also stopped pre-computing the type name of every column and storing these in a vector before converting the results, by allowing a restricted set of IO actions at conversion time. This is a win, because the common case is that the typename is never consulted; for example almost all of the out-of-box FromField instances examine the type OID alone. Also, since IO information no longer has to be computed before conversion takes place, it makes it practical to consider using IO information that would rarely be used in normal circumstances, such as turning table OIDs into table names when errors are encountered. It’s possible to extend the IO actions available to FromField and FromRow instances by accessing the data constructor of the Conversion monad via the Database.PostgreSQL.Simple.Internal module.

This required changing the type of the FromField.typename operator, which will break your FromField instances that use it. It also required small changes to the FromField and FromRow interface, which has a chance of breaking some of your FromField and FromRow instances if they don’t strictly use the abstract Applicative and/or Monad interfaces. However, all of this breakage should be obvious; if your code compiles, it should work with the new interface.

HStore support

Version 0.3.1 also introduced out-of-box support for hstore. The hstore type provides key-value maps from textual strings to textual strings. Conversions to/from lists and Data.Map is provided, while conversions from other Haskell types can be easily implemented via the HStoreBuilder interface (similar to my own json-builder package), and conversions to other Haskell types can easily be implemented via the conversion to lists.

CREATE TABLE hstore_example
    ( id  INT NOT NULL
    , map hstore
insertHStore :: Connection -> Int -> [(Text,Text)] -> IO Int64
insertHStore conn id map = do
    execute conn "INSERT INTO hstore_example (id,map) VALUES (?,?)" (id, HStoreList map)

retrieveHStore :: Connection -> Int -> IO (Maybe [(Text,Text)])
retrieveHStore conn id
    xs <- query conn "SELECT map FROM hstore_example WHERE id = ?" (Only id)
    case xs of
      [] -> return Nothing
      (Only (HStoreList val):_) -> return (Just val)

Better Error Messages

Jeff Chu and Leonid Onokhov have improved both error messages and error handling options in the latest version. Thanks to Jeff, the ResultError exception now includes the column name and associated table OID (if any) from which the column was taken from. And Leonid has contributed a new Errors module that can be used to dissect SqlError values in greater detail.

Better Time Conversions

And in older news, version 0.1.4 debuted brand new time parsers and printers for the ISO-8601 syntax flavor that PostgreSQL emits, included FromField instances for LocalTime, and introduced new datatypes for dealing with PostgreSQL’s time infinities. Among other things, the new parsers correctly handle timestamps with UTC offsets of a whole number of minutes, which means (for example) that postgresql-simple now works in India. Version 0.2 removed the conversion from timestamp (without time zone) to the UTCTime and ZonedTime types, due to the inherent ambiguity that conversion represents; LocalTime is now the preferred way of handling timestamps (without time zones).

2012 July 10

Announcing split-channel

Filed under: Uncategorized — lpsmith @ 11:44 pm

The split-channel package is new library that is a small variation on Control.Concurrent.Chan. The most obvious change is that it splits the channel into sending and receiving ports. This has at least two advantages: first, that this enables the type system to more finely constrain program behavior, and second, a SendPort can have zero ReceivePorts associated with it, and messages written to such a channel can be garbage collected.

This library started life last fall as part of my experiments in adding support for PostgreSQL’s asynchronous notifications to Chris Done’s native pgsql-simple library. The initial motivation was that if a notification arrived and nobody was listening, I wanted to be able to garbage collect it. However, the type advantages are what keep me coming back.

Beyond the primary change, this library has a number of other small improvements over Control.Concurrent.Chan: the deprecated thread-unsafe functions aren’t there, and several operators have been added or improved, most notably listen, sendMany, fold, and split.

  1. listen attaches a new ReceivePort to an existing SendPort. By contrast, Chan only provides the ability to duplicate an existing ReceivePort.

    Edit: I was mistaken: listen is essentially equivalent to dupChan, whereas duplicate is new.

  2. sendMany sends a list of messages atomically. It’s a better name than writeList2Chan, which is not atomic and is only a convenience function written in terms of send. However, writeList2Chan does work on infinite streams, whereas sendMany does not.

  3. fold is a generalization of getChanContents, potentially avoiding some data structures.

  4. split cuts an existing channel into two channels. It gives you back a new ReceivePort associated with the existing SendPort, and a new SendPort associated with the existing ReceivePorts. This is a more general operator than one I’ve used in a few places to transparently swap out backend services.

    Chan does not provide the split operator, though one could be added. However I am skeptical that this is a good idea: it’s just a little too effect-ful for comfort. I think that putting a SendPort in an MVar tends to be a better idea than using split, even though it does introduce another layer of indirection.

Finally, a few acknowledgements are in order: primarily, Control.Concurrent.Chan and its authors and contributors, and secondarily, Joey Adams for GHC Bug #5870, the fix of which has been incorporated into split-channel.

2012 May 5

Announcing postgresql-simple 0.1

Filed under: Uncategorized — lpsmith @ 3:02 pm

A new release of postgresql-simple is now available. Version 0.1 introduces some breaking changes as well as some significant improvements to the interface. I’ll cover the biggest and best change in this post, for other changes you should read the changelog.

A few months ago, Ozgun Ataman suggested that I add some convenience code to simplify writing instances of QueryResults. I ended up liking his suggestion so much I decided to dig deeper into postgresql-simple and rebase the implementation around his interface. This allowed me to eliminate some intermediate data structures, and has some other advantages I’ll get to later. I’m rather pleased with the end result; here is a contrast of the old and the new:

class QueryResults a where
    convertResults :: [Field] -> [Maybe ByteString] -> Either SomeException a
instance (Result a, Result b) => QueryResults (a,b) where
    convertResults [fa,fb] [va,vb] = do
        !a <- convert fa va
        !b <- convert fb vb
        return (a,b)
    convertResults fs vs  = convertError fs vs 2

-- RowParser has instances for Applicative, Alternative, and Monad
class FromRow a where
    fromRow :: RowParser a

-- field is exported from Database.PostgreSQL.Simple.FromRow
field :: FromField a => RowParser a

instance (FromField a, FromField b) => FromRow (a,b) where
    fromRow = (,) <$> field <*> field

The new interface not only eliminates a significant amount of syntactic overhead, it also allowed me to automatically force the converted values to WHNF, which eliminates the old caveats about writing QueryResults instances. (Note that there are (still) caveats to writing FromField instances, even if they weren’t documented before.) And perhaps more interestingly, it enables new forms of composition. For example, the Types module defines a pair type for composing FromRow instances as follows:

data a :. b = a :. b 

infixr 3 :.

instance (FromRow a, FromRow b) => FromRow (a :. b) where
   fromRow = (:.) <$> fromRow <*> fromRow

Here’s an example of how you might use this. Let’s pretend we have a simple schema for that might be used for a (small) website where people can post stories and rate them from 0-9:

CREATE TABLE users                     (uid int not null, username text   not null);
CREATE TABLE posts   (pid int not null, uid int not null, content  text   not null);
CREATE TABLE ratings (pid int not null, uid int not null, score    float4 not null);

Then, here’s some example boilerplate that could be used to represent (part of) the schema in Haskell:

data User = User { u_uid :: Int, username :: Text }

instance FromRow User where
  fromRow = User <$> field <*> field

data Post = Post { p_pid :: Int, p_uid :: Int,  content :: Text } 

instance FromRow Post where
  fromRow = Post <$> field <*> field <*> field

type Score = Float

And finally, here is an example of how one could get the data out of the database in a form almost ready to generate a web page:

    rows :: [User :. Post :. Only Score] <- 
       query conn [sql|  SELECT u.*, p.*, avg(r.score) as avg_score
                           FROM users as u JOIN posts as p JOIN ratings as r  
                                ON u.uid = p.uid AND =
                          GROUP BY
                          ORDER BY avg_score DESC                             |] ()
    forM rows $ \(user :. post :. Only score) -> do
        -- generate the HTML for a single post here

Now of course, this isn’t meant to be a serious sketch of such a post/rating system; there are plenty of performance problems that are perfectly obvious to me, and maybe to you too. This is just to give you some ideas about how this functionality might be used: for example, to deal with (a limited class of) joins, or to deal with a computed column that’s been affixed to the table.

Some unresolved issues at this point would be dealing with natural joins and outer joins. I suspect that an adequate solution to these problems may require more breaking changes to this interface. In any case, multiple people have been asking for this kind of functionality, and I do think this interface is a step in the right direction.

For questions, comments, or suggestions relating to postgresql-simple, I suggest joining database-devel, a new mailing list pertaining not only to postgresql-simple, but also database development in Haskell in general. Also I’m often willing to answer questions if you happen to catch me on Freenode.

2012 February 24

Implementing json-builder (or, how the CommaBuilder got its spots)

Filed under: Uncategorized — lpsmith @ 6:42 am

Json-builder is capable of serializing aeson’s mandatory data structure at almost exactly the same speed as aeson itself. This is no accident, but rather it was a conscious implementation goal; in fact I managed to very substantially improve on aeson’s performance in a few minor corner cases (such as generating character escapes) and then shared what I discovered with Bryan O’Sullivan, who then improved aeson in turn.

One of the more interesting tricks in json-builder is the Monoid instances for Array and Object types. Those instances have been through 4 major revisions, with an interesting pattern of mistakes.

Now, a value of each type is conceptually represented by a Json string without the opening and closing delimiters. So for example, the row "x" 3 is represented by the string "x":3. The outer braces are added by ‘toJson’ function when it converts the Object to a value of the ‘Json’ type.

The delimiters are left off because a client can both append and prepend additional rows to the object, and we cannot efficiently remove the opening and closing brace from a Builder. It may help to pretend that an adversary is trying to break and abuse your interface. However, once the object is converted to a Json value, no such adversary can append and prepend rows; after all, a Json value isn’t necessarily even an Object.

However, when two objects are appended, we need to insert a comma between them. One might be tempted to write:

newtype Object = Object Builder

instance Monoid Object where
    mempty      = Object mempty
    mappend a b = Object (a ++ "," ++ b)

However, this is incorrect; in fact, it isn’t even a monoid. An adversary could produce syntactically incorrect Json simply by appending or prepending a mempty value. For example, here’s some sample expressions and the syntactically incorrect fragments they would generate:

   toJson  (row "x" 3 ++ mempty)               {"x":3,}
   mconcat [ row "x" 3, mempty, row "y" 4 ]     "x":3,,"y":4

Now, this adversary wouldn’t need to be very sophisticated, in fact these problems would almost certainly be discovered by a casual programmer. And a friendly programmer would find it excruciatingly painful to work around this broken implementation. But thinking like an adversary tends to lead to the best code; within reason, you want to minimize an adversary’s best move.

So, we need a way treat the empty Object specially. Also, we cannot simply append or prepend a comma to every singleton object, which would leave an superfluous comma either at the beginning or end of the object.

My first solution was to use the strict state monad to track whether or not we’ve emitted a singleton. Then row knows whether it needs to prepend a comma or not. In essence, my implementation was:

newtype Object = Object (State Bool Builder)

instance Monoid Object where
    mempty  = Object (return mempty)
    mappend = Object (liftM2 mappend)

row k v = Object $ do
   first <- get
   put False
   let str = toBuilder k ++ ":" ++ toBuilder v
   return $ if first then str else "," ++ str

instance Value Object where
   toJson (Object m) = Json ( "{" ++ evalState m True ++ "}" )

But some months later, as I was fleshing out other aspects of json-builder, it occurred to me that the first solution was overly strict; I couldn’t incrementally serialize something like toJson [1..10^9] for example. So, my second solution was conceptually the following code, albeit the actual code was special-cased and simplified slightly:

newtype Object = Object (ContT Builder (State Bool) ())

instance Monoid Object where
    mempty  = Object (return ())
    mappend = Object (>>)

row k v = Object $ do
    first <- get
    put False
    if first
      then return ()
      else write ","
    write (toBuilder k ++ ":" ++ toBuilder v)
    write b = mapContT (b ++) (return ())

instance Value Object where
    toJson (Object m) = Json ( "{" ++ run m ++ "}" )
      where run m = runState (runContT m (\_ -> return mempty)) True

But then I started benchmarking json-builder against aeson. Although I was in the right ballpark, I discovered a noticeable performance gap when serializing objects and vectors. So I started thinking about how to improve the Monoid implementation, and soon realized I really didn’t need a full blown monad. My third implementation was much prettier and closed the performance gap:

data Object = Empty | Comma Builder

instance Monoid Object where
    mempty = Empty
    mappend Empty y = y
    mappend x Empty = x
    mappend (Comma a) (Comma b) = Comma (a ++ "," ++ b)

row k v = Comma (toBuilder k ++ ":" ++ toBuilder v)

instance Value Object where
    toJson Empty     = Json "{}"
    toJson (Comma x) = Json ( "{" ++ x ++ "}" )

But, I soon realized that I had replicated the same mistake from the first solution, namely that this implementation was insufficiently lazy for incremental serialization. Fixing that was a simple matter of applying a standard and well-known code transformation to mappend:

instance Monoid Object where
    mempty = Empty
    mappend Empty     x = x
    mappend (Comma a) x = Comma ( a ++ case x of
                                         Empty   -> mempty
                                         Comma b -> "," ++ b )

However, this transformation reintroduced a performance gap, which was closed by adding a strictness annotation (<– worth reading) on the data constructor:

data Object = Empty | Comma !Builder

And that my friends, is the story of how the CommaBuilder got its spots.

Data Structure Agnostic JSON Serialization

Filed under: Uncategorized — lpsmith @ 2:19 am

Recently, Johan Tibell wrote a post on serialization APIs in Haskell, and thought it might be good to mention the approach used in my own json-builder, which I hadn’t previously promoted to very many others.

In the post, Johan highlighted the Value data structure mandated by the popular aeson package, and had a little aside:

Aside: even though we’ll serialize Haskell values to and from this type, it would have been reasonable, although perhaps more cumbersome for the users of our API, to skip the Value type entirely and convert our Haskell values directly to and from JSON-encoded ByteStrings.

type Object = HashMap Text Value

type Array = Vector Value

data Value = Object !Object
           | Array !Array
           | String !Text
           | Number !Number
           | Bool !Bool
           | Null

data Number = I !Integer
            | D !Double

Skipping this data structure is exactly what json-builder does. It takes arbitrary data directly to a json string. It’s also efficient, capable of serializing aeson’s data structure with identical performance as aeson itself. It’s also a robust abstraction, meaning that all uses of the basic interface will result in syntactically correct json strings. And, json-builder just as easy to use as aeson’s ToJSON typeclass.

Unfortunately my library does not solve the problem of parsing and processing Json values; there is no analog of the FromJSON typeclass, though I am interested in how one might implement similarly data structure agnostic json parsing.

The basic idea looks exactly the same as ToJSON class, though its currently named Value instead:

class Value a where
    toJson :: a -> Json

Now, Json is an opaque type, which is why this is a robust abstraction. All you can do with it is to turn it into a string, and use it to build bigger Json values.(In fact, if you look inside the Internal module, you’ll learn that Json is just a newtype wrapper around blaze-builder.)

Now, to take the example that Johan uses, let’s say you want to provide Value instance that serializes a Person record into a Json Object. Now, there are serialization instances for Data.Map and Data.HashMap, so you could take Aeson’s approach and build one of those first. Or you could circumvent the abstraction and produce Builders yourself. But what you really want to do is use the JsObject type class:

data Person = Person { name :: !Text, born :: !Int }

instance JsObject Person where
    toObject p = mconcat [
                   "name" `row` name p
                 , "born" `row` born p

This code is identical (modulo renaming) to the code that Johan gave to turn a Person into an Aeson structure. What it does is construct an Object value, which is an opaque type that builds Json Object. Object values have a very simple API. It only provides a singleton constructor and a Monoid instance. And you can turn an Object value into a Json value, of course:

instance Value Person where
    toJson p = toJson (toObject p)

Unlike aeson, you have full control the order in which the object’s fields appear. Unfortunately, json-builder will also happily produce JSON objects with duplicate field names, whereas aeson ensures that field names are unique. Neither issue is likely to be a very big deal in practice.

Now, json-builder has a couple of potentially interesting advantages over aeson. Let’s look at the serialization code for Haskell lists:

instance Value a => JsArray [a] where
    toArray = foldr (\x xs -> element x `mappend` xs) mempty

instance Value a => Value [a] where
    toJson = toJson . toArray

Unlike Aeson’s list serialization, this is a good consumer, and thus can fuse with good producers. So for example, when compiled with optimization, toJson [1..10^9] shouldn’t create a list at all, but rather directly produce a Json list of integers.

Also, this code is incremental even if it doesn’t fuse. It doesn’t need the entire list to start producing the Json string. Aeson, by contrast, marshals the entire list into a Vector before it produces anything.

Whether or not either of these advantages mean much to real world applications remain to be seen. I would guess that for most such applications, the structure-agnostic aspects are a bigger win.

This generality doesn’t cost anything over aeson in either serialization speed or ease of use; for example, here’s an instance for Aeson’s data structure:

instance Value Aeson.Value where
    toJson (Object v) = toJson v
    toJson (Array  v) = toJson v
    toJson (String v) = toJson v
    toJson (Number v) = toJson v
    toJson (Bool   v) = toJson v
    toJson  Null      = toJson ()

instance Value Number where
    toJson (I x) = toJson x
    toJson (D x) = toJson x

instance Value a => JsArray (Vector a) where
    toArray = Vector.foldr (\x xs -> element x `mappend` xs) mempty

instance Value a => Value (Vector a) where
    toJson = toJson . toArray

This turns out to be almost exactly equal in performance as the serialization code in aeson. (and perhaps I should add an instance for vector to json-builder) Take note that you don’t need to use such a simple recursion in either aeson or json-builder. You can easily tweak the serialization of any part of a data structure by calling something other than toJson. For example, say you have a map of Maybe values, and you don’t want to include keys associated with Nothing. (These would normally be rendered as null.) Then you can use this code:

noNothings :: (JsString k, Value v) => Map k (Maybe v) -> Object
noNothings = Map.foldrWithKey f mempty
   where f k mv xs = case mv of
                       Nothing -> xs
                       Just v  -> row k v `mappend` xs

Json-builder only solves half of the problem that aeson solves, but it solves that half in a more flexible and potentially more efficient way without sacrificing ease of use in common cases.

2011 December 30

Announcing postgresql-simple

Filed under: Uncategorized — lpsmith @ 5:04 am

I’ve done a bit of database programming off and on over the years, and when I started into a larger project and decided to use Haskell and PostgreSQL, I didn’t understand exactly how bad of a choice Haskell was for PostgreSQL development at the time. So postgresql-simple and postgresql-libpq is hopefully a small step towards remedying the situation. This work is a fork of Bryan O’Sullivan’s mysql-simple library and Grant Monroe’s libpq binding, with some code informed by Chris Done’s pgsql-simple library, and I’d to thank all for their terrific work.

So, there are five things that, taken together, distinguish postgresql-simple from the other PostgreSQL options on hackage:

1. postgresql-simple uses libpq, PostgreSQL’s official client library for C

Chris Done’s pgsql-simple is a pure Haskell implementation of a small subset of the PostgreSQL frontend/backend protocol. I do think there is technical merit to a native Haskell implementation, and in particular I’ve identified a potentially significant advantage that cannot be achieved via libpq. However writing a native implementation is a significant undertaking, and such efforts invariably lag behind in terms of features. And though pgsql-simple is experimental software that hasn’t been officially released and I don’t want to be unfair to it, pgsql-simple performs badly. I sped up a program I had been using to test pgsql-simple by over a factor of 20 simply by switching to postgresql-simple.

Among other things using libpq means that postgresql-simple gets the full range of connection and authentication options out of box, including support for Unix Domain Sockets, SSL, and GSSAPI.

2. postgresql-simple uses libpq-based string escaping.

HSQL doesn’t provide parameterized SQL statements at all, requiring programmers to dynamically generate sql and handle escaping issues themselves. And experience has shown punting on this issue is unacceptable, as dynamically generating SQL is risky and forcing programmers to handle escaping issues is an order of magnitude worse.

HDBC and pgsql-simple support parameterized SQL, but they attempt to do the escaping themselves. The problem is that this is usually a bit more subtle than it first appears, leading to bugs and potential vulnerabilities. For example, in PostgreSQL, escape syntax depends on the value of the standard_conforming_strings server setting, which libpq will detect and accommodate accordingly.

3. postgresql-simple makes no pretense of supporting prepared statements

In PostgreSQL, a prepared statement allows one to send off a parameterized sql statement to a backend, and then re-use that statement as many times as you like by just filling in the parameters. This means the text of the prepared statement does not repeatedly traverse the wire, the backend does not parse the sql query multiple times, the backend re-uses the query plan generated when the statement was prepared, and you get to use protocol-level parameters which can eliminate the need for escaping strings and converting numbers to base-10 and back.

Note that prepared statements are very often, but not universally, advantageous. Starting from nothing, they require two round trips to get any information out, so preparation can be disadvantageous for one-shot queries. Also, occasionally re-planning a query can be advantageous, so infrequently executed queries can be harmed by preparation as well.

HDBC nominally supports prepared statements, but in fact all HDBC-postgresql does is cache some of the preprocessing of the query. No preparation in the database sense ever occurs. So while the lack of support for prepared statements is a definite disadvantage, neither does postgresql-simple pretend to prepare statements when it doesn’t.

4. postgresql-simple provides a simple API that most programmers would find familiar

Takusen appears to be the one PostgreSQL database access library for Haskell that gets basic implementation details more-or-less correct. Unfortunately it exports an esoteric API that is not applicable in all situations. In particular, web applications often use various forms of connection caching or pooling, which is fundamentally incompatible with the deterministic connection resource guarantees provided by Takusen.

Also, the *-simple libraries have a relatively nice interface compared to HDBC. Ultimately that was the breaking point that caused me to spend the time to create postgresql-simple; no one database library did everything I needed, and while working on some new code where my mangled fork of HDBC made the most sense, I realized I really wished I was using pgsql-simple instead. Also, I paid attention to ensure that application code could add support for user-defined PostgreSQL types with a minimum amount of fuss and without modifying the library, something that neither HDBC nor pgsql-simple could really do.

5. Support for Listen/Notify

Listen/Notify is the perfect solution when you want to write a program that responds to changes made to a PostgreSQL database. Informing these programs when changes are available consumes less resources and provides lower latencies than periodic polling. And since you can use a rule or trigger to send the notification, these notifications can be robust; you don’t have to assume that the program making the changes even knows that anybody wants to be informed of the changes. Listen/notify is aware of and consistent with transactions; notifications don’t get sent until the transaction commits. Even if you are willing to put that kind of logic in the database clients, using listen/notify solves the otherwise sticky problem of finding and coordinating with the other clients.

Nothing else on hackage supports asynchronous notifications, unless you use a low-level libpq binding directly.

Looking Forward

Creating a mid-level database access library based on libpq is a significant undertaking, and there is an awful lot that isn’t supported, including prepared statements, binary data formats, copy in/copy out support, more and better support for PostgreSQL data types, and PostGIS support. Many other things could be improved, such as using libpq asynchronously, better end-to-end typechecking that discovers more errors at compile time, and other interface improvements. For example, SqlQQ is a simplistic quasiquoter to improve the literal syntax for multi-line SQL queries; one could certainly imagine extending the syntax to support including a $haskellvariable as a SQL parameter instead of going through the syntactic indirection of ? in the SQL string with haskellvariable in a separate Haskell parameter.

Funnily enough, I ran across this thread in which PostgreSQL luminaries were complaining about the quality of Python’s PostgreSQL libraries. And Python is considerably ahead of Haskell in this regard.

2010 August 25

Fun with the Lazy State Monad, Part 2

Filed under: Uncategorized — lpsmith @ 5:52 am

In a previous post, I demonstrated how to use the lazy state monad to turn Geraint Jones and Jeremy Gibbon’s breadth-first labeling algorithm into a reusable abstraction, and mentioned that I did not know how to combine the result with continuations. In this post, I will do exactly that.

A hat tip goes to Twan van Laarhoven for suggesting that I split my fresh operator into two, which I adopt in this post. In the comments, down is defined as:

type Fresh n a = State [n] a

down :: Fresh n a -> Fresh n a
down m = do
    (n:ns) <- get
    put ns
    a <- m
    ns' <- get
    put (n:ns')
    return a

Now, if we naïvely apply the continuation transformer in the Monad Transformer Library, the get and put operators are lifted as follows:

type FreshCC r n a = ContT r (State [n]) a

lift :: Monad m => m a -> ContT r m a
lift m = ContT (m >>=)

down :: FreshCC r n a -> FreshCC r n a
down m = do
    (n:ns) <- lift get
    lift (put ns)
    a <- m
    ns' <- lift get
    lift (put (n:ns'))
    return a

The problem is that this replaces the >>= of the lazy state monad with the >>= of the continuation transformer, and these two operators have different strictness properties. This, in turn, leads to non-termination. The trick is not to lift get and put, but rather conjugate down:

lower :: Monad m => ContT a m a -> m a
lower m = runContT m return

down :: FreshCC a n a -> FreshCC r n a
down m = lift $ do
    (n:ns) <- get
    put ns
    a <- lower m
    ns' <- get
    put (n:ns')
    return a

-- *or*

down = lift . original_down . lower

Now, the conjugation preserves the use of the lazy state monad’s >>= in the definition of down, however it changes the type of the argument from FreshCC r n a to FreshCC a n a! The other definitions contained in the previous post stay much the same.

Feel free to download freshcc.hs, a full working demonstration of this post. One can even use callCC, fresh, and down in the same computation and terminate! Sadly, I don’t have any demonstrations of this combination, nor do I have any applications in mind. My intuition about callCC is currently quite limited in this context.

I have implemented Dan Friedman’s angel-devil-milestone control operators in conjunction with fresh and down, and have used it to duplicate labels and change the shape of a spirited tree; but I’m definitely cheating at this point, as all I have done is observe that the definitions compute something. I have no control over what’s going on, nor do I know what the definitions are supposed to compute. (yet)

2010 May 2

BBC documentary on Science and Islam

Filed under: Uncategorized — lpsmith @ 3:11 pm

So I wanted to learn how to pronounce the name al-Khwārizmī. So I searched YouTube, and stumbled across Science and Islam hosted by Physicist Jim Al-Khalili, a BBC documentary series consisting of three parts: The Language of Science, The Empire of Reason, and The Power of Doubt.

I did not expect to enjoy the documentary as much as I did; I would like to buy the series on DVD, but I have not found a way to do this. It really opened my eyes: in spite of my western education, I knew something of the golden age of Islamic scholarship, but I failed properly appreciate its importance to the modern world.

I say “in spite of my western education” because many students would recognize the names of Aristotle, Socrates, and Plato, but not al-Khwārizmī, al-Haytham, or al-Biruni. And yet, the contributions of these Islamic scholars are in many ways more lasting and more profound than the Greeks. After all, al-Khwārizmī made large, foundational contributions to Algebra, which is more widely taught than Greek philosophy.

This is not to belittle the contributions of the Greeks; after all the Greeks had a profound influence on Islamic scholarship. The story starts with the creation of an Islamic empire, the largest empire the world had ever seen at the time, and the start of the Translation movement. The empire sought out written materials in any language from any culture, to be translated into Arabic and disseminated throughout the empire. Greek texts were among the first to be translated, and in many cases were given priority.

The widespread use of the Arabic language and lack of intellectual discrimination greatly accelerated the dissemination and creation of knowledge. al-Khwārizmī had the envious position of being one of the first humans in history to have access to both Indian and Greek mathematics, and he set about applying one body of work to the other, refining both and making new discoveries along the way.

al-Haytham wrote a seminal text on optics, and contributed significantly to the scientific method, and is considered one of the greatest scientists of the Middle Ages. al-Biruni measured the size of the Earth with unprecedented accuracy using a novel method based on trigonometry. al-Tusi headed up an program of astronomical observation that was of great importance to Copernicus, Newton, and other European scientists. One of the lasting contributions was questioning the Greek philosophical orthodoxy and an emphasis on repeatable experiments.

The medieval Islamic scholars gathered and disseminated knowledge widely, applied mathematics and practical science to measure the physical world to then-unprecedented accuracy, laid the groundwork that eventually grew into modern chemistry, revolutionized agriculture, industry, metallurgy, navigation, map making, and much more. It is difficult to overstate the importance of these efforts to later European science and scholarship or as a necessary step in the development of the modern world.

Eventually the golden age of Islamic scholarship came to an end; although Europeans tended to overstate the extent of the decline, it certainly did occur. The documentary attributes this to a large number of reasons, including the rejection of the printing press, the decline of the empire that funded these efforts, and entrenched political, economic, and religious interests. Perhaps there are some lessons here for today’s state of affairs.

The Shocking Blue Green Theme. Blog at


Get every new post delivered to your Inbox.