Melding Monads

2011 December 30

Announcing postgresql-simple

Filed under: Uncategorized — lpsmith @ 5:04 am

I’ve done a bit of database programming off and on over the years, and when I started into a larger project and decided to use Haskell and PostgreSQL, I didn’t understand exactly how bad of a choice Haskell was for PostgreSQL development at the time. So postgresql-simple and postgresql-libpq is hopefully a small step towards remedying the situation. This work is a fork of Bryan O’Sullivan’s mysql-simple library and Grant Monroe’s libpq binding, with some code informed by Chris Done’s pgsql-simple library, and I’d to thank all for their terrific work.

So, there are five things that, taken together, distinguish postgresql-simple from the other PostgreSQL options on hackage:

1. postgresql-simple uses libpq, PostgreSQL’s official client library for C

Chris Done’s pgsql-simple is a pure Haskell implementation of a small subset of the PostgreSQL frontend/backend protocol. I do think there is technical merit to a native Haskell implementation, and in particular I’ve identified a potentially significant advantage that cannot be achieved via libpq. However writing a native implementation is a significant undertaking, and such efforts invariably lag behind in terms of features. And though pgsql-simple is experimental software that hasn’t been officially released and I don’t want to be unfair to it, pgsql-simple performs badly. I sped up a program I had been using to test pgsql-simple by over a factor of 20 simply by switching to postgresql-simple.

Among other things using libpq means that postgresql-simple gets the full range of connection and authentication options out of box, including support for Unix Domain Sockets, SSL, and GSSAPI.

2. postgresql-simple uses libpq-based string escaping.

HSQL doesn’t provide parameterized SQL statements at all, requiring programmers to dynamically generate sql and handle escaping issues themselves. And experience has shown punting on this issue is unacceptable, as dynamically generating SQL is risky and forcing programmers to handle escaping issues is an order of magnitude worse.

HDBC and pgsql-simple support parameterized SQL, but they attempt to do the escaping themselves. The problem is that this is usually a bit more subtle than it first appears, leading to bugs and potential vulnerabilities. For example, in PostgreSQL, escape syntax depends on the value of the standard_conforming_strings server setting, which libpq will detect and accommodate accordingly.

3. postgresql-simple makes no pretense of supporting prepared statements

In PostgreSQL, a prepared statement allows one to send off a parameterized sql statement to a backend, and then re-use that statement as many times as you like by just filling in the parameters. This means the text of the prepared statement does not repeatedly traverse the wire, the backend does not parse the sql query multiple times, the backend re-uses the query plan generated when the statement was prepared, and you get to use protocol-level parameters which can eliminate the need for escaping strings and converting numbers to base-10 and back.

Note that prepared statements are very often, but not universally, advantageous. Starting from nothing, they require two round trips to get any information out, so preparation can be disadvantageous for one-shot queries. Also, occasionally re-planning a query can be advantageous, so infrequently executed queries can be harmed by preparation as well.

HDBC nominally supports prepared statements, but in fact all HDBC-postgresql does is cache some of the preprocessing of the query. No preparation in the database sense ever occurs. So while the lack of support for prepared statements is a definite disadvantage, neither does postgresql-simple pretend to prepare statements when it doesn’t.

4. postgresql-simple provides a simple API that most programmers would find familiar

Takusen appears to be the one PostgreSQL database access library for Haskell that gets basic implementation details more-or-less correct. Unfortunately it exports an esoteric API that is not applicable in all situations. In particular, web applications often use various forms of connection caching or pooling, which is fundamentally incompatible with the deterministic connection resource guarantees provided by Takusen.

Also, the *-simple libraries have a relatively nice interface compared to HDBC. Ultimately that was the breaking point that caused me to spend the time to create postgresql-simple; no one database library did everything I needed, and while working on some new code where my mangled fork of HDBC made the most sense, I realized I really wished I was using pgsql-simple instead. Also, I paid attention to ensure that application code could add support for user-defined PostgreSQL types with a minimum amount of fuss and without modifying the library, something that neither HDBC nor pgsql-simple could really do.

5. Support for Listen/Notify

Listen/Notify is the perfect solution when you want to write a program that responds to changes made to a PostgreSQL database. Informing these programs when changes are available consumes less resources and provides lower latencies than periodic polling. And since you can use a rule or trigger to send the notification, these notifications can be robust; you don’t have to assume that the program making the changes even knows that anybody wants to be informed of the changes. Listen/notify is aware of and consistent with transactions; notifications don’t get sent until the transaction commits. Even if you are willing to put that kind of logic in the database clients, using listen/notify solves the otherwise sticky problem of finding and coordinating with the other clients.

Nothing else on hackage supports asynchronous notifications, unless you use a low-level libpq binding directly.

Looking Forward

Creating a mid-level database access library based on libpq is a significant undertaking, and there is an awful lot that isn’t supported, including prepared statements, binary data formats, copy in/copy out support, more and better support for PostgreSQL data types, and PostGIS support. Many other things could be improved, such as using libpq asynchronously, better end-to-end typechecking that discovers more errors at compile time, and other interface improvements. For example, SqlQQ is a simplistic quasiquoter to improve the literal syntax for multi-line SQL queries; one could certainly imagine extending the syntax to support including a $haskellvariable as a SQL parameter instead of going through the syntactic indirection of ? in the SQL string with haskellvariable in a separate Haskell parameter.

Funnily enough, I ran across this thread in which PostgreSQL luminaries were complaining about the quality of Python’s PostgreSQL libraries. And Python is considerably ahead of Haskell in this regard.

The Shocking Blue Green Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.