Melding Monads

2011 October 24

Concurrency and Foreign Functions in the Glasgow Haskell Compiler

Filed under: haskell — lpsmith @ 12:59 pm

I had an informative chat with Simon Marlow and Jules Bean a little while ago about concurrency in the Glasgow Haskell Compiler and its interaction with foreign code, specifically the difference between safe and unsafe calls. It turns out that the implementation is both simpler and more powerful than my vague misconceptions about how it worked. And since there seems to be quite a few myths out there, I thought I should write up an overview of GHC’s implementation.

Haskell’s concurrency model is relatively simple. Conceptually, Haskell threads are OS threads, and in most cases the programmer can treat them as such. This is known as a 1-1 threading model. In actuality, GHC’s implementation uses an M-N model, but provides an illusion of a 1-1 model. As one would expect, there are a few caveats to this illusion, but they tend to be minor. First, we’ll cover three fundamental components of GHC’s concurrency:

Haskell Threads

Haskell threads are implemented in GHC via cooperative multitasking. They are scheduled by GHC’s run-time system, making them the M of the M-N model. Yield points are generated automatically by the compiler, which provides an illusion of preemptive multitasking to the programmer. Also, the stack for each thread starts small but grows as needed. Cooperative multitasking and growable stacks make Haskell threads cheap and efficient, typically scaling to millions of threads on contemporary computer systems.

The downside is that GHC’s threads cannot by themselves take advantage of multiple CPUs. Also, foreign functions don’t cooperatively multitask. Notably, since GHC implements its arbitrary-precision Integer datatype via the GNU Multiple Precision Library, the illusion of preemptive multitasking can be less than perfect when executing Integer-heavy code.

To make use of Haskell threads, all you have to do is create them by calling Control.Concurrent.forkIO. There are no special compile-time, link-time, or run-time options needed to enable them.

Operating System Threads

OS threads offer true preemptive multitasking and are scheduled by the kernel. They are also more expensive, typically scaling to thousands of threads on contemporary systems. However, they can execute on multiple CPUs, and they provide a way to deal with foreign calls that either block or take a long time to execute.

To use OS threads, all you do need to link your executable with GHC’s threaded runtime system using the -threaded option. There is nothing particularly special you need to do inside Haskell code to utilize operating system threads; GHC’s threaded runtime will endeavor to maintain the illusion that Haskell threads are OS threads as best it can.


In the context of GHC, a capability is an operating system thread that is able to execute Haskell code. GHC schedules Haskell threads among the capabilities, making them the N of the M-N model. Previous versions of GHC only supported one capability, but GHC now supports as many capabilities as you want. While the total number of capabilities remains fixed, the collection of OS threads that are capabilities changes over time.

You can set the number of capabilities “x” by using the -N[x] run-time option. This can be done either by passing the executable +RTS -N[x] on the command line, or by setting the default RTS options by linking the executable with -with-rtsopts="-N[x]". Note that you will need the -threaded and may need the -rtsopts link-time options.

The Foreign Function Interface

The FFI supports two kinds of calls: safe and unsafe. A “safe” call blocks the calling OS thread and Haskell thread until the call finishes. Blocking these is unavoidable. However, a safe call does not block the capability. When GHC performs a safe call, it performs the call inside the current OS thread, which is a capability. The capability then moves to another OS thread. If no other threads are available, GHC will transparently create a new OS thread. OS threads are pooled to try to avoid creating or destroying them too often.

An unsafe call blocks the capability in addition to blocking the OS and Haskell threads. This was a pretty big deal when GHC only supported a single capability. In effect, an unsafe call would block not only the Haskell thread that made the call, but every Haskell thread. This gives rise to the myth that unsafe calls block the entire Haskell runtime, which is no longer true. An unsafe foreign call that blocks is still undesirable, but depending on configuration, may not be as bad as it used to be.

The advantage of unsafe calls is that they entail significantly less overhead. Semantically, a foreign function that might call back into Haskell must be declared as a “safe” call, whereas foreign function that will never call back into Haskell may be declared as “unsafe”. As a result, safe calls must perform extra bookkeeping.

(As an aside, this vocabulary is not particularly consistent with Haskell’s other use of “safe” and “unsafe”. In this other sense, foreign calls are unsafe because they can be used to break the abstractions that GHC provides.)

In practice, most foreign functions will never call back into Haskell. Thus choosing whether to mark a foreign function as safe or unsafe usually revolves around the trade-off between concurrency and overhead. If needed, you can declare both a safe and an unsafe binding to the same foreign function.

Bound threads

Finally, the most significant caveat to GHC’s 1-1 threading illusion is that some foreign code (notably OpenGL) expects to be run in a single OS thread, due to the use of thread-local storage or the like.

GHC’s answer is “bound threads”. GHC provides forkOS, which creates a Haskell thread with a special property: every foreign call that the Haskell thread makes is guaranteed to happen inside the same OS thread. It is perhaps a bad choice of name, as there is no need to call forkOS to utilize operating system threads. The only time you need to call forkOS is when you need an honest-to-goodness 1-1 correspondence between a Haskell thread and a kernel thread.

Finishing up

These kinds of illusions are a common theme in GHC: what appears to be synchronous blocking IO is in fact asynchronous non-blocking IO, what appears to be preemptive multitasking is in fact cooperative multitasking, what appears to be 1-1 threading is actually M-N threading. GHC endeavors to present the user with a nice interface, while using more efficient techniques behind the scenes. And, for the most part, these illusions hold up pretty well.

I hope the community finds this overview helpful. Most of what I’ve talked about is covered in “Extending the Haskell Foreign Function Interface with Concurrency” by Simon Marlow, Simon Peyton Jones, and Wolfgang Thaller. I hope this post is a useful introduction to the paper.

2010 July 21

Iteratees and a (very) Minimal Model of Shared-State Concurrency

Filed under: continuations, haskell, monads — lpsmith @ 6:26 pm

In May, I gave a brief presentation to the Chicago Haskell User’s Group on modelling shared-state concurrency in a purely functional way without using the IO or STM monads. You can download a copy of the slides and Haskell source files. The slides may be of particular interest to those who are baffled by the continuation monad but are comfortable with the state monad.

The problem motivating the presentation is the Thread Game, attributed to J Strother Moore: given two threads each running the doubler pseudo-code below, with local variables x and y and a single shared global variable a initialized to 1, what values can appear in the global variable? Can you prove your answer?

doubler = loop
            x := a
            y := a
            a := x + y

It should be noted that no synchronization is used, that each line represents an atomic instruction, and that arbitrary schedules are allowed. (e.g. they need not be fair schedules)

The talk covered two implementations of a function from finite schedules to the final value of a, providing a model of the Thread Game. The first implementation used a relatively straightforward technique reminiscent of automata and Turing Machines from theoretical computer science. The local configuration for a thread consists of the local variables plus an instruction pointer. The delta function step_doubler takes a local configuration and the current value of the global variable, and produces a new local configuration and value for the global variable.

type IP     = Int       -- Instruction Pointer
type Local  = Integer   -- Local Variable
type Global = Integer   -- Global Variable

type LocalConf  = (IP, Local, Local)

step_doubler :: (LocalConf,Global) → (LocalConf,Global)
step_doubler ( (ip,x,y) , a )
   = case ip of
      0 → ( (1,a,y) , a   )
      1 → ( (2,x,a) , a   )
      2 → ( (0,x,y) , x+y )

The global configuration consists of the value of the global variable, and one local configuration for each doubler process. The global step function takes the global configuration and a process identifier, and produces another global configuration.

type PID        = Int    -- Process Identifier
type GlobalConf = (LocalConf, LocalConf, Global)

step_global :: GlobalConf → PID → GlobalConf
step_global (d0, d1, a) pid
  = case pid of
     0 → let (d', a') = step_doubler (d0, a) in (d', d1, a')
     1 → let (d', a') = step_doubler (d1, a) in (d0, d', a')

The function from schedules to the final value in a is just a left fold, resulting in the first model of the thread game:

threadGame :: [PID] → Integer
threadGame pids = a 
    ( _, _, a ) = foldl step_global ( (0,0,0), (0,0,0), 1 ) pids

While this technique is generic and relatively straightforward, the code itself is not generic, and is fairly far removed from the original specification. It would seem natural to model doubler using the state monad as shown below. The state monad can handls the local variables and the global variable, but cannot represent the instruction pointer or offer the co-routine facility needed to model the given problem.

doubler = do
    x ← get
    y ← get
    put (x+y)

The continuation monad can represent the instruction pointer, but because Haskell is a pure language, further tricks are needed to mediate between the thread scheduler and each doubler process.

Iteratee-based Shared-State Concurrency

The second implementation used an idea derived from Iterators as shown to me by Roshan James. These are a more generic variant of to Oleg’s Iteratees. Iteratees are something of a hot topic these days, being a core feature of the Snap Framework and the subject of assorted blog posts.

data Iterator i o r
   = Done r
   | Result o (i → Iterator i o r)

The idea here is that an iterator is either done, and returns a final result (of type r), or it might want to produce more, so it returns an intermediate result (of type o) along with a continuation to be called if and when further computation is required. Interestingly, Stephen Blackheath also used an equivalent structure in today’s post, Co-routines in Haskell.

Basically, each thread is an Iteratee that accepts the value in the global variable as an input, and produces a new value for the global value and it’s own continuation as output. This continuation represents the local configuration, which consists of the instruction pointer and the two local variables. The thread scheduler manages these continuations and feeds the global variable into the corresponding continuation.

I had (almost) invented iteratees myself a number of years ago, although I didn’t realize the connection until after I had begun to write this blog post. Back in the year 2000, I helped write a paper about implementing finite automata and Turing machines in functional languages by directly using the standard textbook definition. In the process I stumbled on an alternate representation of a DFA that I did not publish, mentioned briefly here.

data DFA st ab = DFA (st → ab → st) st (st → Bool)

data ContDFA ab = ContDFA Bool (ab → ContDFA ab)

The first representation is the standard textbook definition, a automaton consists of a set of states st, an alphabet ab, and a transition function, a start state, and (the characteristic function of) a set of final states. Interestingly, this representation was revisited in the most recent issue of the Monad Reader.

The second representation is a specialized case of Iterator above, without the Done alternative. Basically, the ContDFA represents a single state in the machine, which has a boolean field saying whether or not the state is final, and a method that accepts a letter and returns another state.

When I was first exposed to Iterators, I was confused by them and the way my friend was using them. Funnily enough, I completely missed the connection to my ContDFA invention, even though I was immediately reminded of Mealy Machines and thought of that project! In a way, it was a fortuitous slip of the mind. While thinking about implementing Mealy Machines with continuations in an effort to understand Roshan’s work, I came up with “functional iterators”. (If anybody can suggest a better name, please speak up!)

newtype FunIter i o r
      = FunIter (FunIter o i r → i → r)

Compared to Iterator, FunIter assumes that the consumer of the intermediate values is itself written in the Continuation Passing Style. I don’t understand the exact relationship between Iter and Iterator; they are remarkably similar, but there are trade-offs between what these two structures can represent, trade-offs I don’t understand very well.

Both Iterator and FunIter are capable of expressing the coroutine facility needed to model the thread game, however, my Iterator solution has a few vestigial features, and I like my FunIter solution a bit better.

newtype FunIter i o r
      = FunIter {runIter :: FunIter o i r → i → r}

type Thread r st a = Cont (FunIter st st r) a

runThread :: Thread r st r → FunIter st st r
runThread m = runCont m (\r → FunIter (\_ _ → r))

I used a continuation monad to capture the local continuations, while FunIter mediates the transition between the local continuations and the scheduler continuation. We can then define get and put, which yield to the scheduler continuation:

get    = Cont (\k → FunIter (\(FunIter sk) st →  sk (k st) st))
put st = Cont (\k → FunIter (\(FunIter sk) _  →  sk (k ()) st))

Here is a more generic thread scheduler:

update :: Int → a → [a] → [a]
update n x [] = []
update n x (y:ys) 
  | n <= 0    =  x : ys 
  | otherwise =  y : update (n-1) x ys

runSchedule :: [FunIter st st st] → [PID] → st → st
runSchedule threads []          st = st
runSchedule threads (pid:pids)  st = runIter (threads !! pid) sched st
     sched = FunIter $ \iter st' → 
               runSchedule (update pid iter threads) pids st'

And finally, we can bring everything together into an alternate model of the thread game:

threadGame' :: [PID] → Integer
threadGame' sched = runSchedule [runThread doubler, runThread doubler] sched 1

I don’t know who first invented Iteratees, but the idea has been lurking in a variety of different contexts for some time. The functional iterators presented here might be original. The second solution is a bit longer and more esoteric, but it’s more generic and should scale better in terms of source code and maintenance.

2010 April 4

On powersets and folds

Filed under: algorithms, corecursion, haskell — lpsmith @ 10:03 pm

A while ago, somebody on reddit brought up a particularly cute and fairly well known definition of powerset:

powerset :: [a] -> [[a]]
powerset = filterM (const [False,True])

Unfortunately, it is also fairly well known that this is a very space-inefficient way of producing the powerset, for example evaluating length (powerset [1..32]) will allocate gigabytes of memory within seconds on modern computers, and is an effective denial-of-service attack against most default configurations.

Fortunately, there are solutions that avoid this space leak while maintaining the same style of definition. One solution is to use a revised definition of Control.Monad.filterM. Here is the standard definition:

filterM          :: (Monad m) => (a -> m Bool) -> [a] -> m [a]
filterM _ []     =  return []
filterM p (x:xs) =  do
   flg <- p x
   ys  <- filterM p xs
   return (if flg then x:ys else ys)

Here is the revised definition:

filterM'          :: (Monad m) => (a -> m Bool) -> [a] -> m [a]
filterM' _ []     =  return []
filterM' p (x:xs) =  do
   ys  <- filterM' p xs
   flg <- p x
   return (if flg then x:ys else ys)

The only difference is that we changed the order of two lines. This in turn changes the order in which the subsets are produced:

powerset  "abc" == ["","c","b","bc","a","ac","ab","abc"]
powerset' "abc" == ["","a","b","ab","c","ac","bc","abc"]

These orderings are analogous to counting in binary, with the original definition having the least significant bit at the end of the list, whereas the revised versions have the least signficant bit at the beginning:

pow      pow'

000 *    000 * 
001      100 * 
010      010   
011      110   
100 *    001
101      101
110      011
111      111

Both solutions have maximal sharing among the tails of the sublists, but the second solution arranges the sharing so that shared sublists appear adjacent to each other in the overall order, as illustrated by the asterisks. This allows length (powerset' [1..32]) to be evaluated in a very modest amount of space.

Note that this space leak is inherent to any solution that produces the first ordering, and has maximal sharing! If the first ordering is needed, it is better to recompute the sublists than to share them. As Cale Gibbard pointed out, one way to eliminate the sharing is to use another monad, such as the Logic monad.

import Control.Applicative ((<|>))
import Control.Monad.Logic (observeAll)
import Control.Monad (filterM)

powerset2 :: [a] -> [[a]]
powerset2 = observeAll . filterM (const (return False <|> return True))

Making powerset lazier

The definition of powerset can be generalized to produce all the finite sublists of a potentially infinite list. However, it is not obvious how to express this via tweaking filterM without breaking the abstraction, so let’s start with another efficient formulation based on concatMap.

powerset' [] = [[]]
powerset' (x:xs) = concatMap (\ys -> [ys, x:ys]) (powerset' xs)

On an infinite list, this definition gets stuck in an non-productive loop because powerset' must traverse the entire list before it returns anything. Note that on finite cases, the first sublist returned is always []. Formally, we can state this as powerset' xs == [] : tail (powerset' xs). It is sensible to extend this equality to the infinite case, due to the analogy to counting in binary. By making this substitution, we produce a lazier definition that can handle infinite lists, from which we can calculate a definition that’s more aesthetically pleasing and slightly more efficient:

powerset' []     = [[]]
powerset' (x:xs) = concatMap (\ys -> [ys, x:ys]) ([] : tail (powerset' xs))
                   -- apply definition of concatMap
                 = [] : [x] : concatMap (\ys -> [ys, x:ys]) (tail (powerset' xs))

We can clean this definition up by calculating definitions for tail . powerset'. We start by applying tail to both sides of the two cases:

tail (powerset' [])     = tail [[]]
                        = []
tail (powerset' (x:xs)) = tail ([] : [x] : concatMap (\ys -> [ys, x:ys]) (tail (powerset' xs)))
                        = [x] : concatMap (\ys -> [ys, x:ys]) (tail (powerset' xs))

Next, we can choose a new name for tail . powerset', such as nonempties, and substitute it through the definition. Finally, we can recover something equivalent to the original definition by appending the empty list to nonempties

lazyPowerset xs = [] : nonempties xs
  where nonempties [] = []
        nonempties (x:xs) = [x] : concatMap (\ys -> [ys, x:ys]) (nonempties xs)

This is equivalent to Data.List.subsequences as added in GHC 6.10.1. I thank Bertram Felgenhauer for (unknowingly) providing some comments that were rather helpful in creating this presentation.

Generalizing powerset to an enriched fold

By parameterizing the instances of (:) and [] that create the inner sublists, we can turn lazyPowerset into a kind of fold:

powerfoldr :: (a -> b -> b) -> b -> [a] -> [b]
powerfoldr f b as = b : nonempties as
  where nonempties [] = []
        nonempties (a:as) = f a b : concatMap (\b' -> [b', f a b']) (nonempties as)

For example, we can calculate the sum of every finite combination of the powers of 2:

powerfoldr (+) 0 (iterate (2*) 1)  == [0,1,2,3,4..]

An equivalent “higher-level” definition of powerfoldr in terms of map, foldr, and lazyPowerset is as follows:

powerfoldr' f b = map (foldr f b) . lazyPowerset

This definition, however, recomputes the right fold over common sublists; the sharing provided by lazyPowerset is not preserved by this latter definition. The first definition has the same level of sharing as the original, reducing the number of applications of f by half. I found the first definition of powerfoldr to be quite helpful in solving Project Euler Problem 268; it certainly was not optimal for the purpose, but it was more than adequate.

2010 March 14

Writer Monads via Continuations

Filed under: continuations, haskell, monads — lpsmith @ 9:00 pm

I have four draft posts started that I don’t have finished. But I won’t be finishing any of them today, instead I’ll issue an erratum for my paper in Issue 14 of the Monad Reader. On page 53 I wrote:

Writer monads partially enforce the efficient use of list concatenation. In the MTL, Writer [e] a is just a newtype isomorphism for ([e], a). It provides a function tell that takes a list and concatenates the remainder of the computation onto the end of the list. This naturally associates to the right, and thus avoids quadratic behavior. Of course, tell accepts arbitrary length lists, and one could inefficiently produce a long argument to tell.

The writer monad does no such thing. In fact, it’s quite easy to write a function using the writer monad that exhibits quadratic behavior. For example here is a function that uses the writer monad to produce the list [1..n] in quadratic time:

enumTo n
  | n <= 0    = return ()
  | otherwise = enumTo (n-1) >> tell [n]

By unfolding the definitions of return and (>>=) for Control.Monad.Writer, and the Monoid instance for List, we can find this is equivalent to:

enumToWriter n
  | n <= 0    = ((), [])
  | otherwise = let (a,w ) = enumToWriter (n-1) 
                    (b,w') = (\_ -> ((),[n])) ()
                 in (b,w ++ w')

By beta-reducing the let bindings and making use of snd, we get

enumToWriter'2 n
  | n <= 0    = ((), [])
  | otherwise = ((), snd (enumToWriter'2 (n-1)) ++ [n])

And finally, we can hoist the pairs out of the loop, if we want:

enumToWriter'3 n = ((), loop n)
    loop n
      | n <= 0    = []
      | otherwise = loop (n-1) ++ [n]

It should now be clear that this is quadratic, as the first element of the result gets copied (n-1) times, the second element gets copied (n-2) times, and so on. However, there is a solution. By adjusting the monad, the original definition can be executed in linear time. My erroneous quote is actually an accurate description the continuation-based writer monad.

instance (Monoid w) => MonadWriter w (Cont w) where
   tell xs = Cont (\k -> xs `mappend` k ())
   -- or,  in a direct style:
   -- tell xs = mapCont (xs `mappend`) (return ())

The expression runCont (enumTo n) (const []) produces the list [1..n] in linear time. The continuation-based writer monad provides the linear-time guarantee that I described but erroneously attributed to the regular writer monad. By unfolding the definition of return and (>>=) for Control.Monad.Cont and our definition for tell above, we can find out what this algorithm actually does:

-- unfold return and (>>=)
enumToCont n 
  | n <= 0    = k ()
  | otherwise = \k -> enumToCont (n-1) (\a  -> (\_ -> tell [n]) a k) 
-- beta-reduction
--            = \k -> enumToCont (n-1) (\_a -> tell [n] k)
-- unfold tell
--            = \k -> enumToCont (n-1) (\_  -> (\k -> [n] ++ k ()) k)
-- beta-reduction
--            = \k -> enumToCont (n-1) (\_  -> [n] ++ k ())
-- unfold (++)
--            = \k -> enumToCont (n-1) (\_  -> n : k ())

And, if we like, we can avoid passing () to the continuation argument, resulting in a normal accumulator argument:

enumToCont' n k
  | n <= 0    = k
  | otherwise = enumToCont' (n-1) (n:k)

By writing it with (:), we know it has to associate to the right and thus operates in linear time. In the quadratic-time solution, the (++) cannot be unfolded. However, this solution isn’t incremental: trying to evaluate take 5 (enumToCont' (10^12) []) will more than likely run your computer completely out of virtual memory long before it finishes. It’s not to difficult to see why: enumFromCont' produces a list element equal to it’s input, and it has to count down from a trillion before it produces [1,2,3,4,5, ...]. So it really is better to write it as:

enumTo' lim = loop 1
     loop n 
       | n <= lim  = tell [n] >> loop (n+1)
       | otherwise = return ()

This definition is both linear and incremental under both the continuation-based writer and the normal writer monad, but it isn’t a generic transformation that applies to any algorithm using the writer monad.

These two writer monads have a number tradeoffs; for example, the continuation-based writer monad cannot be used with the corecursive queue transformer in the paper. An efficient alternative is the regular writer monad with a list representation that offers efficient concatenation, such as difference lists.

For further reading, you might enjoy Asymptotic Improvement of Computations over Free Monads by Janis Voigtländer, which explores this phenomenon further.

2009 December 30

Fun with the Lazy State Monad

Filed under: continuations, corecursion, haskell, monads — lpsmith @ 9:20 pm

The lazy state monad doesn’t work the way you think it works. Thinking of the lazy state monad in terms of imperative programming is a very useful first approximation, at least in terms of what is computed, but in terms how things are computed this intuition is beyond useless. I hope to dissuade you of the latter part of the intuition in this post, by demonstrating two of the more interesting things that can be done with the lazy state monad.

Albert Y.C. Lai recently shared a program that demonstrates the lazy state monad in a rather interesting way:

pro :: State [Bool] ()
pro = do
  s <- get
  put (True : s)

Instead of tail recursion, this employs head recursion. The question is, what does pro return when you run it, and why? I found it easy to guess the correct answer, but my reasoning was completely wrong. Of course, if viewed through a wholly imperative mindset, this leads to non-termination, but the lazy state monad extracts usable information from this definition.

In my recent Monad Reader article, I disentangled the bits of circular programming from Lloyd Allison’s queue, and for the last year I have been obsessed with pulling the same trick with Jones and Gibbons’ breadth-first labeling algorithm. At times, this obsession has been a bit of a disease. It’s also worth pointing out Chris Okasaki’s discussion of a particular instance of this algorithm, and the mention of this algorithm on the FP Lunch Weblog. Here is code for the special case of breadth-first numbering:

data Tree a b = Leaf a | Branch b (Tree a b) (Tree a b)

lazyRenum :: Num n => Tree a b -> Tree n n
lazyRenum t = t'
     (ns, t') = renumber (0:ns, t)

     renumber (n:ns,  Leaf    _    ) = (n+1:ns  , Leaf    n      )
     renumber (n:ns,  Branch  _ l r) = (n+1:ns'', Branch  n l' r')
         (ns' , l')  = renumber (ns , l)
         (ns'', r')  = renumber (ns', r)

I finally disentangled the corecursive bits from this example today. The circular programming occurs in the list argument, not the tree. Note that the flow of the list ns from one call of renumber to the next is like the state monad. From this observation, I wrote the following whimsical code:

lazyRenum :: Num n => Tree a b -> Tree n n
lazyRenum = runFresh . renumber 
     renumber (Leaf    _    ) 
        = fresh (\n -> do 
                          return (Leaf n))
     renumber (Branch  _ l r) 
        = fresh (\n -> do
                          l' <- renumber l
                          r' <- renumber r
                          return (Branch n l' r'))

Once I had this right, it was pretty easy to fill in the definitions for fresh and runFresh, by cribbing off Chris Okasaki’s simplification of Jones and Gibbons’ algorithm:

type Fresh n a = State [n] a 

runFresh :: Num n => Fresh n a -> a
runFresh m = a
          (a, ns) = runState m (0:ns)

fresh :: Num n => (n -> Fresh n a) -> Fresh n a
fresh f = do
   (n:ns) <- get
   put ns
   a <- f n
   ns' <- get
   put ((n+1) : ns')
   return a

And finally, we have arrived at a way to disentangle Jones and Gibbons’ algorithm. This easily generalizes from breadth-first numbering to breadth-first labeling, and like the original, it is capable of renumbering the Stern-Brocot tree. The key insights here are the use of the lazy state monad, and getting the type of fresh correct. Everything else is relatively straightforward, once this groundwork is laid.

It’s interesting to perform a post-mortem analysis of why coming up with this proved to be so difficult for me. I’ll admit that I spent a few weeks trying to decipher the operational characteristics of Jones and Gibbons’ labeling algorithm, and while I think I have a reasonable grasp on it, I’m still not completely comfortable with it. However, this monadic abstraction seems perfectly obvious in hindsight.

This contrasts starkly to my own understanding of Lloyd Allison’s queue: I was completely comfortable with the operational aspects of the algorithm, but the monadic abstraction was quite difficult to come to grips with. So my difficulties with Jones and Gibbons’ algorithm was an over-emphasis on the operational aspects of the algorithm, and too much focus on the continuation monad as part of the solution. Basically, I was hopeful that the same methodologies in my Monad Reader article would be directly useful, so much so that I had a case of tunnel vision.

While it is not obvious how the continuation monad might be applicable to this problem, continuations still did play a role in the solution: look at the type of fresh, it’s the same (a -> r) -> r pattern that the continuation monad uses. It seems to me that there is something deeper here, although I don’t know what that would be.

These examples might make a stronger case for the lazy state monad than the example in my previous post. While foldrByLevel is relatively easy to adapt to the continuation passing state monad, I don’t know how to do the same with either of these two examples.

2009 December 20

Are continuations really the mother of all monads?

Filed under: continuations, haskell, monads — lpsmith @ 8:34 am

So Issue 14 of the Monad Reader is out, which includes my paper “Lloyd Allison’s Corecursive Queues” with a few significant revisions. Compared to earlier versions mentioned on this blog, I moved a few sections around to improve the flow: the first 8 pages now contain an uninterrupted informal derivation of the queue operators. I also demonstrated a new fixpoint operator on continuations, an implementation of the standard mfix fixpoint on the codensity monad, and argued that mapCont cannot be implemented in terms of callCC.

However, this post focuses on the section that I moved, entitled “Monad Transformers” from pages 51 to 56. It’s basically a short rant about why I don’t like monad transformers, an argument that later culminates in a mostly broken corecursive queue transformer. In retrospect, it’s somewhat unfortunate that I did not reread the classic paper Monad Transformers and Modular Interpreters sometime before Issue 14 came out. I did read that paper approximately nine or ten years ago. Although the paper was helpful in understanding how monads could be shaped to my own ends, now that I actually understand the contents of the paper, it feels rather crufty.

Section 8.1 defines correctness criteria for a monad transformer and associated lifted operations, which I quote as follows:

The basic requirement of lifting is that any program which does not use the added features should behave in the same way after a monad transformer is applied.

The thrust of my argument is that this requirement is indeed very basic; one would hope that certain properties useful for reasoning inside a given monadic language would also be preserved. This additional requirement seems rather hopeless. However, the pièce de résistance of the argument is that the continuation transformer is incorrect by their own criteria, at least in the context of a lazy language such as Haskell or Gofer.

I demonstrate an expression that depends on the laziness of the lazy state monad, and fails to terminate after a continuation transformer is applied. (As an aside, it doesn’t matter if this is a ContT r (StateT st Identity) monad or StateT st (ContT r Identity), they are the same monad with the same operations.) In retrospect, this seems obvious: something written in the continuation passing style specifies an evaluation order independent of the host language, and applying the continuation transformer corresponds to a call-by-value CPS transformation of part of the program.

The example involves a definition that computes a right fold over a level-order traversal of a tree:

foldrByLevel :: (MonadState (Queue a) m)
             => (a -> [a])
             -> (a -> b -> b) -> b -> [a] -> m b 
foldrByLevel childrenOf f b as = fold as 
    fold []     = deQ >>= maybe (return b)
                          (\a -> fold (childrenOf a))
    fold (a:as) = do 
                     enQ a 
                     b <- fold as
                     return (f a b)

If we use this definition to traverse an infinite tree, it will be productive when run with Control.Monad.State.Lazy, but will get stuck in an infinite non-productive loop when combined with Control.Monad.Cont. You can download a complete file that demonstrates this phenomenon. There are “simpler” expressions that demonstrate this, but the example I gave is itself interesting because it is useful in other contexts.

As demonstrated in my paper, the laziness can be restored by tweaking the definition of foldrByLevel to use mapCont. However, this is a leaky abstraction: in order to maintain the same behavior, we must modify existing code to make use of the “added” features in ways that are not backwards compatible. I do not know how to write a lazy state monad using continuations, or a sensible way of writing a single definition of foldrByLevel that behaves the same on both the lazy state monad and the continuation state monad.

(I use the word “sensible” because one could provide a unfaithful definition of mapCont for the lazy state monad that happens to work in the case of foldrByLevel, but fails in other contexts.)

What impact does this have on the notion that continuations are the mother of all monads? Is there a monad that corresponds to a call-by-name CPS transformation? Is it even possible to express the lazy state monad using continuations?

I do not yet have answers to these questions. One thing is clear, however: its tricky to generalize Filinski’s Representing Monads to lazy evaluation, if indeed it is possible to do so fully.

2009 June 22


Filed under: continuations, corecursion, haskell, monads, queues — lpsmith @ 8:41 pm

Haskell aficionados, take note! My library for corecursive queues has now been uploaded to Hackage. You can now cabal-install it.

I also have a substantially revised draft of the associated paper, Lloyd Allison’s Corecursive Queues, available. It has been re-organized so that it is hopefully easier to follow, it includes a careful performance comparison, and a tentative proof that mapCont cannot be expressed in terms of callCC, (>>=), return.

The library includes a somewhat crude micro-benchmarking program in the tests/ directory. Those who have read previous drafts, be warned that the few brief statements about performance were based on past notes, and I found some several issues with the testing methodology contained in the notes. Here the revised results:

Description Time (ms) -H500M Bytes allocated
GHC 6.10.3 mean σ mean σ per Branch
levelOrder’ 446 5 172 15 44.0
CorecQ 555 5 619 4 133.5
CorecQW _ 696 5 1128 6 213.6
CorecQW () 907 56 2235 11 213.6
Side Channel _ 959 3 1171 7 228.7
Side Channel () 1500 56 2171 7 276.4
STQ 1140 8 1087 14 371.2
TwoStack 1158 4 778 10 185.8
Okasaki 1553 7 1574 12 209.0
Data.Sequence 962 5 1308 5 348.1
GHC 6.8.3
levelOrder’ 461 2 173 15 44.1
CorecQ 458 4 267 13 67.5
CorecQW _ 526 5 713 5 141.2
CorecQW () 781 62 1775 62 141.3

These benchmarks come from performing breadth-first traversals repeatedly on the 34th fibonacci tree, on an Intel Core 2 Duo T9550. The first few data points were discarded, and the mean and standard deviation of the remaining times were computed. Note that getCPUTime was used to time each run, and this has a resolution of only 10 milliseconds.

If you would like to play with the queue transformer, which doesn’t appear in the library, or other bits of code exactly as they appear in the paper, you can download the source code here.

2009 May 3

Rocky’s Boots

Filed under: haskell — lpsmith @ 10:39 am

I ran across Logicly today, which is a Flash-based logic gate simulator. While it’s all very slick, I found it disappointing. It’s a mere shadow of Rocky’s Boots for the Commodore 64. The key differences are that Rocky’s Boots had D-flip flops, SR-latches, and simulated propogation delays, giving rise to glitches and hazards. While Logicly seems to handle self-reference somewhat well, enabling you to build your own flip flops, it doesn’t support any kind of abstraction so the overall support for sequential circuits is poor. (edit: Josh has added propagation delays, check it out!)

I could wax nostalgic about Rocky’s Boots; it was among my favorites when I was growing up, but I’ll spare you, dear readers. Truth be told, much like vintage BASIC, RB has a truly wretched user interface by modern standards, something that became painfully clear when I strolled through Rocky’s Boots a few years ago on a whim. I started into its sequel, Robot Odyssey, which I never got to play as a kid, but I didn’t get far due to the user interface.

Maybe Josh has plans for improving Logicly, I don’t know. Truth be told; it really is a travesty that Rocky’s Boots has not been remade and re-imagined many times over the last 27 years. At least it appears that Robot Odyssey has been remade a few times: here is a Java version that stays much too faithful to the original, down to replicating the same wretched interface. On the other hand, Gate seems reasonably promising… but it doesn’t appear to simulate propogation delay (then again, did Robot Odyssey?), nor does it have abstraction mechanisms like Robot Odyssey did.

So, if one was to remake Rocky’s Boots and/or Robot Odyessy, how would you do it? Might you use Functional Reactive Programming? How efficient would this be, and how large of a circuit could this handle? Could you make good use of modern SIMD and/or multiple cores? Although I daresay reaching children would be a particularly good goal for such a project, thus delivering it as Flash has definite appeal.

2009 April 7

Polynomial multiplication

Filed under: algorithms, corecursion, haskell — lpsmith @ 6:27 am

Alright, it’s been almost a month since I’ve last posted, and I wanted to post weekly. Perhaps bi-weekly would be a better goal, as long as I really can stick to it.

So, a few months ago, while working on a combinatorial problem for Project Euler, I was re-implementing polynomial arithmetic, and made an interesting observation. I was using a sparse list representation, with each element of the list being (M c k), representing the monomial cxk.

> data Mono = M !Integer !Integer  deriving (Eq, Ord, Show)

A polynomial is represented by a list of monomials, which is assumed to be sorted in ascending order of the power k above, and every power is assumed to be unique in our representation. Thus we can easily define addition and subtraction using standard techniques for dealing with ordered lists:

> mergeBy :: (Integer -> Integer -> Integer) -> [Mono] -> [Mono] -> [Mono]
> mergeBy f xs ys = loop xs ys
>   where 
>     loop [] ys = ys
>     loop xs [] = xs
>     loop (M c0 k0:xs) (M c1 k1:ys) 
>            = case compare k0 k1 of
>                LT -> M c0 k0 : loop xs (M c1 k1:ys)
>                EQ -> let c' = f c0 c1
>                       in if c' == 0 
>                           then loop xs ys
>                           else M c' k0 : loop xs ys
>                GT -> M c1 k1 : loop (M c0 k0:xs) ys

> add = mergeBy (+)
> sub = mergeBy (-)

It’s also very straightforward to multiply two monomials, and to perform a “scalar” multiplication of a single monomial with a polynomial. Note that the degree of each term is changed by a constant amount, so we don’t have to worry about re-ordering the list, and that the first case of smul is a completely optional minor optimization.

> mmul (M c0 k0) (M c1 k1) = M (c0 * c1) (k0 + k1)

> smul (M 1 0) ys = ys
> smul x ys = map (mmul x) ys

Taking into account the possibility of polynomials with an infinite number of terms, it’s now fairly easy to implement a polynomial multiplication program in terms of add and mmul:

> pmul _  []    = []
> pmul [] (_:_) = []
> pmul (x:xs) (y:ys) = mmul x y : add (smul x ys) (pmul xs (y:ys))

This is based around the algebraic identity (x + xs) * (y + ys) == x * y + x * ys + xs * (y + ys). It works on infinite polynomials because x and y are the terms of least degree to be multiplied, and every other product will have a higher degree. Unfortunately, this code turns out to be hopelessly inefficient.

The second thing I tried was simply changing the order in which arguments are passed to the recursive call to pmul:

> pmul2 _  []    = []
> pmul2 [] (_:_) = []
> pmul2 (x:xs) (y:ys) = mmul x y : add (smul x ys) (pmul2 (y:ys) xs)

Empirically, this is asymptotically faster than the first attempt. I do not know why. Any insight would be fun to read about!

2009 March 9

Lloyd Allison’s Corecursive Queues

Filed under: continuations, corecursion, haskell, monads, queues — lpsmith @ 9:34 pm

I’m proud to announce that a draft of the release of my paper, “Lloyd Allison’s Corecursive Queues: Why Continuations Matter“, is now available. (Source code available here, with a hackage package soon to come available here) Wouter Swierstra has informed me that he will publish it in the Monad Reader. However, it will appear in the next issue after the upcoming one, due to an unexpectedly large number of submissions this time around. Here is the abstract:

In a purely functional setting, real-time queues are traditionally thought to be much harder to implement than either real-time stacks or amortized O(1) queues. In “Circular Programs and Self-Referential Structures,” Lloyd Allison uses corecursion to implement a queue by defining a lazy list in terms of itself. This provides a simple, efficient, and attractive implementation of real-time queues.

While Allison’s queues are general, in the sense it is straightforward to adapt his technique to a new algorithm, a significant problem has been the lack of a reusable library implementation. This paper solves this problem through the use of a monadic interface and continuations.

Because Allison’s queues are not fully persistent, they cannot be first class values. Rather, they are encoded in particular algorithms written in an extended continuation passing style. In direct style, this extension corresponds to mapCont, a control operator found in Control.Monad.Cont, part of the Monad Template Library for Haskell. This paper conjectures that mapCont cannot be expressed in terms of callCC, return, and (>>=).

I intend to include a careful performance comparison before this becomes an official Monad Reader article. Allison’s queues come out very well; often better than two stack queues. I have conducted a careful performance comparison in the past, although with older versions of GHC, and older versions of my code. While I did take reasonably careful notes, things have changed. Haskell being what it is, figuring out why is often a challenge. In the meantime I am interested in feedback.

For fun, here is something I wrote right after I first came up with the basic idea behind the paper. It’s still the best error message I’ve gotten out of GHC. Kudos to whomever came up with that strategically placed bit of humor!

Thursday, August 25th, 2005, 5:22 am: Back to the Future

I’ve been up all night, but I now have a working fragment of computer code that is entirely too cute. It’s easily the cleverest bit I’ve written in years. I managed to implement… a queue.

Yes, a queue. One queue, not two. One purely functional queue, with one esoteric implementation! On my penultimate attempt, which was an utter failure except that it got me thinking in the right direction, I came by the most amusing error message I’ve seen to date out of GHC:

leon@deleon:~/Programs/snippets $  ghci -fglasgow-exts Queue.hs 
   ___         ___ _
  / _ \ /\  /\/ __(_)
 / /_\// /_/ / /  | |      GHC Interactive, version 6.2.2, for Haskell 98.
/ /_\\/ __  / /___| |
\____/\/ /_/\____/|_|      Type :? for help.

Loading package base ... linking ... done.
Compiling Queue            ( Queue.hs, interpreted )

    My brain just exploded.
    I can't handle pattern bindings for existentially-quantified constructors.


Failed, modules loaded: none.

Yeah, mine did too. Even I don’t fully understand my own code yet.

It should be noted that I knew full well that the code I was trying wouldn’t work… but after hours of bewilderment, not even trying to load anything into GHCi, for amusement’s sake I simply had to try something.

Update: (March 23)
— Data.Sequence is not a real time queue: rather, they are amortized.
— Added citation to Chris Okasaki’s Purely Functional Data Structures
— Other minor changes

Blog at