## 2010 July 21

### Iteratees and a (very) Minimal Model of Shared-State Concurrency

In May, I gave a brief presentation to the Chicago Haskell User’s Group on modelling shared-state concurrency in a purely functional way without using the IO or STM monads. You can download a copy of the slides and Haskell source files. The slides may be of particular interest to those who are baffled by the continuation monad but are comfortable with the state monad.

The problem motivating the presentation is the Thread Game, attributed to J Strother Moore: given two threads each running the `doubler` pseudo-code below, with local variables `x` and `y` and a single shared global variable `a` initialized to 1, what values can appear in the global variable? Can you prove your answer?

```
doubler = loop
    x := a
    y := a
    a := x + y
end
```

Note that no synchronization is used, that each line represents an atomic instruction, and that arbitrary schedules are allowed (e.g. they need not be fair).

The talk covered two implementations of a function from finite schedules to the final value of `a`, providing a model of the Thread Game. The first implementation used a relatively straightforward technique reminiscent of automata and Turing Machines from theoretical computer science. The local configuration for a thread consists of the local variables plus an instruction pointer. The delta function `step_doubler` takes a local configuration and the current value of the global variable, and produces a new local configuration and value for the global variable.

```
type IP     = Int       -- Instruction Pointer
type Local  = Integer   -- Local Variable
type Global = Integer   -- Global Variable

type LocalConf  = (IP, Local, Local)

step_doubler :: (LocalConf, Global) → (LocalConf, Global)
step_doubler ( (ip,x,y) , a )
  = case ip of
      0 → ( (1,a,y) , a   )
      1 → ( (2,x,a) , a   )
      2 → ( (0,x,y) , x+y )
```

The global configuration consists of the value of the global variable, and one local configuration for each `doubler` process. The global step function takes the global configuration and a process identifier, and produces another global configuration.

```
type PID        = Int    -- Process Identifier
type GlobalConf = (LocalConf, LocalConf, Global)

step_global :: GlobalConf → PID → GlobalConf
step_global (d0, d1, a) pid
  = case pid of
      0 → let (d', a') = step_doubler (d0, a) in (d', d1, a')
      1 → let (d', a') = step_doubler (d1, a) in (d0, d', a')
```

The function from schedules to the final value in `a` is just a left fold, resulting in the first model of the thread game:

```
threadGame :: [PID] → Integer
threadGame pids = a
    where
      ( _, _, a ) = foldl step_global ( (0,0,0), (0,0,0), 1 ) pids
```
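To make the model concrete, here is a self-contained sketch of this first implementation, together with a brute-force search over all schedules of bounded length (`reachable` and the step bound are my own additions, not part of the talk):

```haskell
import Control.Monad (replicateM)
import Data.List (nub, sort)

type IP         = Int
type LocalConf  = (IP, Integer, Integer)
type GlobalConf = (LocalConf, LocalConf, Integer)
type PID        = Int

-- one atomic step of a single doubler thread
stepDoubler :: (LocalConf, Integer) -> (LocalConf, Integer)
stepDoubler ((ip, x, y), a) = case ip of
    0 -> ((1, a, y), a  )
    1 -> ((2, x, a), a  )
    _ -> ((0, x, y), x+y)

-- one atomic step of the whole system, chosen by the scheduler
stepGlobal :: GlobalConf -> PID -> GlobalConf
stepGlobal (d0, d1, a) 0 = let (d', a') = stepDoubler (d0, a) in (d', d1, a')
stepGlobal (d0, d1, a) _ = let (d', a') = stepDoubler (d1, a) in (d0, d', a')

-- the final value of the global variable under a given finite schedule
threadGame :: [PID] -> Integer
threadGame pids = a
  where (_, _, a) = foldl stepGlobal ((0,0,0), (0,0,0), 1) pids

-- every final value reachable by some schedule of at most n steps
reachable :: Int -> [Integer]
reachable n = nub [ threadGame s | k <- [0..n], s <- replicateM k [0,1] ]
```

For instance, `threadGame [0,1,1,1,0,0]` yields 3: thread 0 reads `x := 1`, thread 1 then runs to completion setting `a` to 2, and thread 0 finishes with `a := 1 + 2`.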

While this technique is general and relatively straightforward, the code itself is not generic, and is fairly far removed from the original specification. It would seem natural to model `doubler` using the state monad, as shown below. The state monad can handle the local variables and the global variable, but it cannot represent the instruction pointer or provide the coroutine facility needed to model the given problem.

```
doubler = do
    x ← get
    y ← get
    put (x+y)
    doubler
```

The continuation monad can represent the instruction pointer, but because Haskell is a pure language, further tricks are needed to mediate between the thread scheduler and each doubler process.

## Iteratee-based Shared-State Concurrency

The second implementation used an idea derived from `Iterators` as shown to me by Roshan James. These are a more generic variant of Oleg’s Iteratees. Iteratees are something of a hot topic these days, being a core feature of the Snap Framework and the subject of assorted blog posts.

```
data Iterator i o r
   = Done r
   | Result o (i → Iterator i o r)
```

The idea here is that an iterator is either done, and returns a final result (of type `r`), or it might want to produce more, so it returns an intermediate result (of type `o`) along with a continuation to be called if and when further computation is required. Interestingly, Stephen Blackheath also used an equivalent structure in today’s post, Co-routines in Haskell.
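As a small illustration of driving such an iterator, here is a toy example (`summer` and `feed` are hypothetical names of my own choosing, not from the talk):

```haskell
data Iterator i o r
    = Done r
    | Result o (i -> Iterator i o r)

-- a toy iterator: emits a running total, and finishes when fed 0
summer :: Integer -> Iterator Integer Integer Integer
summer acc = Result acc step
  where
    step 0 = Done acc
    step i = summer (acc + i)

-- drive an iterator with a list of inputs, collecting the intermediate
-- outputs and the final result (if the iterator finished)
feed :: Iterator i o r -> [i] -> ([o], Maybe r)
feed (Done r)     _      = ([], Just r)
feed (Result o _) []     = ([o], Nothing)
feed (Result o k) (i:is) = let (os, r) = feed (k i) is in (o:os, r)
```

Feeding `summer 0` the inputs `[1,2,3,0]` produces the running totals `[0,1,3,6]` and the final result `Just 6`.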

Basically, each thread is an `Iterator` that accepts the value of the global variable as input, and produces a new value for the global variable and its own continuation as output. This continuation represents the local configuration, which consists of the instruction pointer and the two local variables. The thread scheduler manages these continuations and feeds the global variable into the corresponding continuation.

I had (almost) invented iteratees myself a number of years ago, although I didn’t realize the connection until after I had begun to write this blog post. Back in the year 2000, I helped write a paper about implementing finite automata and Turing machines in functional languages by directly using the standard textbook definition. In the process I stumbled on an alternate representation of a DFA that I did not publish, mentioned briefly here.

```
data DFA st ab = DFA (st → ab → st) st (st → Bool)

data ContDFA ab = ContDFA Bool (ab → ContDFA ab)
```

The first representation is the standard textbook definition: an automaton over states `st` and an alphabet `ab` consists of a transition function, a start state, and (the characteristic function of) a set of final states. Interestingly, this representation was revisited in the most recent issue of the Monad Reader.

The second representation is a special case of `Iterator` above, without the `Done` alternative. Basically, a `ContDFA` represents a single state in the machine: it has a boolean field saying whether or not the state is final, and a method that accepts a letter and returns another state.
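A sketch of the correspondence between the two: any textbook DFA can be unrolled into a `ContDFA` (the helper names `toContDFA`, `accepts`, and the example `evenAs` are my own):

```haskell
data DFA st ab  = DFA (st -> ab -> st) st (st -> Bool)
data ContDFA ab = ContDFA Bool (ab -> ContDFA ab)

-- run a textbook DFA over an input word
runDFA :: DFA st ab -> [ab] -> Bool
runDFA (DFA delta start final) = final . foldl delta start

-- unroll a DFA into its continuation-based form: each state becomes
-- a ContDFA node, computed on demand
toContDFA :: DFA st ab -> ContDFA ab
toContDFA (DFA delta start final) = go start
  where
    go s = ContDFA (final s) (\x -> go (delta s x))

-- run the continuation-based form over an input word
accepts :: ContDFA ab -> [ab] -> Bool
accepts (ContDFA final _) []     = final
accepts (ContDFA _ step)  (x:xs) = accepts (step x) xs

-- example: accept strings containing an even number of 'a's;
-- the state tracks whether we have seen an odd number so far
evenAs :: DFA Bool Char
evenAs = DFA (\s c -> if c == 'a' then not s else s) False not
```

Both representations agree on every input, which is what makes the unrolling a faithful translation.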

When I was first exposed to Iterators, I was confused by them and the way my friend was using them. Funnily enough, I completely missed the connection to my `ContDFA` invention, even though I was immediately reminded of Mealy Machines and thought of that project! In a way, it was a fortuitous slip of the mind. While thinking about implementing Mealy Machines with continuations in an effort to understand Roshan’s work, I came up with “functional iterators”. (If anybody can suggest a better name, please speak up!)

```
newtype FunIter i o r
    = FunIter (FunIter o i r → i → r)
```

Compared to `Iterator`, `FunIter` assumes that the consumer of the intermediate values is itself written in continuation passing style. I don’t understand the exact relationship between `FunIter` and `Iterator`; they are remarkably similar, but there are trade-offs between what the two structures can represent that I don’t understand very well.

Both `Iterator` and `FunIter` are capable of expressing the coroutine facility needed to model the thread game. However, my `Iterator` solution has a few vestigial features, and I like my `FunIter` solution a bit better.

```
newtype FunIter i o r
    = FunIter { runIter :: FunIter o i r → i → r }

type Thread r st a = Cont (FunIter st st r) a

runThread m = runCont m (\r → FunIter (\_ _ → r))
```

I used a continuation monad to capture the local continuations, while `FunIter` mediates the transition between the local continuations and the scheduler continuation. We can then define `get` and `put`, which yield to the scheduler continuation:

```
get    = Cont (\k → FunIter (\(FunIter sk) st →  sk (k st) st))
put st = Cont (\k → FunIter (\(FunIter sk) _  →  sk (k ()) st))
```

Here is a more generic thread scheduler:

```
update :: Int → a → [a] → [a]
update n x [] = []
update n x (y:ys)
    | n <= 0    =  x : ys
    | otherwise =  y : update (n-1) x ys

runSchedule :: [FunIter st st st] → [PID] → st → st
runSchedule threads []         st = st
runSchedule threads (pid:pids) st = runIter (threads !! pid) sched st
    where
      sched = FunIter $ \iter st' →
                runSchedule (update pid iter threads) pids st'
```

And finally, we can bring everything together into an alternate model of the thread game:

```
threadGame' :: [PID] → Integer
threadGame' pids = runSchedule [runThread doubler, runThread doubler] pids 1
```

I don’t know who first invented Iteratees, but the idea has been lurking in a variety of different contexts for some time. The functional iterators presented here might be original. The second solution is a bit longer and more esoteric, but it’s more generic and should scale better in terms of source code and maintenance.
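Piecing the fragments above together, here is one self-contained rendering of the second model. It is a sketch rather than the code from the talk: since modern versions of mtl no longer export the `Cont` constructor, the continuation monad is defined locally.

```haskell
-- a local continuation monad, since mtl no longer exports the Cont constructor
newtype Cont r a = Cont { runC :: (a -> r) -> r }

instance Functor (Cont r) where
    fmap f (Cont m) = Cont (\k -> m (k . f))
instance Applicative (Cont r) where
    pure a = Cont ($ a)
    Cont mf <*> Cont ma = Cont (\k -> mf (\f -> ma (k . f)))
instance Monad (Cont r) where
    Cont m >>= f = Cont (\k -> m (\a -> runC (f a) k))

newtype FunIter i o r = FunIter { runIter :: FunIter o i r -> i -> r }

type PID = Int
type Thread r st a = Cont (FunIter st st r) a

runThread :: Thread r st r -> FunIter st st r
runThread m = runC m (\r -> FunIter (\_ _ -> r))

-- get and put each yield to the scheduler continuation: one atomic step
get :: Thread r st st
get    = Cont (\k -> FunIter (\(FunIter sk) st -> sk (k st) st))

put :: st -> Thread r st ()
put st = Cont (\k -> FunIter (\(FunIter sk) _  -> sk (k ()) st))

-- the doubler process, written monadically
doubler :: Thread r Integer a
doubler = do
    x <- get
    y <- get
    put (x + y)
    doubler

update :: Int -> a -> [a] -> [a]
update _ _ []     = []
update n x (y:ys)
    | n <= 0    = x : ys
    | otherwise = y : update (n-1) x ys

runSchedule :: [FunIter st st st] -> [PID] -> st -> st
runSchedule _       []         st = st
runSchedule threads (pid:pids) st = runIter (threads !! pid) sched st
  where
    sched = FunIter (\iter st' ->
              runSchedule (update pid iter threads) pids st')

threadGame' :: [PID] -> Integer
threadGame' pids = runSchedule [runThread doubler, runThread doubler] pids 1
```

The two models agree on every finite schedule; for example `threadGame' [0,1,1,1,0,0]` is 3, just as in the automaton version.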

## 2010 March 14

I have four draft posts started that I haven’t finished. I won’t be finishing any of them today; instead I’ll issue an erratum for my paper in Issue 14 of the Monad Reader. On page 53 I wrote:

> Writer monads partially enforce the efficient use of list concatenation. In the MTL, `Writer [e] a` is just a newtype isomorphism for `([e], a)`. It provides a function `tell` that takes a list and concatenates the remainder of the computation onto the end of the list. This naturally associates to the right, and thus avoids quadratic behavior. Of course, `tell` accepts arbitrary length lists, and one could inefficiently produce a long argument to `tell`.

The writer monad does no such thing. In fact, it’s quite easy to write a function using the writer monad that exhibits quadratic behavior. For example, here is a function that uses the writer monad to produce the list `[1..n]` in quadratic time:

```
enumTo n
    | n <= 0    = return ()
    | otherwise = enumTo (n-1) >> tell [n]
```

By unfolding the definitions of `return` and `(>>=)` for `Control.Monad.Writer`, and the `Monoid` instance for lists, we find this is equivalent to:

```
enumToWriter n
    | n <= 0    = ((), [])
    | otherwise = let (a,w ) = enumToWriter (n-1)
                      (b,w') = (\_ -> ((),[n])) ()
                  in (b, w ++ w')
```

By beta-reducing the let bindings and making use of `snd`, we get:

```
enumToWriter'2 n
    | n <= 0    = ((), [])
    | otherwise = ((), snd (enumToWriter'2 (n-1)) ++ [n])
```

And finally, we can hoist the pairs out of the loop, if we want:

```
enumToWriter'3 n = ((), loop n)
    where
      loop n
        | n <= 0    = []
        | otherwise = loop (n-1) ++ [n]
```

It should now be clear that this is quadratic: the first element of the result gets copied (n-1) times, the second element (n-2) times, and so on. However, there is a solution. By adjusting the monad, the original definition can be executed in linear time. My erroneous quote is actually an accurate description of the continuation-based writer monad.

```
instance (Monoid w) => MonadWriter w (Cont w) where
    tell xs = Cont (\k -> xs `mappend` k ())
    -- or, in a direct style:
    -- tell xs = mapCont (xs `mappend`) (return ())
```

The expression `runCont (enumTo n) (const [])` produces the list `[1..n]` in linear time. The continuation-based writer monad provides the linear-time guarantee that I described but erroneously attributed to the regular writer monad. By unfolding the definition of return and (>>=) for Control.Monad.Cont and our definition for `tell` above, we can find out what this algorithm actually does:

```
-- unfold return and (>>=)
enumToCont n
    | n <= 0    = \k -> k ()
    | otherwise = \k -> enumToCont (n-1) (\a  -> (\_ -> tell [n]) a k)
-- beta-reduction
--              = \k -> enumToCont (n-1) (\_a -> tell [n] k)
-- unfold tell
--              = \k -> enumToCont (n-1) (\_  -> (\k -> [n] ++ k ()) k)
-- beta-reduction
--              = \k -> enumToCont (n-1) (\_  -> [n] ++ k ())
-- unfold (++)
--              = \k -> enumToCont (n-1) (\_  -> n : k ())
```
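The definitions above can be exercised directly; here is a self-contained version one can run, with the continuation monad defined locally since modern mtl no longer exports the `Cont` constructor:

```haskell
-- a local continuation monad standing in for Control.Monad.Cont
newtype Cont r a = Cont { runCont :: (a -> r) -> r }

instance Functor (Cont r) where
    fmap f (Cont m) = Cont (\k -> m (k . f))
instance Applicative (Cont r) where
    pure a = Cont ($ a)
    Cont mf <*> Cont ma = Cont (\k -> mf (\f -> ma (k . f)))
instance Monad (Cont r) where
    Cont m >>= f = Cont (\k -> m (\a -> runCont (f a) k))

-- the continuation-based tell: prepend the output, then run the rest
tell :: Monoid w => w -> Cont w ()
tell xs = Cont (\k -> xs `mappend` k ())

-- the original quadratic-looking definition, now linear
enumTo :: Int -> Cont [Int] ()
enumTo n
    | n <= 0    = return ()
    | otherwise = enumTo (n-1) >> tell [n]
```

`runCont (enumTo n) (const [])` then produces `[1..n]`, and n = 100000 finishes promptly where the quadratic version would not.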

And, if we like, we can avoid passing () to the continuation argument, resulting in a normal accumulator argument:

```
enumToCont' n k
    | n <= 0    = k
    | otherwise = enumToCont' (n-1) (n:k)
```

By writing it with `(:)`, we know it has to associate to the right and thus operates in linear time. In the quadratic-time solution, the `(++)` cannot be unfolded. However, this solution isn’t incremental: trying to evaluate `take 5 (enumToCont' (10^12) [])` will more than likely run your computer completely out of virtual memory long before it finishes. It’s not too difficult to see why: `enumToCont'` conses an element equal to its input onto the accumulator, so it has to count all the way down from a trillion before it produces `[1,2,3,4,5, ...]`. So it really is better to write it as:

```
enumTo' lim = loop 1
    where
      loop n
        | n <= lim  = tell [n] >> loop (n+1)
        | otherwise = return ()
```

This definition is both linear and incremental under both the continuation-based writer monad and the normal writer monad, but it isn’t a generic transformation that applies to any algorithm using the writer monad.

These two writer monads have a number of tradeoffs; for example, the continuation-based writer monad cannot be used with the corecursive queue transformer in the paper. An efficient alternative is the regular writer monad with a list representation that offers efficient concatenation, such as difference lists.
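For instance, a difference list represents a list as a function that prepends it, making `mappend` constant time. Here is a minimal sketch (the names below are my own; the `Data.DList` package on Hackage provides a polished version):

```haskell
-- a list represented as "the function that prepends it"
newtype DList a = DList ([a] -> [a])

instance Semigroup (DList a) where
    DList f <> DList g = DList (f . g)   -- O(1) append
instance Monoid (DList a) where
    mempty = DList id

singleton :: a -> DList a
singleton x = DList (x:)

toList :: DList a -> [a]
toList (DList f) = f []

-- the writer computation from enumTo, unfolded to (result, output) pairs;
-- with DList output it runs in linear time despite associating left
enumToD :: Int -> ((), DList Int)
enumToD n
    | n <= 0    = ((), mempty)
    | otherwise = let (_, w) = enumToD (n-1)
                  in  ((), w <> singleton n)
```

The left-nested appends that were quadratic on plain lists become a chain of function compositions, applied once at the end by `toList`.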

For further reading, you might enjoy Asymptotic Improvement of Computations over Free Monads by Janis Voigtländer, which explores this phenomenon further.

## 2009 December 30

### Fun with the Lazy State Monad

The lazy state monad doesn’t work the way you think it works. Thinking of the lazy state monad in terms of imperative programming is a very useful first approximation, at least in terms of what is computed; but in terms of how things are computed, this intuition is beyond useless. I hope to dissuade you of the latter part of the intuition in this post, by demonstrating two of the more interesting things that can be done with the lazy state monad.

Albert Y.C. Lai recently shared a program that demonstrates the lazy state monad in a rather interesting way:

```
pro :: State [Bool] ()
pro = do
    pro
    s <- get
    put (True : s)
```

Instead of tail recursion, this employs head recursion. The question is, what does `pro` return when you run it, and why? I found it easy to guess the correct answer, but my reasoning was completely wrong. Of course, if viewed through a wholly imperative mindset, this leads to non-termination, but the lazy state monad extracts usable information from this definition.
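We can check the answer by running it. A minimal lazy state monad is defined locally here to make the example self-contained; `Control.Monad.State.Lazy` behaves the same way. The key is that the pairs are bound by lazy `let` patterns, so the inner `pro` is never forced for its result, only (incrementally) for its state:

```haskell
-- a minimal *lazy* state monad: note the irrefutable let-bound pairs
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
    fmap f m = State (\s -> let (a, s') = runState m s in (f a, s'))
instance Applicative (State s) where
    pure a    = State (\s -> (a, s))
    mf <*> ma = State (\s -> let (f, s1) = runState mf s
                                 (a, s2) = runState ma s1
                             in  (f a, s2))
instance Monad (State s) where
    m >>= k = State (\s -> let (a, s') = runState m s
                           in  runState (k a) s')

get :: State s s
get = State (\s -> (s, s))

put :: s -> State s ()
put s = State (\_ -> ((), s))

execState :: State s a -> s -> s
execState m s = snd (runState m s)

pro :: State [Bool] ()
pro = do
    pro
    s <- get
    put (True : s)
```

The head recursion never terminates in the result position, but the final state is productively defined: each level conses one `True` onto the state produced by the level beneath it, so the final state is an infinite list of `True`s and `take 3 (execState pro [])` yields `[True,True,True]`.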

In my recent Monad Reader article, I disentangled the bits of circular programming from Lloyd Allison’s queue, and for the last year I have been obsessed with pulling the same trick with Jones and Gibbons’ breadth-first labeling algorithm. At times, this obsession has been a bit of a disease. It’s also worth pointing out Chris Okasaki’s discussion of a particular instance of this algorithm, and the mention of this algorithm on the FP Lunch Weblog. Here is code for the special case of breadth-first numbering:

```
data Tree a b = Leaf a | Branch b (Tree a b) (Tree a b)

lazyRenum :: Num n => Tree a b -> Tree n n
lazyRenum t = t'
    where
      (ns, t') = renumber (0:ns, t)

renumber (n:ns,  Leaf    _    ) = (n+1:ns  , Leaf    n      )
renumber (n:ns,  Branch  _ l r) = (n+1:ns'', Branch  n l' r')
    where
      (ns' , l')  = renumber (ns , l)
      (ns'', r')  = renumber (ns', r)
```
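Adding `deriving (Eq, Show)` so results can be inspected (my addition, purely for testing), we can check the numbering on a small tree:

```haskell
data Tree a b = Leaf a | Branch b (Tree a b) (Tree a b)
    deriving (Eq, Show)

-- breadth-first numbering via a circularly-defined list of counters:
-- the list fed into renumber is the list it produces, shifted by one level
lazyRenum :: Num n => Tree a b -> Tree n n
lazyRenum t = t'
  where
    (ns, t') = renumber (0:ns, t)

renumber :: Num n => ([n], Tree a b) -> ([n], Tree n n)
renumber (n:ns,  Leaf    _    ) = (n+1:ns  , Leaf    n      )
renumber (n:ns,  Branch  _ l r) = (n+1:ns'', Branch  n l' r')
  where
    (ns' , l') = renumber (ns , l)
    (ns'', r') = renumber (ns', r)
```

On `Branch () (Branch () (Leaf ()) (Leaf ())) (Leaf ())` this yields `Branch 0 (Branch 1 (Leaf 3) (Leaf 4)) (Leaf 2)`: the root is 0, its two children are 1 and 2 left to right, and the bottom leaves are 3 and 4, exactly a level-order numbering.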

I finally disentangled the corecursive bits from this example today. The circular programming occurs in the list argument, not the tree. Note that the flow of the list `ns` from one call of `renumber` to the next is like the state monad. From this observation, I wrote the following whimsical code:

```
lazyRenum :: Num n => Tree a b -> Tree n n
lazyRenum = runFresh . renumber
    where
      renumber (Leaf    _    ) =
          fresh (\n ->
              return (Leaf n))
      renumber (Branch  _ l r) =
          fresh (\n -> do
              l' <- renumber l
              r' <- renumber r
              return (Branch n l' r'))
```

Once I had this right, it was pretty easy to fill in the definitions for `fresh` and `runFresh`, by cribbing off Chris Okasaki’s simplification of Jones and Gibbons’ algorithm:

```
type Fresh n a = State [n] a

runFresh :: Num n => Fresh n a -> a
runFresh m = a
    where
      (a, ns) = runState m (0:ns)

fresh :: Num n => (n -> Fresh n a) -> Fresh n a
fresh f = do
    (n:ns) <- get
    put ns
    a <- f n
    ns' <- get
    put ((n+1) : ns')
    return a
```

And finally, we have arrived at a way to disentangle Jones and Gibbons’ algorithm. This easily generalizes from breadth-first numbering to breadth-first labeling, and like the original, it is capable of renumbering the Stern-Brocot tree. The key insights here are the use of the lazy state monad, and getting the type of `fresh` correct. Everything else is relatively straightforward, once this groundwork is laid.

It’s interesting to perform a post-mortem analysis of why coming up with this proved to be so difficult for me. I’ll admit that I spent a few weeks trying to decipher the operational characteristics of Jones and Gibbons’ labeling algorithm, and while I think I have a reasonable grasp on it, I’m still not completely comfortable with it. However, this monadic abstraction seems perfectly obvious in hindsight.

This contrasts starkly with my own understanding of Lloyd Allison’s queue: I was completely comfortable with the operational aspects of the algorithm, but the monadic abstraction was quite difficult to come to grips with. So my difficulty with Jones and Gibbons’ algorithm was an over-emphasis on the operational aspects of the algorithm, and too much focus on the continuation monad as part of the solution. Basically, I was hopeful that the same methodologies from my Monad Reader article would be directly useful, so much so that I had a case of tunnel vision.

While it is not obvious how the continuation monad might be applicable to this problem, continuations still played a role in the solution: look at the type of `fresh`; it’s the same `(a -> r) -> r` pattern that the continuation monad uses. It seems to me that there is something deeper here, although I don’t know what that would be.

These examples might make a stronger case for the lazy state monad than the example in my previous post. While `foldrByLevel` is relatively easy to adapt to the continuation passing state monad, I don’t know how to do the same with either of these two examples.

## 2009 December 20

### Are continuations really the mother of all monads?

So Issue 14 of the Monad Reader is out, which includes my paper “Lloyd Allison’s Corecursive Queues” with a few significant revisions. Compared to earlier versions mentioned on this blog, I moved a few sections around to improve the flow: the first 8 pages now contain an uninterrupted informal derivation of the queue operators. I also demonstrated a new fixpoint operator on continuations, an implementation of the standard `mfix` fixpoint on the codensity monad, and argued that `mapCont` cannot be implemented in terms of `callCC`.

However, this post focuses on the section that I moved, entitled “Monad Transformers” from pages 51 to 56. It’s basically a short rant about why I don’t like monad transformers, an argument that later culminates in a mostly broken corecursive queue transformer. In retrospect, it’s somewhat unfortunate that I did not reread the classic paper Monad Transformers and Modular Interpreters sometime before Issue 14 came out. I did read that paper approximately nine or ten years ago. Although the paper was helpful in understanding how monads could be shaped to my own ends, now that I actually understand the contents of the paper, it feels rather crufty.

Section 8.1 defines correctness criteria for a monad transformer and associated lifted operations, which I quote as follows:

> The basic requirement of lifting is that any program which does not use the added features should behave in the same way after a monad transformer is applied.

The thrust of my argument is that this requirement is indeed very basic; one would hope that certain properties useful for reasoning inside a given monadic language would also be preserved. This additional requirement seems rather hopeless. However, the pièce de résistance of the argument is that the continuation transformer is incorrect by their own criteria, at least in the context of a lazy language such as Haskell or Gofer.

I demonstrate an expression that depends on the laziness of the lazy state monad, and fails to terminate after a continuation transformer is applied. (As an aside, it doesn’t matter if this is a `ContT r (StateT st Identity)` monad or `StateT st (ContT r Identity)`, they are the same monad with the same operations.) In retrospect, this seems obvious: something written in the continuation passing style specifies an evaluation order independent of the host language, and applying the continuation transformer corresponds to a call-by-value CPS transformation of part of the program.

The example involves a definition that computes a right fold over a level-order traversal of a tree:

```
foldrByLevel :: (MonadState (Queue a) m)
             => (a -> [a])
             -> (a -> b -> b) -> b -> [a] -> m b
foldrByLevel childrenOf f b as = fold as
    where
      fold []     = deQ >>= maybe (return b)
                              (\a -> fold (childrenOf a))
      fold (a:as) = do
          enQ a
          b <- fold as
          return (f a b)
```

If we use this definition to traverse an infinite tree, it will be productive when run with `Control.Monad.State.Lazy`, but will get stuck in an infinite non-productive loop when combined with `Control.Monad.Cont`. You can download a complete file that demonstrates this phenomenon. There are “simpler” expressions that demonstrate this, but the example I gave is itself interesting because it is useful in other contexts.
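One such simpler expression, offered here as an illustrative sketch of my own rather than the example from the paper: a head-recursive counter that is productive under the lazy state monad, but diverges once the continuation transformer, in effect a call-by-value CPS transformation, is applied over it.

```haskell
-- Control.Monad.State re-exports the *lazy* state monad from mtl
import Control.Monad.State

-- an infinite, head-recursive stateful computation: it "returns"
-- the infinite list of successive states
nats :: State Int [Int]
nats = do
    n <- get
    put (n + 1)
    ns <- nats
    return (n : ns)
```

With the lazy state monad, `take 3 (evalState nats 0)` yields `[0,1,2]`, because each level’s result cons cell is available before the infinite tail is demanded; the analogous computation run through `ContT r (State Int)` loops forever before producing anything.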

As demonstrated in my paper, the laziness can be restored by tweaking the definition of `foldrByLevel` to use `mapCont`. However, this is a leaky abstraction: in order to maintain the same behavior, we must modify existing code to make use of the “added” features in ways that are not backwards compatible. I do not know how to write a lazy state monad using continuations, or a sensible way of writing a single definition of `foldrByLevel` that behaves the same on both the lazy state monad and the continuation state monad.

(I use the word “sensible” because one could provide an unfaithful definition of `mapCont` for the lazy state monad that happens to work in the case of `foldrByLevel`, but fails in other contexts.)

What impact does this have on the notion that continuations are the mother of all monads? Is there a monad that corresponds to a call-by-name CPS transformation? Is it even possible to express the lazy state monad using continuations?

I do not yet have answers to these questions. One thing is clear, however: it’s tricky to generalize Filinski’s Representing Monads to lazy evaluation, if indeed it is possible to do so fully.

## 2009 June 22

Filed under: continuations, corecursion, haskell, monads, queues — lpsmith @ 8:41 pm

Haskell aficionados, take note! My library for corecursive queues has now been uploaded to Hackage. You can now cabal-install it.

I also have a substantially revised draft of the associated paper, Lloyd Allison’s Corecursive Queues, available. It has been re-organized so that it is hopefully easier to follow, it includes a careful performance comparison, and a tentative proof that `mapCont` cannot be expressed in terms of `callCC`, `(>>=)`, `return`.

The library includes a somewhat crude micro-benchmarking program in the `tests/` directory. Those who have read previous drafts, be warned that the few brief statements about performance were based on past notes, and I found several issues with the testing methodology contained in those notes. Here are the revised results:

| Description | Time (ms), mean | σ | Time (ms) with -H500M, mean | σ | Bytes allocated per Branch |
|---|---|---|---|---|---|
| **GHC 6.10.3** | | | | | |
| levelOrder’ | 446 | 5 | 172 | 15 | 44.0 |
| CorecQ | 555 | 5 | 619 | 4 | 133.5 |
| CorecQW _ | 696 | 5 | 1128 | 6 | 213.6 |
| CorecQW () | 907 | 56 | 2235 | 11 | 213.6 |
| Side Channel _ | 959 | 3 | 1171 | 7 | 228.7 |
| Side Channel () | 1500 | 56 | 2171 | 7 | 276.4 |
| STQ | 1140 | 8 | 1087 | 14 | 371.2 |
| TwoStack | 1158 | 4 | 778 | 10 | 185.8 |
| Okasaki | 1553 | 7 | 1574 | 12 | 209.0 |
| Data.Sequence | 962 | 5 | 1308 | 5 | 348.1 |
| **GHC 6.8.3** | | | | | |
| levelOrder’ | 461 | 2 | 173 | 15 | 44.1 |
| CorecQ | 458 | 4 | 267 | 13 | 67.5 |
| CorecQW _ | 526 | 5 | 713 | 5 | 141.2 |
| CorecQW () | 781 | 62 | 1775 | 62 | 141.3 |
These benchmarks come from performing breadth-first traversals repeatedly on the 34th Fibonacci tree, on an Intel Core 2 Duo T9550. The first few data points were discarded, and the mean and standard deviation of the remaining times were computed. Note that `getCPUTime` was used to time each run, and this has a resolution of only 10 milliseconds.

If you would like to play with the queue transformer, which doesn’t appear in the library, or other bits of code exactly as they appear in the paper, you can download the source code here.

## 2009 March 9

### Lloyd Allison’s Corecursive Queues

Filed under: continuations, corecursion, haskell, monads, queues — lpsmith @ 9:34 pm

I’m proud to announce that a draft of my paper, “Lloyd Allison’s Corecursive Queues: Why Continuations Matter”, is now available. (Source code is available here, with a Hackage package available here.) Wouter Swierstra has informed me that he will publish it in the Monad Reader. However, it will appear in the next issue after the upcoming one, due to an unexpectedly large number of submissions this time around. Here is the abstract:

> In a purely functional setting, real-time queues are traditionally thought to be much harder to implement than either real-time stacks or amortized O(1) queues. In “Circular Programs and Self-Referential Structures,” Lloyd Allison uses corecursion to implement a queue by defining a lazy list in terms of itself. This provides a simple, efficient, and attractive implementation of real-time queues.
>
> While Allison’s queues are general, in the sense it is straightforward to adapt his technique to a new algorithm, a significant problem has been the lack of a reusable library implementation. This paper solves this problem through the use of a monadic interface and continuations.
>
> Because Allison’s queues are not fully persistent, they cannot be first class values. Rather, they are encoded in particular algorithms written in an extended continuation passing style. In direct style, this extension corresponds to `mapCont`, a control operator found in `Control.Monad.Cont`, part of the Monad Template Library for Haskell. This paper conjectures that `mapCont` cannot be expressed in terms of `callCC`, `return`, and `(>>=)`.

I intend to include a careful performance comparison before this becomes an official Monad Reader article. Allison’s queues come out very well, often better than two-stack queues. I have conducted a careful performance comparison in the past, although with older versions of GHC and older versions of my code. While I did take reasonably careful notes, things have changed, and Haskell being what it is, figuring out why is often a challenge. In the meantime I am interested in feedback.

For fun, here is something I wrote right after I first came up with the basic idea behind the paper. It’s still the best error message I’ve gotten out of GHC. Kudos to whoever came up with that strategically placed bit of humor!

Thursday, August 25th, 2005, 5:22 am: Back to the Future

I’ve been up all night, but I now have a working fragment of computer code that is entirely too cute. It’s easily the cleverest bit I’ve written in years. I managed to implement… a queue.

Yes, a queue. One queue, not two. One purely functional queue, with one esoteric implementation! On my penultimate attempt, which was an utter failure except that it got me thinking in the right direction, I came by the most amusing error message I’ve seen to date out of GHC:

```
leon@deleon:~/Programs/snippets $  ghci -fglasgow-exts Queue.hs
   ___         ___ _
  / _ \ /\  /\/ __(_)
 / /_\// /_/ / /  | |      GHC Interactive, version 6.2.2, for Haskell 98.
/ /_\\/ __  / /___| |      http://www.haskell.org/ghc/
\____/\/ /_/\____/|_|      Type :? for help.

Compiling Queue            ( Queue.hs, interpreted )

Queue.hs:84:
    My brain just exploded.
    I can't handle pattern bindings for existentially-quantified constructors.

...
```