Melding Monads

2011 October 24

Concurrency and Foreign Functions in the Glasgow Haskell Compiler

Filed under: haskell — lpsmith @ 12:59 pm

I had an informative chat with Simon Marlow and Jules Bean a little while ago about concurrency in the Glasgow Haskell Compiler and its interaction with foreign code, specifically the difference between safe and unsafe calls. It turns out that the implementation is both simpler and more powerful than my vague misconceptions suggested. And since there seem to be quite a few myths out there, I thought I should write up an overview of GHC’s implementation.

Haskell’s concurrency model is relatively simple. Conceptually, Haskell threads are OS threads, and in most cases the programmer can treat them as such. This is known as a 1-1 threading model. In actuality, GHC’s implementation uses an M-N model, but provides an illusion of a 1-1 model. As one would expect, there are a few caveats to this illusion, but they tend to be minor. First, we’ll cover three fundamental components of GHC’s concurrency:

Haskell Threads

Haskell threads are implemented in GHC via cooperative multitasking. They are scheduled by GHC’s run-time system, making them the M of the M-N model. Yield points are generated automatically by the compiler, which provides an illusion of preemptive multitasking to the programmer. Also, the stack for each thread starts small but grows as needed. Cooperative multitasking and growable stacks make Haskell threads cheap and efficient, typically scaling to millions of threads on contemporary computer systems.

The downside is that GHC’s threads cannot by themselves take advantage of multiple CPUs. Also, foreign functions don’t cooperatively multitask. Notably, since GHC implements its arbitrary-precision Integer datatype via the GNU Multiple Precision Library, the illusion of preemptive multitasking can be less than perfect when executing Integer-heavy code.

To make use of Haskell threads, all you have to do is create them by calling Control.Concurrent.forkIO. There are no special compile-time, link-time, or run-time options needed to enable them.
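
For example, here is a minimal sketch that forks a Haskell thread and waits for it to finish, using an MVar to signal completion; it compiles and runs with a plain invocation of ghc:

    import Control.Concurrent

    main :: IO ()
    main = do
        done <- newEmptyMVar
        _ <- forkIO $ do
            putStrLn "hello from a forked Haskell thread"
            putMVar done ()       -- signal completion
        takeMVar done             -- block until the forked thread is done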

Operating System Threads

OS threads offer true preemptive multitasking and are scheduled by the kernel. They are also more expensive, typically scaling to thousands of threads on contemporary systems. However, they can execute on multiple CPUs, and they provide a way to deal with foreign calls that either block or take a long time to execute.

To use OS threads, all you need to do is link your executable with GHC’s threaded runtime system using the -threaded option. There is nothing particularly special you need to do inside Haskell code to utilize operating system threads; GHC’s threaded runtime will endeavor to maintain the illusion that Haskell threads are OS threads as best it can.

Capabilities

In the context of GHC, a capability is an operating system thread that is able to execute Haskell code. GHC schedules Haskell threads among the capabilities, making them the N of the M-N model. Previous versions of GHC only supported one capability, but GHC now supports as many capabilities as you want. While the total number of capabilities remains fixed, the collection of OS threads that are capabilities changes over time.

You can set the number of capabilities to some number x using the -Nx run-time option. This can be done either by passing +RTS -Nx to the executable on the command line, or by setting the default RTS options at link time with -with-rtsopts="-Nx". Note that you will need the -threaded link-time option, and may also need -rtsopts.
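
As a quick sketch (assuming the program below is saved as Main.hs), numCapabilities from GHC.Conc reports the -N setting the program was started with:

    -- Link against the threaded runtime and allow RTS options at run time:
    --     ghc -threaded -rtsopts Main.hs
    -- then run with, say, four capabilities:
    --     ./Main +RTS -N4
    -- or bake the default in at link time instead:
    --     ghc -threaded -with-rtsopts="-N4" Main.hs

    import GHC.Conc (numCapabilities)

    main :: IO ()
    main = putStrLn ("Running with " ++ show numCapabilities ++ " capabilities")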

The Foreign Function Interface

The FFI supports two kinds of calls: safe and unsafe. A “safe” call blocks the calling OS thread and Haskell thread until the call finishes. Blocking these is unavoidable. However, a safe call does not block the capability. When GHC performs a safe call, it performs the call inside the current OS thread, which is a capability. The capability then moves to another OS thread. If no spare OS thread is available, GHC will transparently create a new one. OS threads are pooled to try to avoid creating and destroying them too often.

An unsafe call blocks the capability in addition to blocking the OS and Haskell threads. This was a pretty big deal when GHC only supported a single capability. In effect, an unsafe call would block not only the Haskell thread that made the call, but every Haskell thread. This gives rise to the myth that unsafe calls block the entire Haskell runtime, which is no longer true. An unsafe foreign call that blocks is still undesirable, but depending on configuration, may not be as bad as it used to be.

The advantage of unsafe calls is that they entail significantly less overhead. Semantically, a foreign function that might call back into Haskell must be declared as a “safe” call, whereas a foreign function that will never call back into Haskell may be declared “unsafe”. As a result, safe calls must perform extra bookkeeping.

(As an aside, this vocabulary is not particularly consistent with Haskell’s other use of “safe” and “unsafe”. In this other sense, foreign calls are unsafe because they can be used to break the abstractions that GHC provides.)

In practice, most foreign functions will never call back into Haskell. Thus choosing whether to mark a foreign function as safe or unsafe usually revolves around the trade-off between concurrency and overhead. If needed, you can declare both a safe and an unsafe binding to the same foreign function.
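
For example, here is a sketch that imports sin from the C math library under two Haskell names (the names themselves are arbitrary), one safe and one unsafe, so that each call site can pick whichever binding suits it:

    {-# LANGUAGE ForeignFunctionInterface #-}

    import Foreign.C.Types (CDouble)

    -- The same C function imported twice: the unsafe binding has lower
    -- overhead but blocks its capability for the duration of the call,
    -- while the safe binding releases the capability.
    foreign import ccall unsafe "math.h sin"
        c_sin_fast :: CDouble -> CDouble

    foreign import ccall safe "math.h sin"
        c_sin_safe :: CDouble -> CDouble

    main :: IO ()
    main = do
        print (c_sin_fast 1.0)
        print (c_sin_safe 1.0)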

Bound threads

Finally, the most significant caveat to GHC’s 1-1 threading illusion is that some foreign code (notably OpenGL) expects to be run in a single OS thread, due to the use of thread-local storage or the like.

GHC’s answer is “bound threads”. GHC provides forkOS, which creates a Haskell thread with a special property: every foreign call that the Haskell thread makes is guaranteed to happen inside the same OS thread. It is perhaps a bad choice of name, as there is no need to call forkOS to utilize operating system threads. The only time you need to call forkOS is when you need an honest-to-goodness 1-1 correspondence between a Haskell thread and a kernel thread.
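
Here is a minimal sketch (it requires linking with -threaded): every foreign call made inside the forkOS action below is guaranteed to run on the one OS thread bound to that Haskell thread.

    import Control.Concurrent

    main :: IO ()
    main = do
        done <- newEmptyMVar
        _ <- forkOS $ do
            bound <- isCurrentThreadBound
            putStrLn ("running in a bound thread? " ++ show bound)  -- True
            -- any foreign calls made here (OpenGL calls, say) would all
            -- execute on this same OS thread
            putMVar done ()
        takeMVar done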

Finishing up

These kinds of illusions are a common theme in GHC: what appears to be synchronous blocking IO is in fact asynchronous non-blocking IO, what appears to be preemptive multitasking is in fact cooperative multitasking, what appears to be 1-1 threading is actually M-N threading. GHC endeavors to present the user with a nice interface, while using more efficient techniques behind the scenes. And, for the most part, these illusions hold up pretty well.

I hope the community finds this overview helpful. Most of what I’ve talked about is covered in “Extending the Haskell Foreign Function Interface with Concurrency” by Simon Marlow, Simon Peyton Jones, and Wolfgang Thaller. I hope this post is a useful introduction to the paper.

9 Comments

  1. Just to make it clearer (for anyone reading) what blocking a capability means: it means *all* Haskell threads scheduled on that capability are blocked for the duration of the call, not just the one making the foreign call. If the function completes quickly (or if there aren’t any other Haskell threads scheduled on the capability, e.g. if M isn’t (much) larger than N), it won’t matter. Otherwise, it might.

    This happens because unsafe calls are basically just direct C function calls. Safe calls are that plus various book-keeping and thread juggling.

    A difficulty is that how long a call takes frequently depends on what its arguments are (or on other external state). memcpy could copy a byte, or a gigabyte. Is there any good solution to this other than importing both a safe and an unsafe version and leaving the choice to the client? If you use the function internally, you could end up needing to provide two versions of every function which uses it (one for each such foreign import), so this doesn’t seem really tractable in the general case… I guess the only safe thing to do is to leave some performance on the table and always make safe calls when in doubt, and to only make unsafe ones when you can be sure they won’t block for too long.

    Remi Turk was kind enough to do some benchmarks of safe calls vs. unsafe calls vs. his cinvoke library bindings a while back, with both the threaded and non-threaded runtimes:

    http://www.haskell.org/pipermail/haskell-cafe/2011-March/090029.html
    http://www.haskell.org/pipermail/haskell-cafe/2011-March/090083.html

    The overhead of safe calls seems to be quite low for (presumably most) non-synthetic use cases.

    Comment by Gábor Lehel — 2011 October 24 @ 5:06 pm

    • Hmm, I didn’t really consider the consequences of blocking a capability, so I wasn’t asking the right questions in that regard. I just sort of tacitly assumed that it wouldn’t block any other Haskell threads, assuming there was another capability that wasn’t blocked. But that isn’t necessarily true, depending on how GHC’s scheduler is implemented.

      That issue sounds like another post that is worth writing. Thanks!

      I too have benchmarked safe versus unsafe calls. I used a safe and an unsafe binding to a C function that added the two parameters it was passed, and benchmarked them with criterion. There may well be subtleties that this benchmark didn’t reveal, but it suggested the overhead to be on the order of 7 nanoseconds for an unsafe call or 100 nanoseconds for a safe call on my systems. So yes, if the call is going to take more than about a microsecond, then the overhead isn’t that big of a deal… but I’m not sure what the scheduling granularity of Haskell threads is supposed to approximate, either. I believe that’s not time based, but rather a counter that counts down as you pass through yield points, so I believe the granularity can depend on what you are doing.

      Regarding memcpy, you can determine how much memory it’s going to copy based on the parameters, so you could have a simple Haskell wrapper that executes either a safe or unsafe call accordingly. You could even have GHC inline the wrapper and get rid of the branch in the case of constants.
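
      Something along these lines, as a rough sketch; the 64 KiB cutoff is an arbitrary number I picked for illustration, not a measured threshold:

          {-# LANGUAGE ForeignFunctionInterface #-}

          import Control.Monad   (void)
          import Foreign.C.Types (CSize)
          import Foreign.Ptr     (Ptr)

          foreign import ccall unsafe "string.h memcpy"
              c_memcpy_unsafe :: Ptr a -> Ptr a -> CSize -> IO (Ptr a)

          foreign import ccall safe "string.h memcpy"
              c_memcpy_safe :: Ptr a -> Ptr a -> CSize -> IO (Ptr a)

          -- Small copies finish quickly, so the cheap unsafe call is fine;
          -- large copies get the safe call so they don't hold the capability.
          {-# INLINE memcpy #-}
          memcpy :: Ptr a -> Ptr a -> CSize -> IO ()
          memcpy dst src n
              | n <= 65536 = void (c_memcpy_unsafe dst src n)
              | otherwise  = void (c_memcpy_safe   dst src n)

      With the INLINE pragma, GHC has a chance to eliminate the branch entirely when the length is a compile-time constant.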

      Comment by lpsmith — 2011 October 25 @ 7:41 pm

      • As far as I understand it, moving the other Haskell threads from the capability over to a different one is precisely what a safe call does, and is part of the overhead which unsafe calls avoid. (Or rather, I think it’s the thread making the call which gets moved over and the others stay, but that sounds mostly isomorphic.)

        Yield points are whenever allocation happens, but I don’t know anything about it beyond that.

        Regarding memcpy, that’s such an obvious solution that I’m somewhat embarrassed I didn’t think of it. Thanks. :)

        Comment by Gábor Lehel — 2011 October 25 @ 9:07 am

        • I don’t think that is correct, but you do bring up an excellent question… how does the scheduler work, and what are the consequences of blocking a capability? Of course, I’m pretty sure the answer depends on particular versions of GHC, and is liable to change again in the future.

          My understanding is that the entire capability moves to another OS thread, and I don’t see why that would entail much overhead. In particular, you don’t have to move individual Haskell threads. If that were true, then the cost of a safe call would be linear in the number of Haskell threads, which would be unacceptable. You do have to save more state though, because (among other things?) you have to ensure that the garbage collector will find everything if the foreign code calls back into Haskell and triggers a GC.

          Comment by lpsmith — 2011 October 26 @ 4:28 am

          • >You do have to save more state though, because (among other things?) you have to ensure that the garbage collector will find everything if the foreign code calls back into Haskell and triggers a GC.

            AFAIK it’s really very very simple: a capability is just a glorified mutex lock. When you perform a safe FFI call, the Haskell code marshals Int’s, etc. into the HsInt etc. C equivalents, then releases the capability (the mutex lock). If there are no other worker threads, it launches another worker thread. Then it performs the FFI call.

            Any return to Haskell code (either via the FFI C-to-Haskell, or a return from the safe FFI call) then attempts to lock the capability (mutex). The OS thread blocks until the capability (mutex) is unlocked by whoever got it.

            Comment by almkglor — 2011 October 26 @ 4:56 am

    • Ok, I have confirmed that you are absolutely correct: Haskell threads are assigned to a capability, and those threads will not be rescheduled on a different capability during an unsafe FFI call. Thus, in effect, an unsafe call blocks all the Haskell threads on the capability that it is blocking. Thanks for the clarification!

      Comment by lpsmith — 2012 June 26 @ 9:13 am

  2. Do you know, when GHC started to support more than one capability? It’s kind of hard to track that down manually.

    Comment by Sönke Hahn — 2011 October 26 @ 2:26 pm

    • Judging from old GHC User Guides, it looks like it was introduced in GHC 6.6 circa 2007. It looks like GHC 6.4 had a different kind of support for parallelism, though it’s no longer supported and I don’t know much about it.

      Comment by lpsmith — 2011 October 26 @ 7:37 pm

