Melding Monads

2015 November 21

Announcing linux-inotify 0.3

Filed under: Uncategorized — lpsmith @ 12:34 am

linux-inotify is a thin binding to the Linux kernel’s file system notification functionality for the Glasgow Haskell Compiler. The latest version does not break the interface in any way, but the implementation eliminates use-after-close file descriptor faults, which are now manifested as IOExceptions instead. It also adds thread safety, and is much safer in the presence of asynchronous exceptions.

In short, the latest version is far more industrial grade. It is the first version I would not hesitate to recommend to others, and the first version I have talked much about publicly. The only downside is that it does require GHC 7.8 or later, due to the use of threadWaitReadSTM to implement blocking reads.

File Descriptor Allocation

These changes were motivated by lpsmith/postgresql-simple#117, one of the more interesting bugs I’ve been peripherally involved with. The upshot is that it’s very probably a use-after-close descriptor fault: a descriptor associated with a websocket would close, that descriptor would be associated with a new description that would be a database connection, and then a still-active websocket thread would ping the database connection, causing the database server to hang up due to a protocol error. Thus, a bug in snap and/or websockets-snap manifested itself in the logs of a database server.

This issue did end up exposing a significant deficiency in my understanding of the allocation of file descriptors: for historical reasons, the POSIX standard specifies that a new description must be indexed by the smallest unused descriptor.

This is an unfortunate design from several standpoints, as it necessitates more serialization than would otherwise be required, and a scheme that seeks to delay reusing descriptors as long as practical would greatly mitigate use-after-close faults, as they would almost certainly manifest as an EBADF errno instead of a potentially catastrophic interaction with an unintended description.

In any case, one of the consequences of the POSIX standard is that a use-after-close descriptor fault is a pretty big deal, especially in a server with a high descriptor churn rate. In this case, the chances of an inappropriate interaction with the wrong descriptor is actually pretty likely.

Thus, in the face of new evidence, I revised my opinion regarding what the linux-inotify binding should provide; in particular protection against use-after-close. And in the process I made the binding safer in the presence of concurrency and asynchronous exceptions.

The Implementation

Due to the delivery of multiple inotify messages in a single system call, linux-inotify provides a buffer behind the scenes. The new implementation has two MVar locks: one associated with the buffer, and another associated with the inotify descriptor itself.

Four functions take only the descriptor lock: addWatch, rmWatch, close, and isClosed. Four functions take the buffer lock, and may or may not take the descriptor lock: getEvent, peekEvent, getEventNonBlocking, and peekEventNonBlocking. Two functions only ever take the buffer lock: getEventFromBuffer and peekEventFromBuffer.

The buffer is implemented in an imperative fashion, and the lock protects all the buffer manipulations. The existence of the descriptor lock is slightly frustrating, as the only reason it’s there is to correctly support the use-after-close protection; the kernel would otherwise be able to deal with concurrent calls itself.

In this particular design, it’s important that neither of these locks are held for any significant length of time; in particular we don’t want to block while holding either one. This is so that the nonblocking functions are actually nonblocking: otherwise it would be difficult for getEventNonBlocking to not block while another call to getEvent is blocked on the descriptor.

So looking at the source of getEvent, the first thing it does is to take the buffer lock, then checks if the buffer is empty. If the buffer not empty, it returns the first message without ever taking the descriptor lock. If the buffer is empty, it then also takes the descriptor lock (via fillBuffer) and attempts to read from the descriptor. If the descriptor has been closed, then it throws an exception. If the read would block, then it drops both locks and then waits for the descriptor to become readable before it tries again. And of course if the read succeeds, then we grab the first message, then drop the locks, and return the message.

getEvent :: Inotify -> IO Event
getEvent inotify@Inotify{..} = loop
    funcName = "System.Linux.Inotify.getEvent"
    loop = join $ withLock bufferLock $ do
               start <- readIORef startRef
               end   <- readIORef endRef
               if start < end
               then (do
                        evt <- getMessage inotify start True
                        return (return evt)                  )
               else fillBuffer funcName inotify
                    -- file descriptor closed:
                       (throwIO $! fdClosed funcName)
                    -- reading the descriptor would block:
                       (\fd -> do
                            (waitRead,_) <- threadWaitReadSTM fd
                            return (atomically waitRead >> loop) )
                    -- read succeeded:
                            evt <- getMessage inotify 0 True
                            return (return evt)                  )

So there are two points of subtlety worth pointing out in this implementation. First is the idiom used in loop: the main body returns an IO (IO Event), which is turned into IO Event via join. The outer IO is done inside a lock, which computes an (IO Event) to perform after the lock(s) are dropped.

The second point of subtlety is the case where the read would block. You may want to think for a minute why I didn’t write this instead:

\fd -> do
   return (threadWaitRead fd >> loop)

Because we don’t want to block other operations that need the buffer and/or file descriptor locks, we need to drop those locks before we wait on the descriptor. But because we are dropping those locks, anything could happen before we actually start waiting on the descriptor.

In particular, another thread could close the descriptor, and the descriptor could be reused to refer to an entirely new description, all before we start waiting. Essentially, that file descriptor is truly valid only while we hold the descriptor lock; thus we need to go through the process of registering our interest in the file descriptor while we have the lock, and then actually blocking on the descriptor after we have released the lock.

Now, this race condition is a relatively benign one as long as the new description either becomes readable relatively quickly, or that the descriptor is closed and GHC’s IO manager is aware the fact. In the first case, the threadWaitRead fd will unblock, and the thread will loop and realize that the descriptor has been closed, and throw an exception. In the second case, GHC’s IO manager will cause threadWaitRead fd to throw an exception directly.

But if you get very unlucky, this might not happen for a very long time, if ever, causing deadlock. This might be the case if say, the new descriptor is a unix socket to /dev/log. Or, if the reallocated descriptor is managed by foreign code, and is closed without GHC’s IO manager being made aware, it is quite possible that this could deadlock.

Create a free website or blog at