## 2010 April 4

### On powersets and folds

Filed under: algorithms, corecursion, haskell — lpsmith @ 10:03 pm

A while ago, somebody on reddit brought up a particularly cute and fairly well known definition of powerset:

```powerset :: [a] -> [[a]]
powerset = filterM (const [False,True])
```

Unfortunately, it is also fairly well known that this is a very space-inefficient way of producing the powerset, for example evaluating `length (powerset [1..32])` will allocate gigabytes of memory within seconds on modern computers, and is an effective denial-of-service attack against most default configurations.

Fortunately, there are solutions that avoid this space leak while maintaining the same style of definition. One solution is to use a revised definition of Control.Monad.filterM. Here is the standard definition:

```filterM          :: (Monad m) => (a -> m Bool) -> [a] -> m [a]
filterM _ []     =  return []
filterM p (x:xs) =  do
flg <- p x
ys  <- filterM p xs
return (if flg then x:ys else ys)
```

Here is the revised definition:

```filterM'          :: (Monad m) => (a -> m Bool) -> [a] -> m [a]
filterM' _ []     =  return []
filterM' p (x:xs) =  do
ys  <- filterM' p xs
flg <- p x
return (if flg then x:ys else ys)
```

The only difference is that we changed the order of two lines. This in turn changes the order in which the subsets are produced:

```powerset  "abc" == ["","c","b","bc","a","ac","ab","abc"]
powerset' "abc" == ["","a","b","ab","c","ac","bc","abc"]
```

These orderings are analogous to counting in binary, with the original definition having the least significant bit at the end of the list, whereas the revised versions have the least signficant bit at the beginning:

```pow      pow'

000 *    000 *
001      100 *
010      010
011      110
100 *    001
101      101
110      011
111      111
```

Both solutions have maximal sharing among the tails of the sublists, but the second solution arranges the sharing so that shared sublists appear adjacent to each other in the overall order, as illustrated by the asterisks. This allows `length (powerset' [1..32])` to be evaluated in a very modest amount of space.

Note that this space leak is inherent to any solution that produces the first ordering, and has maximal sharing! If the first ordering is needed, it is better to recompute the sublists than to share them. As Cale Gibbard pointed out, one way to eliminate the sharing is to use another monad, such as the Logic monad.

```import Control.Applicative ((<|>))

powerset2 :: [a] -> [[a]]
powerset2 = observeAll . filterM (const (return False <|> return True))
```

## Making powerset lazier

The definition of `powerset` can be generalized to produce all the finite sublists of a potentially infinite list. However, it is not obvious how to express this via tweaking `filterM` without breaking the abstraction, so let’s start with another efficient formulation based on concatMap.

```powerset' [] = [[]]
powerset' (x:xs) = concatMap (\ys -> [ys, x:ys]) (powerset' xs)
```

On an infinite list, this definition gets stuck in an non-productive loop because `powerset'` must traverse the entire list before it returns anything. Note that on finite cases, the first sublist returned is always `[]`. Formally, we can state this as `powerset' xs == [] : tail (powerset' xs)`. It is sensible to extend this equality to the infinite case, due to the analogy to counting in binary. By making this substitution, we produce a lazier definition that can handle infinite lists, from which we can calculate a definition that’s more aesthetically pleasing and slightly more efficient:

```powerset' []     = [[]]
powerset' (x:xs) = concatMap (\ys -> [ys, x:ys]) ([] : tail (powerset' xs))
-- apply definition of concatMap
= [] : [x] : concatMap (\ys -> [ys, x:ys]) (tail (powerset' xs))
```

We can clean this definition up by calculating definitions for `tail . powerset'`. We start by applying `tail` to both sides of the two cases:

```tail (powerset' [])     = tail [[]]
= []
tail (powerset' (x:xs)) = tail ([] : [x] : concatMap (\ys -> [ys, x:ys]) (tail (powerset' xs)))
= [x] : concatMap (\ys -> [ys, x:ys]) (tail (powerset' xs))
```

Next, we can choose a new name for `tail . powerset'`, such as `nonempties`, and substitute it through the definition. Finally, we can recover something equivalent to the original definition by appending the empty list to `nonempties`

```lazyPowerset xs = [] : nonempties xs
where nonempties [] = []
nonempties (x:xs) = [x] : concatMap (\ys -> [ys, x:ys]) (nonempties xs)
```

This is equivalent to Data.List.subsequences as added in GHC 6.10.1. I thank Bertram Felgenhauer for (unknowingly) providing some comments that were rather helpful in creating this presentation.

## Generalizing powerset to an enriched fold

By parameterizing the instances of `(:)` and `[]` that create the inner sublists, we can turn `lazyPowerset` into a kind of fold:

```powerfoldr :: (a -> b -> b) -> b -> [a] -> [b]
powerfoldr f b as = b : nonempties as
where nonempties [] = []
nonempties (a:as) = f a b : concatMap (\b' -> [b', f a b']) (nonempties as)
```

For example, we can calculate the sum of every finite combination of the powers of 2:

```powerfoldr (+) 0 (iterate (2*) 1)  == [0,1,2,3,4..]
```

An equivalent “higher-level” definition of `powerfoldr` in terms of `map`, `foldr`, and `lazyPowerset` is as follows:

```powerfoldr' f b = map (foldr f b) . lazyPowerset
```

This definition, however, recomputes the right fold over common sublists; the sharing provided by `lazyPowerset` is not preserved by this latter definition. The first definition has the same level of sharing as the original, reducing the number of applications of `f` by half. I found the first definition of `powerfoldr` to be quite helpful in solving Project Euler Problem 268; it certainly was not optimal for the purpose, but it was more than adequate.