A while ago, somebody on reddit brought up a particularly cute and fairly well known definition of powerset:
powerset :: [a] -> [[a]] powerset = filterM (const [False,True])
Unfortunately, it is also fairly well known that this is a very space-inefficient way of producing the powerset, for example evaluating length (powerset [1..32])
will allocate gigabytes of memory within seconds on modern computers, and is an effective denial-of-service attack against most default configurations.
Fortunately, there are solutions that avoid this space leak while maintaining the same style of definition. One solution is to use a revised definition of Control.Monad.filterM. Here is the standard definition:
filterM :: (Monad m) => (a -> m Bool) -> [a] -> m [a] filterM _ [] = return [] filterM p (x:xs) = do flg <- p x ys <- filterM p xs return (if flg then x:ys else ys)
Here is the revised definition:
filterM' :: (Monad m) => (a -> m Bool) -> [a] -> m [a] filterM' _ [] = return [] filterM' p (x:xs) = do ys <- filterM' p xs flg <- p x return (if flg then x:ys else ys)
The only difference is that we changed the order of two lines. This in turn changes the order in which the subsets are produced:
powerset "abc" == ["","c","b","bc","a","ac","ab","abc"] powerset' "abc" == ["","a","b","ab","c","ac","bc","abc"]
These orderings are analogous to counting in binary, with the original definition having the least significant bit at the end of the list, whereas the revised versions have the least signficant bit at the beginning:
pow pow' 000 * 000 * 001 100 * 010 010 011 110 100 * 001 101 101 110 011 111 111
Both solutions have maximal sharing among the tails of the sublists, but the second solution arranges the sharing so that shared sublists appear adjacent to each other in the overall order, as illustrated by the asterisks. This allows length (powerset' [1..32])
to be evaluated in a very modest amount of space.
Note that this space leak is inherent to any solution that produces the first ordering, and has maximal sharing! If the first ordering is needed, it is better to recompute the sublists than to share them. As Cale Gibbard pointed out, one way to eliminate the sharing is to use another monad, such as the Logic monad.
import Control.Applicative ((<|>)) import Control.Monad.Logic (observeAll) import Control.Monad (filterM) powerset2 :: [a] -> [[a]] powerset2 = observeAll . filterM (const (return False <|> return True))
Making powerset lazier
The definition of powerset
can be generalized to produce all the finite sublists of a potentially infinite list. However, it is not obvious how to express this via tweaking filterM
without breaking the abstraction, so let’s start with another efficient formulation based on concatMap.
powerset' [] = [[]] powerset' (x:xs) = concatMap (\ys -> [ys, x:ys]) (powerset' xs)
On an infinite list, this definition gets stuck in an non-productive loop because powerset'
must traverse the entire list before it returns anything. Note that on finite cases, the first sublist returned is always []
. Formally, we can state this as powerset' xs == [] : tail (powerset' xs)
. It is sensible to extend this equality to the infinite case, due to the analogy to counting in binary. By making this substitution, we produce a lazier definition that can handle infinite lists, from which we can calculate a definition that’s more aesthetically pleasing and slightly more efficient:
powerset' [] = [[]] powerset' (x:xs) = concatMap (\ys -> [ys, x:ys]) ([] : tail (powerset' xs)) -- apply definition of concatMap = [] : [x] : concatMap (\ys -> [ys, x:ys]) (tail (powerset' xs))
We can clean this definition up by calculating definitions for tail . powerset'
. We start by applying tail
to both sides of the two cases:
tail (powerset' []) = tail [[]] = [] tail (powerset' (x:xs)) = tail ([] : [x] : concatMap (\ys -> [ys, x:ys]) (tail (powerset' xs))) = [x] : concatMap (\ys -> [ys, x:ys]) (tail (powerset' xs))
Next, we can choose a new name for tail . powerset'
, such as nonempties
, and substitute it through the definition. Finally, we can recover something equivalent to the original definition by appending the empty list to nonempties
lazyPowerset xs = [] : nonempties xs where nonempties [] = [] nonempties (x:xs) = [x] : concatMap (\ys -> [ys, x:ys]) (nonempties xs)
This is equivalent to Data.List.subsequences as added in GHC 6.10.1. I thank Bertram Felgenhauer for (unknowingly) providing some comments that were rather helpful in creating this presentation.
Generalizing powerset to an enriched fold
By parameterizing the instances of (:)
and []
that create the inner sublists, we can turn lazyPowerset
into a kind of fold:
powerfoldr :: (a -> b -> b) -> b -> [a] -> [b] powerfoldr f b as = b : nonempties as where nonempties [] = [] nonempties (a:as) = f a b : concatMap (\b' -> [b', f a b']) (nonempties as)
For example, we can calculate the sum of every finite combination of the powers of 2:
powerfoldr (+) 0 (iterate (2*) 1) == [0,1,2,3,4..]
An equivalent “higher-level” definition of powerfoldr
in terms of map
, foldr
, and lazyPowerset
is as follows:
powerfoldr' f b = map (foldr f b) . lazyPowerset
This definition, however, recomputes the right fold over common sublists; the sharing provided by lazyPowerset
is not preserved by this latter definition. The first definition has the same level of sharing as the original, reducing the number of applications of f
by half. I found the first definition of powerfoldr
to be quite helpful in solving Project Euler Problem 268; it certainly was not optimal for the purpose, but it was more than adequate.