Well I’m back with part two of my functional pearls series. This pearl is perhaps the most famous: Gerard Huet’s Zipper. Its a simple, but beautiful little data structure for purely functional tree editing.
You have to create an application which involves manipulating trees of some kind (Huet wanted to make a structured editor for a proof assistant – whatever that means!). You want to be able to move about in a tree, change things etc. You want it to be efficient, but you don’t want to do evil mutable things.
With that background, let’s start with a variable arity tree type:
data Tree a = Leaf a | Tree (a, [Tree a])
Now we’re going to construct a Zipper for such trees. We want tree editing of course, but I am going to focus on navigation within the tree. If you can do that, then editing is easy.
One way of “moving about” within such a tree is obvious: I start at the top and go down to whatever spot I’m interested in. This is perfectly fine if my moves are going to be entirely random because in that case no matter what I do I will have to work arbitrarily hard to get where I want to go, so this solution is as good as any. However, tree editing is rarely that random.
Usually after I work on a node, I am likely to move to one of the nodes nearby. So it would be great if I had an efficient way of moving up/down/left/right from the current node (instead of always starting at the root). This means I must have a way to remember where I am right now and how I got there. In a nutshell, thats all the zipper is: a way to keep track of location within a tree in order to provide efficient movement and editing operations around the ‘current’ location.
So we will perform operations not on a tree, but on a ‘location’
data Location a = (Tree a, Path a)
Location is a tuple of two things: a (sub)tree which is the current focus of attention, and a path upwards from the root of this subtree to the root of the entire tree. This path must have all the information needed to build up the rest of the tree whenever needed. For example, if I want to move up, I need to be able to build my parent’s subtree again. So we define:
data Path a = Top | Node (a, [Tree a], Path a, [Tree a])
Basically, this says: If I am (the root of) a subtree, I need to know four things – My siblings on the left and on the right, my parent’s label (in order to build the parent again), and a path to my parent. This is what a ‘Node’ stores (Top simply means there is nothing above me). Then working recursively upward, I can build up the entire tree.
This is where the Zipper gets its name. It works a lot like a physical zipper (except inverted). When moving down to a subtree, you are “zipping up” the rest of the tree into a ‘Path’. When you move up, you “pull down” the zipper to reveal… the rest of the tree! ;)
OK, now lets look at a navigation operation – moving one step left/right:
goLeft (t, Node (a, tl:l, p, r)) = (tl, Node(a, l, p, t:r))
goRight (t, Node (a, l, p, tr:r)) = (tr, Node(a, t:l, p, r))
(Of course all other patterns are errors.) Both operations are simple, the only thing to note is the ‘direction’ of the left and right sibling lists ‘l’ & ‘r’. The immediate left (resp. right) sibling is at the head and the lists then move ‘outward’ from there. This ensures constant time movement in both directions.
Moving down is also constant time, moving up however is not:
goDown (Tree(a, t:r), p) = (t, Node(a, , p, r))
goUp (t, Node(a, l, p, r)) = (Tree (a, reverse l ++ (t:r)), p)
With these basic navigation primitives, we can define more complicated movements. Insertions and deletions are also similar (code given later). There can also be a few variations on this zipper: Huet gave an example of a ‘memoizing’ zipper, which is useful when you want to go up and down to the same spot often. Here, the tree would be defined as:
data MemoTree a = Leaf a | Tree (a, [MemoTree a], MemoTree a, [MemoTree a])
A MemoTree keeps track of a certain distinguished subtree at each level. The definitions of ‘Path’ and the ‘goUp’ and ‘goDown’ primitives are changed slightly, so that
goUp . goDown = id i.e. goDown doesn’t take you to the leftmost child, but rather to the exact same child you went up from (hence the MemoTree a in the Tree constructor).
Now this is all well and good, but there is something fundamentally unsatisfactory here: we are forced to use variable arity trees. Suppose I had a binary tree instead.
BTree a = Nil | BTree (a, BTree a, BTree a)
How would I obtain a Zipper for this? Of course, it is easy to modify the code for a variable arity Zipper and make it work with BTrees. But this is not good enough because I would have to (manually) create a new Zipper for every tree type. Can I generate these Zippers automatically? Alternatively, can I write a single Zipper that will work for any tree type?
We have now entered Conor McBride territory. If you want to generate a Zipper, then given a tree you must define a Path type. McBride found a wonderful connection between ordinary, high school derivatives, and Zipper types. I will talk about this in my next post (or you could read McBride’s “The derivative of a regular type is it’s type of one hole context”), but its too long (and too interesting) to fit here.
Right now I’m going to take a different (and very simplistic) route. Here you will find a Zipper module (largely untested btw) which is a very simple hack based on the variable arity Zipper. It exports a single class, Zippable, that can be instantiated by a fixed arity tree to automagically get a Zipper for that tree.
class Zippable tree a | tree -> a
(You might want to look up multi-parameter type classes and functional dependencies, but you don’t really need to. the
tree -> a part simply means that when instantiating this class, the ‘tree’ type will uniquely specify an ‘a’ type. It’s only needed to remove ambiguities when type checking).
tree is your tree type (eg
BTree String) and a is the element type (eg
String). To create a Zippable instance, you must define the following functions (the navigation/editing defaults can then be used):
fromList :: (a, [tree]) -> tree
toList :: tree -> (a, [tree])
isNil :: tree -> bool
nil :: tree
arity :: tree -> Int
And here’s a BTree Zipper:
instance (BTree a) a where
fromList (a, [l,r]) = BTree (a, l, r)
toList (BTree (a, l, r)) = (a, [l,r])
isNil Nil = True
isNil _ = False
nil = Nil
arity _ = 2
fromList is used to make it look like the current node is variable arity.
toList does the opposite. Internally, all tree editing functions maintain a constant number of trees at each level (by inserting/deleting ‘nils’ which act as placeholders: which is why the nil function is needed to return a ‘nil’ value). The
arity represents arity (doh!), and currently its not really used except that setting it to 0 is used to specify variable arity.
Now one obvious problem here is that this really only works for types like ‘BTree’, i.e. your tree must be representable in this particular form. This need not be the case for even slightly more exotic trees (for eg. where there may be several branches of different types). So lets restrict ourselves to these simple tree types for now.
The module hides the definition of Path and Location, but provides two functions
root which take a tree in and out of a Zipper. So externally, your program only sees your tree type. Obviously, the whole thing is just an ugly hack, but to an extent it works!
(>>>) = flip (.)
edittree :: (Zippable (BTree a) a) => a -> BTree a -> BTree a
edittree a = (tree >>> goDown >>> changeLabel a >>> goUp >>> root)
There may also be scope for some fun variations, because you may be able to play around with
fromList. Perhaps perform height balancing – fromList only gets called when moving up, or inserting below the current position. Another interesting question is whether you could define suitable
toList‘s for non-tree structures. Perhaps a Zipper for a graph which works on say, its (implicit) DFS tree? Of course I’m not at all sure this is possible, or that it makes any sense at all ;)
Anyway, that’s it for now, next time I’ll talk about what derivatives have to do with data types, and a more serious solution to the generic zipper problem.