Functional Pearls #2: The Zipper

Well I’m back with part two of my functional pearls series. This pearl is perhaps the most famous: Gerard Huet’s Zipper. Its a simple, but beautiful little data structure for purely functional tree editing.

the problem

You have to create an application which involves manipulating trees of some kind (Huet wanted to make a structured editor for a proof assistant – whatever that means!). You want to be able to move about in a tree, change things etc. You want it to be efficient, but you don’t want to do evil mutable things.

With that background, let’s start with a variable arity tree type:

data Tree a = Leaf a | Tree (a, [Tree a])

Now we’re going to construct a Zipper for such trees. We want tree editing of course, but I am going to focus on navigation within the tree. If you can do that, then editing is easy.

One way of “moving about” within such a tree is obvious: I start at the top and go down to whatever spot I’m interested in. This is perfectly fine if my moves are going to be entirely random because in that case no matter what I do I will have to work arbitrarily hard to get where I want to go, so this solution is as good as any. However, tree editing is rarely that random.

Usually after I work on a node, I am likely to move to one of the nodes nearby. So it would be great if I had an efficient way of moving up/down/left/right from the current node (instead of always starting at the root). This means I must have a way to remember where I am right now and how I got there. In a nutshell, thats all the zipper is: a way to keep track of location within a tree in order to provide efficient movement and editing operations around the ‘current’ location.

Lost!

So we will perform operations not on a tree, but on a ‘location’

data Location a = (Tree a, Path a)

Location is a tuple of two things: a (sub)tree which is the current focus of attention, and a path upwards from the root of this subtree to the root of the entire tree. This path must have all the information needed to build up the rest of the tree whenever needed. For example, if I want to move up, I need to be able to build my parent’s subtree again. So we define:

data Path a = Top | Node (a, [Tree a], Path a, [Tree a])

Basically, this says: If I am (the root of) a subtree, I need to know four things – My siblings on the left and on the right, my parent’s label (in order to build the parent again), and a path to my parent. This is what a ‘Node’ stores (Top simply means there is nothing above me). Then working recursively upward, I can build up the entire tree.

This is where the Zipper gets its name. It works a lot like a physical zipper (except inverted). When moving down to a subtree, you are “zipping up” the rest of the tree into a ‘Path’. When you move up, you “pull down” the zipper to reveal… the rest of the tree! ;)

OK, now lets look at a navigation operation – moving one step left/right:

goLeft (t, Node (a, tl:l, p, r)) = (tl, Node(a, l, p, t:r))
goRight (t, Node (a, l, p, tr:r)) = (tr, Node(a, t:l, p, r))

(Of course all other patterns are errors.) Both operations are simple, the only thing to note is the ‘direction’ of the left and right sibling lists ‘l’ & ‘r’. The immediate left (resp. right) sibling is at the head and the lists then move ‘outward’ from there. This ensures constant time movement in both directions.

Moving down is also constant time, moving up however is not:

goDown (Tree(a, t:r), p) = (t, Node(a, [], p, r))
goUp (t, Node(a, l, p, r)) = (Tree (a, reverse l ++ (t:r)), p)

With these basic navigation primitives, we can define more complicated movements. Insertions and deletions are also similar (code given later). There can also be a few variations on this zipper: Huet gave an example of a ‘memoizing’ zipper, which is useful when you want to go up and down to the same spot often. Here, the tree would be defined as:

data MemoTree a = Leaf a | Tree (a, [MemoTree a], MemoTree a, [MemoTree a])

A MemoTree keeps track of a certain distinguished subtree at each level. The definitions of ‘Path’ and the ‘goUp’ and ‘goDown’ primitives are changed slightly, so that goUp . goDown = id i.e. goDown doesn’t take you to the leftmost child, but rather to the exact same child you went up from (hence the MemoTree a in the Tree constructor).

Genericize?

Now this is all well and good, but there is something fundamentally unsatisfactory here: we are forced to use variable arity trees. Suppose I had a binary tree instead.

 
BTree a = Nil | BTree (a, BTree a, BTree a) 

How would I obtain a Zipper for this? Of course, it is easy to modify the code for a variable arity Zipper and make it work with BTrees. But this is not good enough because I would have to (manually) create a new Zipper for every tree type. Can I generate these Zippers automatically? Alternatively, can I write a single Zipper that will work for any tree type?

We have now entered Conor McBride territory. If you want to generate a Zipper, then given a tree you must define a Path type. McBride found a wonderful connection between ordinary, high school derivatives, and Zipper types. I will talk about this in my next post (or you could read McBride’s “The derivative of a regular type is it’s type of one hole context”), but its too long (and too interesting) to fit here.

Right now I’m going to take a different (and very simplistic) route. Here you will find a Zipper module (largely untested btw) which is a very simple hack based on the variable arity Zipper. It exports a single class, Zippable, that can be instantiated by a fixed arity tree to automagically get a Zipper for that tree.

class Zippable tree a | tree -> a 

(You might want to look up multi-parameter type classes and functional dependencies, but you don’t really need to. the tree -> a part simply means that when instantiating this class, the ‘tree’ type will uniquely specify an ‘a’ type. It’s only needed to remove ambiguities when type checking).

Here tree is your tree type (eg BTree String) and a is the element type (eg String). To create a Zippable instance, you must define the following functions (the navigation/editing defaults can then be used):

fromList :: (a, [tree]) -> tree
toList :: tree -> (a, [tree])
isNil :: tree -> bool
nil :: tree
arity :: tree -> Int

And here’s a BTree Zipper:

instance (BTree a) a where
  fromList (a, [l,r]) = BTree (a, l, r)
  toList (BTree (a, l, r)) = (a, [l,r])

  isNil Nil = True
  isNil _ = False
  nil = Nil
  arity _ = 2

fromList is used to make it look like the current node is variable arity. toList does the opposite. Internally, all tree editing functions maintain a constant number of trees at each level (by inserting/deleting ‘nils’ which act as placeholders: which is why the nil function is needed to return a ‘nil’ value). The arity represents arity (doh!), and currently its not really used except that setting it to 0 is used to specify variable arity.

Now one obvious problem here is that this really only works for types like ‘BTree’, i.e. your tree must be representable in this particular form. This need not be the case for even slightly more exotic trees (for eg. where there may be several branches of different types). So lets restrict ourselves to these simple tree types for now.

The module hides the definition of Path and Location, but provides two functions tree and root which take a tree in and out of a Zipper. So externally, your program only sees your tree type. Obviously, the whole thing is just an ugly hack, but to an extent it works!
For example:

(>>>) = flip (.)
edittree :: (Zippable (BTree a) a) => a -> BTree a -> BTree a

edittree a = (tree >>> goDown >>> changeLabel a >>> goUp >>> root)

There may also be scope for some fun variations, because you may be able to play around with fromList. Perhaps perform height balancing – fromList only gets called when moving up, or inserting below the current position. Another interesting question is whether you could define suitable fromList and toList‘s for non-tree structures. Perhaps a Zipper for a graph which works on say, its (implicit) DFS tree? Of course I’m not at all sure this is possible, or that it makes any sense at all ;)

Anyway, that’s it for now, next time I’ll talk about what derivatives have to do with data types, and a more serious solution to the generic zipper problem.

About these ads

~ by hellfeuer on September 9, 2008.

4 Responses to “Functional Pearls #2: The Zipper”

  1. Hi there, nice article. There is something wrong there though:

    goDown (Tree(a, t:l, r), p) = (t, Node(a, l, p, r))

    The Tree contsructor only has two parameters, I guess this should be:

    goDown (Tree(a, t:r), p) = (t, Node(a, [], p, r))

  2. Of course, thanks!

  3. [...] 2.5 – Derivatives of Types This post is not only extremely late, its also a bit of a cop-out. Last time I said I would write about derivatives of types. This is a cop-out in the sense that its a bit more [...]

  4. [...] For chain matrix multiplication, the order of accesses is something like this: The value of a cell in the memo table depends on the minimum of the sums of the corresponding elements in the column below and the row to its left. So you focus on a cell and the corresponding row and column. You can calculate the value in the cell by zipping the corresponding row and column together with (+), and taking the minimum. Then you move to the next cell. Now your focus must change to the appropriate row and column (efficiently!), using what is essentially a type of zipper [...]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

%d bloggers like this: