This article follows how I developed an implementation of the Graham scan algorithm in Haskell, missteps and all. I think it’s valuable to see the process that others use, especially in a “weird” language like Haskell. At times, it can seem like most of what you’ve learned about developing software doesn’t apply to Haskell, but I think that’s just because the language allows so much of the scaffolding to be cut away once the software is finished. Reading Haskell code written by the masters can feel like looking at the Sistine chapel and wondering where they got such long paint brushes. This will be a step-by-step implementation as I developed it. I’m still learning too, so don’t take this as an example of excellent Haskell style.

The Graham scan algorithm finds the convex hull of a set of planar points in three steps. First, we find the lower-left point in the set (call it P). Second, we sort the set based on the angle each point makes with P and the x-axis. Third, we go through the sorted list and keep each point which makes a “left turn.”

### Data Types

Even before we start, there are some obvious types we need. We’ll need a type to store points. This could just be a tuple, but I think the code will be easier to understand if the point type has accessors. I’ll define `Point` as

data Point a = Point {px :: a, py :: a}
deriving Show

I parameterized the number type, so our algorithm can be as general as possible. I think we’ll want a type to represent the result of the turn calculation too, so define a turn data type:

data Turn = Left | Right | Colinear
deriving (Show, Eq)

Since I’m redefining `Left` and `Right` here, I have to hide `Either` with `import Prelude hiding (Either(..))`.

### Stubs

I want to start capturing what I know about the Graham scan function in code. First, it will take a set of points and return the points of the convex hull, so

grahamScan :: [Point a] -> [Point a]

Next, the algorithm doesn’t work right if there are fewer than three points, so it will generate an error in that case. There are many ways to handle errors in Haskell. Some common methods include putting the function in the Error monad, returning an Either or Maybe, or returning an empty list. I’ll explore other methods in future articles, but since this function is for experimentation I’m going to just raise an error. I’m not even going to bother with a detailed error report, so the function will start like this.

grahamScan points
| length points < 3 = error "Degenerate"
| otherwise         = …

Now that I know we have valid input, I can get to the meat of the algorithm. First we find the lower left point and sort the point list with it. I’m not sure how to find the lower left point, so I’ll just defer that to a function called `findFirstPoint`. I’ll use the functions `compare` and `on` with a new function `angle` to create a function that will compare points by their angle. This new comparator is handed to `sortBy` to sort the list. For now, I set `findFirstPoint` and `angle` to `undefined` so the code will compile. Compile early and often; it helps to keep mistakes from compounding and mutating.

let (firstPoint:rest) = findFirstPoint points
sortedRest = sortBy (compare `on` (angle firstPoint)) rest

The third step in the algorithm is a bit tricky. We need to look at the next three points in the set and decide whether or not to keep the second point, and the last two points have to be tested along with the first point. Since our input list is not circular, there has to be special handling for the end of the list.

loop (a:b:[]) = case turn a b firstPoint of
Left -> b : []
_    -> []
loop (a:b:c:ps) = case turn a b c of
Left -> b : loop (b:c:ps)
_    -> loop (a:c:ps)

Isn’t that lovely? Haskell is so expressive, the code basically reads like the English description: “Take the next three points in the list, and if they make a left turn keep the middle point. If there are only two points left, tack on the first point.” All that remains is to start the list with the first point and append the rest of the hull. Here is the complete body of the function.

grahamScan points
| length points < 3 = error "Degenerate"
| otherwise         =
let (firstPoint:rest) = findFirstPoint points
sortedRest = sortBy (compare `on` (angle firstPoint)) rest

loop (a:b:[]) = case turn a b firstPoint of
Left -> b : []
_    -> []
loop (a:b:c:ps) = case turn a b c of
Left -> b : loop (b:c:ps)
_    -> loop (a:c:ps)

in firstPoint : loop (firstPoint:sortedRest)

### Support

Now to define the helper functions I stubbed out before. The `angle` function is supposed to calculate the angle between ba and the x-axis, but it’s really calculating the cosine of that angle. Don’t tell anybody.

angle a b = dx / len where
dx = px a – px b
dy = py a – py b
len = sqrt (dx * dx + dy * dy)

Now the compiler says that the type parameter in our `Point` has to be in the `Floating` class for `sqrt`. Add that to the type declaration of `grahamScan`.

The function `turn` is also simple. If the cross product of the two line segments is greater than zero then it is a left turn, while less than zero is a right turn. Equal to zero is colinear, which can go either way depending on what you want from your convex hull. I report ‘em all and let the caller sort it out.

turn a b c = case compare cross 0 of
GT -> Left
EQ -> Colinear
LT -> Right
where
cross = x1 * y2 – x2 * y1
x1 = px b – px a
y1 = py b – py a
x2 = px c – px b
y2 = py c – py b

I used `compare` here so I could use `case` instead of nested if statements, or a function with guards. I don’t know what the performance differences are, I just think the code is cleaner like this.

Since this function compares the values of the point coordinates, we’ll need to add the `Ord` class to the `Point` parameter. That goes on `grahamScan`, which now reads

grahamScan :: (Floating a, Ord a) => [Point a] -> [Point a]

The last function, `firstPoint`, is a little tricky. You could define it using two passes, like this:

findFirstPoint points =
let firstPoint = minimumBy lowerLeftCmp points
rest = filter (/= firstPoint) points
in firstPoint:rest

However, I chose to do it in one pass by moving points from the input list to an output list. The lower-left point at each step is held back and prepended to the output at the end.

findFirstPoint points
| null points = error "Null points"
| otherwise   = loop points [] where
loop (a:[]) ps = a:ps
loop (a:b:rest) ps =
if (py a < py b) || (py a == py b && px a < px b)
then loop (a:rest) (b:ps)
else loop (b:rest) (a:ps)

And that’s it for a working implementation. I also added a convenience function for converting tuples to Points: `point = uncurry Point`.

### Refinement

Now we can test this out in `ghci`:

*GrahamScan> grahamScan . map point \$ [(2, 3), (1, 1), (-1, 0), (1, -3), (0, 0)]
[Point {px = 1.0, py = -3.0},Point {px = 2.0, py = 3.0},Point {px = -1.0, py = 0.0}]
*GrahamScan> grahamScan . map point \$ [(0, 0), (0.5, 0.5), (1, 1), (0, 2), (-1, 1)]
[Point {px = 0.0, py = 0.0},Point {px = 0.5, py = 0.5},Point {px = 0.0, py = 2.0},Point {px = -1.0, py = 1.0}]

Uh oh, there’s a bug. The convex hull in the last test should have included `(1, 1)` but not `(0.5, 0.5)`. The problem is that they are colinear with the lowest point `(0, 0)`, so they have the same angle. They will be sorted in an arbitrary order, but whichever one ends up first will be rejected. To fix this, we’ll need to add the distance to the lower-left point in the angle comparison. If two points have the same angle, the point that is closer should come first in the list. That way, it will be rejected.

To do this, I’ll change `angle` to return a tuple with the angle and the length. When comparing tuples, the elements `n` are only compared if the elements `n-1` are equal, which is exactly what we want. So, with the subdefinitions the same, angle becomes:

angle a b = (dx / len, len)

When I did this, I realized that I could use a tuple in the `findFirstPoint` comparison too. The expression:

(py a < py b) || (py a == py b && px a < px b)

is equivalent to

(py a, px a) < (py b, px b)

I think the second is easier to read, and due to laziness the x-coordinates aren’t even calculated unless the y-coordinates are equal.

### Final Thoughts

So that’s it really. You can get the final source here. I would be interested to see someone tune the performance on this. It scales about linearly,1 which is expected since the algorithm is O(n log n), but I think that there’s plenty of room for improvement. If you have some suggestions, fork the repository at Github and show me what you can do.

Haskell’s expressive power made the implementation of this algorithm fairly easy. There are several different ways to do it, and I’ll explore some of the options in future articles.

1. These are timings for the calculation of the convex hull of sets of random vertices. The timings were performed on a 2.1 GHz Intel Core 2 Duo.

Input Vertices Time (s)
100 0.013
200 0.021
500 0.037
10000 0.70
20000 1.4
50000 3.5
100000 7.0
200000 14.3
500000 37