Repa is a new library in Haskell for handling arrays. It has flexible indexing like the old array library, but supports parallel computation, stream fusion, and has a rich API like the vector library.

To learn how to use repa, I implemented Conway’s Life using repa for the simulation and OpenGL for the display. The complete code is available on BitBucket and builds on my GLFW-b boilterplate, so I’ll only discuss the interesting parts here. Don Stewart wrote a good introductory tutorial to repa that will fill in any gaps I leave in this post.

## Evolution

We’ll start with the Life evolution step, because that’s actually the simplest function in this implementation. I decided to represent the Life grid as a repa array of `Word8` elements. `Boolean` elements might seem like a more natural choice, and that’s what I used in my first version, but you’ll see that using bytes makes the evolution step simpler.1 The evolution step uses repa’s stencil convolution support, which can turn small, static convolution kernels into fast code.

type Grid = A.Array A.U A.DIM2 A.Word8

updateGrid :: Grid -> IO Grid
updateGrid = A.computeP
. A.smap step
. mapStencil2 (A.BoundConst 0x20)
[stencil2| 1  1  1
1 16  1
1  1  1 |]

where
{-# INLINE step #-}
step x
| x == (16 .|. 2) = 1
| (x .&. 0xF) == 3 = 1
| otherwise = 0

Here is how the evolution step works. In the Life grid, a dead cell is 0 and a live cell is 1, so the convolution counts the adjacent live cells and adds 16 if the current cell was alive in the previous generation. Then we map over the results of the convolution and apply Conway’s rules in the step function: 3 live neighbors brings the cell to life, 2 live neighbors makes no change, any other number kills the cell.

The convolution and map generate a “delayed” array. A delayed array is just a function that maps from an index to a value. This is good if we’re only accessing a few elements, but if the elements are accessed several times they have to be recalculated. Instead, we can calculate the whole thing to generate a “manifest” array with either the `computeS` or `computeP` functions. The suffix describes the calculation strategy: `S` means serial and `P` means parallel. You might notice that `computeP` is monadic. Repa doesn’t support nested parallel computations, so to make sure each parallel calculation is complete before the next one starts, the parallel versions of repa’s calculation functions are monadic.

The stencil code is a little strange, so let’s take a closer look at it.

mapStencil2 (A.BoundConst 0x20)
[stencil2| 1  1  1
1 16  1
1  1  1 |]

The last construction there between the brackets is actually template Haskell that is evaluated at compile time to generate Haskell code. This requires the language pragma `QuasiQuotes` (if you don’t understand that, look at the top of the source file), and we have to make sure that the names in the package `Data.Array.Repa.Stencil.Dim2` are visible. That means that package can’t be imported qualified. The repa index constructor also has to be visible; since we import `Data.Array.Repa` qualified we need this import line

import Data.Array.Repa ((:.) (..))

It’s strange because it looks like an operator but it’s actually a data constructor.

The other argument to the `mapStencil2` function is how to treat boundary values. There are three options here.

BoundFixed a
This option will make the entire border area a constant value. The border area is half the size of the convolution kernel.
BoundConst a
This option uses the constant value for all locations outside the array.
BoundClamp
This basically repeats the edge values as far as necessary.

I used `BoundConst` here to use a special value for locations outside the grid. This way, cells will always die when they get to the edge. It would be better to expand the grid one cell in each direction outside the view so that it appears infinite. That will be easier to do after I switch to shaders for the rendering, so I’ll address that in a later post.

## Grid Initialization

Here is the function that creates a new grid when the program starts.

freshGrid :: Int -> Grid
freshGrid gridSize =
A.computeS . A.fromFunction (A.Z :. gridSize :. gridSize) \$ initCell

where
cx = gridSize `div` 2

-- | Initialize a grid cell. Guards are used
--   to make the initial pattern.
initCell ix

| ix == (A.Z :. cx :. cx) = 1
| ix == (A.Z :. cx-1 :. cx) = 1
| ix == (A.Z :. cx+1 :. cx) = 1

First it creates a delayed array with `A.fromFunction`, then calculates it into a manifest array with `A.computeS`. The function is a mapping from array index to value. Repa indices are created with the special constructors `Z` and `:.`. I’m using `Int`s for the index types, but repa supports indices of any type.

The initialization function is implemented using guards to poke some live cells into an initial pattern. This one is a blinker, and there is code for an acorn in the full source. A much better solution would be to accept a file containing the initial pattern. Supporting one or all of the formats on the Conway Life Wiki would open a vast catalog to us. Another option would be allowing the user to draw a pattern in the screen before starting the simulation. I’ll cover both of these in later posts. [Update: There is now a series of posts about the life pattern file parsers]

## Rendering

The view is pretty straightforward. In each frame, I write the grid to a grayscale texture and draw it on a quad. Unboxed repa arrays use unboxed vectors underneath, so the grid is easily converted to a storable vector which can be dumped to texture memory. The only trick here is padding the array to a power of two width. Modern graphics cards are supposed to handle textures with off dimensions, but I have a cheap mobile Intel GPU that freaks out when I give it an off-size texture. Look at the source for all the details, but this is the important bit that uses traverse to copy the array into a bigger array.

draw grid = do
{- ... -}
where
gridShape@(A.Z :. width :. height) = A.extent grid
maxS = nextPowerOf2 width
maxT = nextPowerOf2 height

(const (A.Z :. (fromIntegral maxS) :. (fromIntegral maxT)))
\$ \lkup ix ->
if A.inShape gridShape ix
then lkup ix
else 0

## Performance

At first I thought this would have pretty terrible performance. Each frame is allocating several new arrays for the evolution and the conversion steps to the texture buffer. I thought I would need to rewrite this using mutable vectors to avoid allocations. However, when I ran it with the stats on (`+RTS -s`) the garbage collector hardly ever runs (only once in several thousand generations) and the program runs in constant space. Thanks to repa, the code is already parallelized and takes advantage of multiple cores (run it with `+RTS -NX` where X is the number of cores to use).

In the future I want to convert the rendering to use shaders. This will simplify the texturing step since the grayscale conversion can be done in the fragment shader. It will also fix the boundary issue, since I’ll be able to expand the grid to a power of two and only render a section.

Take a look at the full source, fork it and improve it! What would you like to see next in my Haskell experiments? Let me know in the comments or on twitter.

1. The current implementation of repa doesn’t bit pack a bool array, so there is no real advantage to using `Boolean` elements anyway.