What I Wish I Knew When Learning Haskell 2.0

Stephen Diehl (@smdiehl)

Since I wrote these slides for a little user group talk I gave two years ago they have become a surprisingly popular reference. I decided to actually turn them into a proper skimmable reference for intermediate level Haskell topics that don't necessarily have great coverage or that tend to be somewhat opaque as to where to get going, and then to aggregate a bunch of the best external resources for diving into those subjects with more depth. Hopefully it still captures the "no bullshit brain dump" style that seemed to be liked.

The source for all snippets is available here. If there are any errors or you think of a more illustrative example feel free to submit a pull request.

Cabal

To start a new Haskell project run

$ cabal init
$ cabal configure

A .cabal file will be created.

Sandboxes (in cabal > 1.18) are self-contained environments of Haskell packages.

$ cabal sandbox init

To update the package index from Hackage.

$ cabal update

To install the dependencies for the package:

$ cabal install --only-dependencies

To run the "executable" for a library under the cabal sandbox:

$ cabal run
$ cabal run <name>

To load the "library" into a GHCi shell under the cabal sandbox:

$ cabal repl
$ cabal repl <name>

An example cabal file:

name:               mylibrary
version:            0.1
cabal-version:      >= 1.10
author:             Paul Atreides 
license:            MIT
license-file:       LICENSE
synopsis:           The code must flow.
category:           Math
tested-with:        GHC
build-type:         Simple
  
library                    
    exposed-modules:
      Library.ExampleModule1
      Library.ExampleModule2

    build-depends: 
      base >= 4 && < 5

    default-language: Haskell2010

    ghc-options: -O2 -Wall -fwarn-tabs

executable "example"
    build-depends: 
        base >= 4 && < 5
    default-language: Haskell2010
    main-is: Main.hs    

Using the cabal repl and cabal run commands is preferable, but sometimes we'd like to perform their equivalents at the shell. There are several useful aliases that rely on shell directory expansion to find the package database in the current working directory and launch GHC with the appropriate flags:

alias ghc-sandbox="ghc -no-user-package-db -package-db .cabal-sandbox/*-packages.conf.d"
alias ghci-sandbox="ghci -no-user-package-db -package-db .cabal-sandbox/*-packages.conf.d"
alias runhaskell-sandbox="runhaskell -no-user-package-db -package-db .cabal-sandbox/*-packages.conf.d"

Courtesy of Brian McKenna there is also a zsh script to show the sandbox status of the current directory in our shell.

function cabal_sandbox_info() {
    cabal_files=(*.cabal(N))
    if [ $#cabal_files -gt 0 ]; then
        if [ -f cabal.sandbox.config ]; then
            echo "%{$fg[green]%}sandboxed%{$reset_color%}"
        else
            echo "%{$fg[red]%}not sandboxed%{$reset_color%}"
        fi
    fi
}
 
RPROMPT="\$(cabal_sandbox_info) $RPROMPT"

GHCi

Command   Shortcut   Action
:reload   :r         Code reload
:type     :t         Type inspection
:kind     :k         Kind inspection
:info     :i         Instance inspection
:print    :p         Print the expression
:edit     :e         Load file in system editor

λ: :type 3
3 :: Num a => a
λ: :kind Either
Either :: * -> * -> *
λ: :info Functor
class Functor f where
  fmap :: (a -> b) -> f a -> f b
  (<$) :: a -> f b -> f a
    -- Defined in `GHC.Base'
  ...

The current state of the environment can also be queried.

λ: :browse
λ: :show bindings

A local copy of Hoogle and HLint can be installed and used from within GHCi.

cabal install hoogle
cabal install hlint

We can use them by adding commands to our ~/.ghc/ghci.conf.

-- You must set your prompt to a lambda, it's a law.
:set prompt "λ: "

:def hlint const . return $ ":! hlint \"src\""
:def hoogle \s -> return $ ":! hoogle --count=15 \"" ++ s ++ "\""

For editor integration with vim and emacs:

cabal install hdevtools
cabal install ghc-mod

Exceptions

Debugging uncaught exceptions from bottoms or asynchronous exceptions is very similar to debugging segfaults with gdb.

λ: :set -fbreak-on-exception
λ: :trace main
λ: :hist
λ: :back

Bottoms

error :: String -> a
undefined :: a

The bottom is a singular value that inhabits every type. When evaluated, the semantics of Haskell no longer yield a meaningful value. It's usually written as the symbol ⊥ (i.e. the compiler flipping you off).

An example of an infinitely looping term:

f :: a
f = let x = x in x

The undefined function is nevertheless extremely practical to accommodate writing incomplete programs and for debugging. When combined with the editor tools like hdevtools it can also serve as an ad-hoc type-hole for debugging.

f :: a -> Complicated Type
f = undefined -- write tomorrow, typecheck today!

Partial functions from non-exhaustive pattern matching are probably the most common source of bottoms. GHC can be made more vocal about this using the -fwarn-incomplete-patterns and -fwarn-incomplete-uni-patterns flags.

data F = A | B
case x of 
  A -> ()

Is translated into the following GHC Core with the exception inserted for the non-exhaustive patterns.

case x of _ {
  A -> ();
  B -> patError "<interactive>:3:11-31|case"
}

The same holds for record construction with missing fields, although there's almost never a good reason to construct a record with missing fields, and GHC will warn us.

data Foo = Foo { example1 :: Int }
f = Foo {}

Again this has an error term put in place by the compiler:

Foo (recConError "<interactive>:4:9-12|a")

What's not immediately apparent is that bottoms are used extensively throughout the Prelude, some for practical reasons, others for historical reasons. The canonical example is the head function which, written with the type [a] -> a, cannot be defined for the empty list without bottom.

import GHC.Err
import Prelude hiding (head, (!!), undefined)

-- degenerate functions

undefined :: a
undefined =  error "Prelude.undefined"

head :: [a] -> a
head (x:_) =  x
head []    =  error "Prelude.head: empty list"

(!!) :: [a] -> Int -> a
xs     !! n | n < 0 =  error "Prelude.!!: negative index"
[]     !! _         =  error "Prelude.!!: index too large"
(x:_)  !! 0         =  x
(_:xs) !! n         =  xs !! (n-1)

It's rare to see these partial functions thrown around carelessly in production code and the preferred method is instead to use the safe variants provided in Data.Maybe combined with the usual fold functions maybe and either or to use pattern matching.

listToMaybe        :: [a] -> Maybe a
listToMaybe []     =  Nothing
listToMaybe (a:_)  =  Just a
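
As a small sketch of this style (safeHeadOr is a hypothetical helper, not a Prelude function), we can combine listToMaybe with the maybe fold to get a total replacement for head that falls back to a default:

import Data.Maybe (listToMaybe)

-- Fall back to a default instead of crashing on the empty list.
safeHeadOr :: a -> [a] -> a
safeHeadOr def = maybe def id . listToMaybe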

Monads

Eightfold Path to Monad Satori

Much ink has been spilled waxing lyrical about the supposed mystique of monads. Instead I suggest a path to enlightenment:

  1. Don't read the monad tutorials.
  2. No really, don't read the monad tutorials.
  3. Learn about Haskell types.
  4. Learn what a typeclass is.
  5. Read the Typeclassopedia.
  6. Read the monad definitions.
  7. Use monads in real code.
  8. Don't write monad-analogy tutorials.

In other words, the only path to understanding monads is to read the fine source, fire up GHC and write some code. Analogies and metaphors will not lead to understanding.

See: Monad Tutorial Fallacy

Monadic Myths

The following are all false:

  • Monads are impure.
  • Monads are about effects.
  • Monads are about state.
  • Monads are about sequencing.
  • Monads are about IO.
  • Monads are dependent on laziness.
  • Monads are a "back-door" in the language to perform side-effects.
  • Monads are an embedded imperative language inside Haskell.
  • Monads require knowing abstract mathematics.

See: What a Monad Is Not

Laws

Monads are not complicated. The implementation is a typeclass with two functions, (>>=) (pronounced "bind") and return. Any preconceptions one might have about the word "return" should be discarded; it has an entirely different meaning here.

class Monad m where
  (>>=)  :: m a -> (a -> m b) -> m b
  return :: a -> m a

Together with three laws that all monad instances must satisfy.

Law 1

return a >>= f ≡ f a

Law 2

m >>= return ≡ m

Law 3

(m >>= f) >>= g ≡ m >>= (\x -> f x >>= g)

There is also an auxiliary function, (>>), defined in terms of the bind operation, which sequences two actions while discarding the value produced by the first:

(>>) :: Monad m => m a -> m b -> m b
m >> k = m >>= \_ -> k

See: Monad Laws

Do Notation

Monad syntax in Haskell is written in a sugared form that is entirely equivalent to applications of the monad operations. The desugaring is defined recursively by the rules:

do { a <- f ; m }  ≡  f >>= \a -> m
do { f ; m }       ≡  f >> m

So for example:

do {
  a <- f ;
  b <- g ;
  c <- h ;
  return (a, b, c)
}

f >>= \a ->
  g >>= \b ->
    h >>= \c ->
      return (a, b, c)

In the do-notation the monad laws from above are equivalently written:

Law 1

  do x <- m
     return x

= do m

Law 2

  do y <- return x
     f y

= do f x

Law 3

  do b <- do a <- m
             f a
     g b

= do a <- m
     b <- f a
     g b

= do a <- m
     do b <- f a
        g b

See: Haskell 2010: Do Expressions

Maybe

The Maybe monad is the simplest first example of a monad instance.

data Maybe a = Just a | Nothing
instance Monad Maybe where
  (Just x) >>= k = k x
  Nothing  >>= k = Nothing

  return = Just

(Just 3) >>= (\x -> return (x + 1))
-- Just 4

Nothing >>= (\x -> return (x + 1))
-- Nothing

return 4 :: Maybe Int
-- Just 4

example1 :: Maybe Int
example1 = do
  a <- Just 3
  b <- Just 4
  return $ a + b
-- Just 7

example2 :: Maybe Int
example2 = do
  a <- Just 3
  b <- Nothing
  return $ a + b
-- Nothing

List

The List monad is the second simplest example of a monad instance.

instance Monad [] where
  m >>= f   =  concat (map f m)
  return x  =  [x]

So for example with:

m = [1,2,3,4]
f = \x -> [1,0]

The reduction is straightforward:

m >>= f
==> [1,2,3,4] >>= \x -> [1,0]
==> concat (map (\x -> [1,0]) [1,2,3,4])
==> concat ([[1,0],[1,0],[1,0],[1,0]])
==> [1,0,1,0,1,0,1,0]

The list comprehension syntax in Haskell can be implemented in terms of the list monad.

[(x,y) | x <- xs, y <- ys]

example :: [(Int, Int, Int)]
example = do
  a <- [1,2]
  b <- [10,20]
  c <- [100,200]
  return (a,b,c)
-- [(1,10,100),(1,10,200),(1,20,100),(1,20,200),(2,10,100),(2,10,200),(2,20,100),(2,20,200)]

IO

A value of type IO a is a computation which, when performed, does some I/O before returning a value of type a. Desugaring the IO monad:

main :: IO ()
main = do putStrLn "What is your name: "
          name <- getLine
          putStrLn name

main :: IO ()
main = putStrLn "What is your name:" >>=
       \_    -> getLine >>=
       \name -> putStrLn name

main :: IO ()
main = putStrLn "What is your name: " >> (getLine >>= (\name -> putStrLn name))

See: Haskell 2010: Basic/Input Output

What's the point?

Consider the non-intuitive fact that we now have a uniform interface for talking about three very different but foundational ideas for programming: Failure, Collections, and Effects.

Let's write down a new function called sequence which folds a function mcons, which we can think of as analogous to the list constructor (i.e. (a : b : [])) except that it pulls its two arguments out of two monadic values (p,q) using bind.

sequence :: Monad m => [m a] -> m [a]
sequence = foldr mcons (return [])

mcons :: Monad m => m t -> m [t] -> m [t]
mcons p q = do
  x <- p
  y <- q
  return (x:y)

What does this function mean in terms of each of the monads discussed above?

Maybe

Sequencing a list of Maybe values allows us to collect the results of a series of computations which can possibly fail, and yield the aggregated values only if they all succeeded.

sequence :: [Maybe a] -> Maybe [a]
sequence [Just 3, Just 4]
-- Just [3,4]
sequence [Just 3, Just 4, Nothing]
-- Nothing

List

Since the bind operation for the list monad forms the pairwise list of elements from the two operands, folding the bind over a list of lists with sequence implements the general Cartesian product for an arbitrary number of lists.

sequence :: [[a]] -> [[a]]
sequence [[1,2,3],[10,20,30]]
-- [[1,10],[1,20],[1,30],[2,10],[2,20],[2,30],[3,10],[3,20],[3,30]]

IO

Sequence takes a list of IO actions, performs them sequentially, and returns the list of resulting values in the order sequenced.

sequence :: [IO a] -> IO [a]
sequence [getLine, getLine]
-- a
-- b
-- ["a","b"]

So there we have it, three fundamental concepts of computation that are normally defined independently of each other actually all share this similar structure that can be abstracted out and reused to build higher abstractions that work for all current and future implementations. If you want a motivating reason for understanding monads, this is it! This is the essence of what I wish I knew about monads looking back.

See: Control.Monad

mtl and transformers

Most of the everyday monads live in either the "mtl" or "transformers" libraries. Of interest to us are:

  • Reader
  • Writer
  • State

See: transformers

Reader Monad

ask :: Reader r r
asks :: (r -> a) -> Reader r a
local :: (r -> b) -> Reader b a -> Reader r a
runReader :: Reader r a -> r -> a

import Control.Monad.Reader

data MyState = MyState
  { foo :: String
  , bar :: Int
  } deriving (Show)

computation :: Reader MyState (Maybe String)
computation = do
  n <- asks bar
  x <- asks foo
  if n > 0
    then return (Just x)
    else return Nothing

example1 :: Maybe String
example1 = runReader computation $ MyState "hello!" 1

example2 :: Maybe String
example2 = runReader computation $ MyState "example!" 0

A simple implementation of the Reader monad:

newtype Reader r a = Reader { runReader :: r -> a }

instance Monad (Reader r) where
  return a = Reader $ \_ -> a
  m >>= k  = Reader $ \r -> runReader (k (runReader m r)) r

ask :: Reader a a
ask = Reader id

asks :: (r -> a) -> Reader r a
asks f = Reader f

local :: (r -> b) -> Reader b a -> Reader r a
local f m = Reader $ runReader m . f

Writer Monad

tell :: w -> Writer w ()
execWriter :: Writer w a -> w
runWriter  :: Writer w a -> (a, w)

import Control.Monad.Writer

type MyWriter = Writer [Int] String

example :: MyWriter
example  = do
  tell [1..5]
  tell [5..10]
  return "foo"

output :: (String, [Int])
output = runWriter example

A simple implementation of the Writer monad:

import Data.Monoid

newtype Writer w a = Writer { runWriter :: (a, w) }

instance Monoid w => Monad (Writer w) where
  return a = Writer (a, mempty)
  m >>= k  = Writer $ let
      (a, w)  = runWriter m
      (b, w') = runWriter (k a)
      in (b, w `mappend` w')

execWriter :: Writer w a -> w
execWriter m = snd (runWriter m)

tell :: w -> Writer w ()
tell w = Writer ((), w)

This implementation is lazy, so some care must be taken about whether one actually wants to generate only a stream of thunks. Sometimes it is desirable to produce a computation whose result is a stream of thunks that can be pulled lazily out of runWriter, but often the requirement is to produce a finite stream of values that are forced at the invocation of runWriter. Undesired laziness from Writer is a common source of grief, but it is very remediable.

State Monad

The state monad allows functions within a stateful context to access and modify state.

runState  :: State s a -> s -> (a, s)
evalState :: State s a -> s -> a
execState :: State s a -> s -> s

import Control.Monad.State

test :: State Int Int
test = do
  put 3
  modify (+1)
  get

main :: IO ()
main = print $ execState test 0

The state monad is often mistakenly described as being impure, but it is in fact entirely pure; the same effect could be achieved by explicitly passing state. A simple implementation of the State monad takes only a few lines:

newtype State s a = State { runState :: s -> (a,s) }

instance Monad (State s) where
  return a = State $ \s -> (a, s)

  State act >>= k = State $ \s ->
    let (a, s') = act s
    in runState (k a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

modify :: (s -> s) -> State s ()
modify f = get >>= \x -> put (f x)

evalState :: State s a -> s -> a
evalState act = fst . runState act

execState :: State s a -> s -> s
execState act = snd . runState act

Syntax Extensions

Pattern Guards

{-# LANGUAGE PatternGuards #-}

combine env x y
   | Just a <- lookup x env
   , Just b <- lookup y env
   = Just (a + b)
   | otherwise = Nothing

Multi-way if-expressions

{-# LANGUAGE MultiWayIf #-}

operation x =
  if | x > 100   -> 3
     | x > 10    -> 2
     | x > 1     -> 1
     | otherwise -> 0

Lambda Case

{-# LANGUAGE LambdaCase #-}

data Exp a
  = Lam a (Exp a)
  | Var a
  | App (Exp a) (Exp a)

example :: Exp a -> a
example = \case
  Lam a b -> a
  Var a   -> a
  App a b -> example a

Laziness

Again, a subject on which much ink has been spilled. There is an ongoing discussion in the land of Haskell about the compromises between lazy and strict evaluation, and there are nuanced arguments for having either paradigm be the default. Haskell takes a hybrid approach and allows strict evaluation when needed and uses laziness by default. We can always find examples where lazy evaluation exhibits worse behavior than strict evaluation and vice versa. They both have flaws, and as of yet there isn't a method that combines only the best of both worlds.

See:

Seq and WHNF

In Haskell, evaluation only occurs at the outermost constructor of case statements in Core. If we pattern match on a list we don't implicitly force all values in the list; an element in the list is only evaluated when we scrutinize its cons cell.

A term is said to be in weak head normal-form if the outermost constructor or lambda cannot be reduced further.
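
For example (a sketch of individual expressions, not a runnable module), the following terms are and are not in WHNF:

-- In WHNF: an outermost constructor or lambda is already exposed.
Just (1 + 1)
\x -> 2 + 2

-- Not in WHNF: the outermost node is still a reducible application.
1 + 1
(\x -> x) 3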

The seq function introduces an artificial dependence on the evaluation order of two terms by requiring that the first argument be evaluated to WHNF before the evaluation of the second.

⊥ `seq` a = ⊥
a `seq` b = b
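
A quick GHCi check of these equations (the exact exception message may vary with the GHC version):

λ: undefined `seq` 1
*** Exception: Prelude.undefined
λ: 1 `seq` 2
2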

The famous foldl is well-known to leak space when used carelessly and without several compiler optimizations applied. The strict foldl' variant uses seq to overcome this.

foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs

foldl' :: (a -> b -> a) -> a -> [b] -> a
foldl' _ z [] = z
foldl' f z (x:xs) = let z' = f z x in z' `seq` foldl' f z' xs

The extension BangPatterns allows an alternative syntax to force arguments to functions to be wrapped in seq.

{-# LANGUAGE BangPatterns #-}

sum :: Num a => [a] -> a
sum = go 0
  where
    go !acc (x:xs) = go (acc + x) xs
    go  acc []     = acc

Individual fields in records and arguments to constructors can also be explicitly annotated as strict:

data A = A !Int

Deepseq

There are often times when for performance reasons we need to deeply evaluate a data structure to normal form leaving no terms unevaluated. The deepseq library performs this task.

class NFData a where
  rnf :: a -> ()

deepseq :: NFData a => a -> b -> b

[1, undefined] `seq` ()
-- ()

[1, undefined] `deepseq` ()
-- Prelude.undefined

Text / ByteString

The default Haskell string type is a rather naive linked list of characters, which, while perfectly fine for small identifiers, is not well-suited for bulk processing.

type String = [Char]

With the -XOverloadedStrings extension string literals can be overloaded without the need for explicit packing.

class IsString a where
  fromString :: String -> a

For instance:

λ: :type "foo"
"foo" :: [Char]

λ: :set -XOverloadedStrings

λ: :type "foo"
"foo" :: IsString a => a

Text

A Text type is a packed blob of Unicode characters.

pack :: String -> Text
unpack :: Text -> String

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T

-- From pack
myTStr1 :: T.Text
myTStr1 = T.pack ("foo" :: String)

-- From overloaded string literal.
myTStr2 :: T.Text
myTStr2 = "bar"

See: Text

ByteString

ByteStrings are arrays of unboxed chars with either strict or lazy evaluation.

pack :: String -> ByteString
unpack :: ByteString -> String

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Char8 as B

-- From pack
myBStr1 :: B.ByteString
myBStr1 = B.pack ("foo" :: String)

-- From overloaded string literal.
myBStr2 :: B.ByteString
myBStr2 = "bar"

See: ByteString

Applicatives

Like monads, Applicatives are an abstract structure for a wide class of computations.

pure :: Applicative f => a -> f a
(<$>) :: Functor f => (a -> b) -> f a -> f b
(<*>) :: Applicative f => f (a -> b) -> f a -> f b

As of GHC 7.6, Applicative is defined as:

class Functor f => Applicative f where
    pure :: a -> f a
    (<*>) :: f (a -> b) -> f a -> f b

(<$>) :: Functor f => (a -> b) -> f a -> f b
(<$>) = fmap

With the following laws:

pure id <*> v = v
pure f <*> pure x = pure (f x)
u <*> pure y = pure ($ y) <*> u
u <*> (v <*> w) = pure (.) <*> u <*> v <*> w

As a rule of thumb, whenever we would use m >>= return . f what we probably want is an applicative functor, and not a monad.
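
As a minimal sketch of that rewrite (incM and incA are hypothetical names):

import Control.Applicative ((<$>))

-- Monadic style: m >>= return . f
incM :: Maybe Int
incM = Just 1 >>= return . (+1)

-- Functor/Applicative style: f <$> m
incA :: Maybe Int
incA = (+1) <$> Just 1
-- both evaluate to Just 2

The larger example below uses the same applicative style across Maybe, lists, and IO: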

import Network.HTTP
import Control.Applicative ((<$>),(<*>))

example1 :: Maybe Integer
example1 = (+) <$> m1 <*> m2
  where
    m1 = Just 3
    m2 = Nothing
-- Nothing

example2 :: [(Int, Int, Int)]
example2 = (,,) <$> m1 <*> m2 <*> m3
  where
    m1 = [1,2]
    m2 = [10,20]
    m3 = [100,200]
-- [(1,10,100),(1,10,200),(1,20,100),(1,20,200),(2,10,100),(2,10,200),(2,20,100),(2,20,200)]

example3 :: IO String
example3 = (++) <$> fetch1 <*> fetch2
  where
    fetch1 = simpleHTTP (getRequest "http://www.fpcomplete.com/") >>= getResponseBody
    fetch2 = simpleHTTP (getRequest "http://www.haskell.org/") >>= getResponseBody

The pattern f <$> a <*> b ... shows up so frequently that there is a family of functions to lift functions of a fixed number of arguments into an applicative. The same pattern also shows up with monads (liftM, liftM2, liftM3).

liftA :: Applicative f => (a -> b) -> f a -> f b
liftA f a = pure f <*> a

liftA2 :: Applicative f => (a -> b -> c) -> f a -> f b -> f c
liftA2 f a b = f <$> a <*> b

liftA3 :: Applicative f => (a -> b -> c -> d) -> f a -> f b -> f c -> f d
liftA3 f a b c = f <$> a <*> b <*> c

...

See:

Monad + Functor Hierarchy

In principle every monad arises out of an applicative functor (and by corollary a functor) but due to historical reasons Applicative isn't a superclass of the Monad typeclass. A hypothetical fixed Prelude might have:

class Functor f where
  fmap :: (a -> b) -> f a -> f b
 
class Functor f => Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b
 
class Applicative m => Monad m where
  (>>=) :: m a -> (a -> m b) -> m b
  ma >>= f = join (fmap f ma)

return :: Applicative m => a -> m a
return = pure

join :: Monad m => m (m a) -> m a
join x = x >>= id

RWS Monad

The RWS monad is a combination of the three monads discussed above, the Reader, Writer, and State.

runReader :: Reader r a -> r -> a
runWriter :: Writer w a -> (a, w)
runState  :: State s a -> s -> (a, s)

These three eval functions are now combined into the following functions:

runRWS  :: RWS r w s a -> r -> s -> (a, s, w)
execRWS :: RWS r w s a -> r -> s -> (s, w)
evalRWS :: RWS r w s a -> r -> s -> (a, w)

import Control.Monad.RWS

type R = Int
type W = [Int]
type S = Int

computation :: RWS R W S ()
computation = do
  e <- ask
  a <- get
  let b = a + e
  put b
  tell [b]

example = runRWS computation 2 3

The usual caveat about Writer also applies to RWS.

Monad Transformers

Monad transformers allow us to nest monadic computations in a stack with an interface to exchange values between the levels, called lift. The Monad Transformer Library (mtl) is the common implementation.

lift :: (Monad m, MonadTrans t) => m a -> t m a
liftIO :: MonadIO m => IO a -> m a

class MonadTrans t where
    lift :: Monad m => m a -> t m a

class (Monad m) => MonadIO m where
    liftIO :: IO a -> m a

instance MonadIO IO where
    liftIO = id

It's useful to remember that transformers compose outside-in but are unrolled inside out.
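
For instance (a minimal sketch; the App type and runApp are hypothetical names, assuming the mtl modules), a stack that layers ReaderT over StateT over IO is peeled in the opposite order from how it was composed:

import Control.Monad.Reader
import Control.Monad.State

-- Composed outside-in: ReaderT is the outermost layer of the stack.
type App a = ReaderT Int (StateT Bool IO) a

-- Unrolled inside out: runReaderT peels the outer ReaderT first,
-- leaving a StateT Bool IO action for runStateT to unroll into IO.
runApp :: App a -> Int -> Bool -> IO (a, Bool)
runApp m r s = runStateT (runReaderT m r) s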

See: Monad Transformers: Step-By-Step

Basics

The most basic use requires us to use the T-variants of each of the monad transformers for the outer layers and to explicitly lift and return values between each of the layers.

import Control.Monad.Reader

data Expr
  = Val Int
  | Add Expr Expr
  | Var String
  deriving (Show)

type Env = [(String, Int)]
type Eval a = ReaderT Env Maybe a

eval :: Expr -> Eval Int
eval (Val n) = return n
eval (Add x y) = liftM2 (+) (eval x) (eval y)
eval (Var x) = do
  env <- ask
  val <- lift (lookup x env)
  return val


ex :: Eval Int
ex = eval (Add (Val 2) (Add (Val 1) (Var "x")))

env :: Env
env = [("x", 2), ("y", 5)]

example1, example2 :: Maybe Int
example1 = runReaderT ex env
example2 = runReaderT ex []

The fundamental limitation of this approach is that we find ourselves lift.lift.lifting and return.return.returning a lot.

Newtype Deriving

Newtypes let us reference existing types as a new distinct type, with no runtime overhead from boxing. Newtype wrappers around strings and numeric types can often reduce accidental errors. Using -XGeneralizedNewtypeDeriving we can recover the functionality of instances of the underlying type.

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

newtype Velocity = Velocity { unVelocity :: Double }
  deriving (Eq, Ord)

v :: Velocity
v = Velocity 2.718

x :: Double
x = 6.636

err = v + x

Couldn't match type `Double' with `Velocity'
Expected type: Velocity
  Actual type: Double
In the second argument of `(+)', namely `x'
In the expression: v + x
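
Deriving the numeric instances recovers the arithmetic of the underlying type; a minimal sketch (using a hypothetical Velocity' to avoid clashing with the definition above):

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

newtype Velocity' = Velocity' Double
  deriving (Eq, Ord, Num, Show)

-- The Num instance of the underlying Double is recovered, but the
-- newtype still can't be mixed with a raw Double by accident.
v' :: Velocity'
v' = Velocity' 2.718 + Velocity' 1.0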

Using newtype deriving with typeclasses we can produce flattened transformer types that don't require explicitly lifting through the transformer stack. For example, a little stack machine in the Reader, Writer and State monads:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.Reader
import Control.Monad.Writer
import Control.Monad.State

type Stack   = [Int]
type Output  = [Int]
type Program = [Instr]

type VM a = ReaderT Program (WriterT Output (State Stack)) a

newtype Comp a = Comp (VM a)
  deriving (Monad, MonadReader Program, MonadWriter Output, MonadState Stack)

data Instr = Push Int | Pop | Puts

evalInstr :: Instr -> VM ()
evalInstr instr = case instr of
  Pop    -> modify tail
  Push n -> modify (n:)
  Puts   -> do
    tos <- gets head
    tell [tos]

eval :: VM ()
eval = do
  instr <- ask
  case instr of
    []     -> return ()
    (i:is) -> evalInstr i >> local (const is) eval

execVM :: Program -> Output
execVM = flip evalState [] . execWriterT . runReaderT eval

program :: Program
program = [
     Push 42,
     Push 27,
     Puts,
     Pop,
     Puts,
     Pop
  ]

main :: IO ()
main = mapM_ print $ execVM program

Error Handling

Control.Exception

The low-level (and most dangerous) way to handle errors is to use the throw and catch functions, which allow you to throw extensible exceptions in pure code but catch the resulting exception within IO. Of specific note is that the return value of throw inhabits all types. There's no reason to use this for custom code that doesn't use low-level system operations.

throw :: Exception e => e -> a
catch :: Exception e => IO a -> (e -> IO a) -> IO a
try :: Exception e => IO a -> IO (Either e a)
evaluate :: a -> IO a

{-# LANGUAGE DeriveDataTypeable #-}

import Data.Typeable
import Control.Exception

data MyException = MyException
    deriving (Show, Typeable)

instance Exception MyException

evil :: [Int]
evil = [throw MyException]

example1 :: Int
example1 = head evil

example2 :: Int
example2 = length evil

main :: IO ()
main = do
  a <- try (evaluate example1) :: IO (Either MyException Int)
  print a

  b <- try (return example2) :: IO (Either MyException Int)
  print b

Exceptions

The problem with the previous approach is having to rely on GHC's asynchronous exception handling inside of IO to handle basic operations. The exceptions package provides the same API as Control.Exception but loosens the dependency on IO.

{-# LANGUAGE DeriveDataTypeable #-}

import Data.Typeable
import Control.Monad.Catch
import Control.Monad.Identity

data MyException = MyException
    deriving (Show, Typeable)

instance Exception MyException

example :: MonadCatch m => Int -> Int -> m Int
example x y | y == 0 = throwM MyException
            | otherwise = return $ x `div` y

pure :: MonadCatch m => m (Either MyException Int)
pure = do
  a <- try (example 1 2)
  b <- try (example 1 0)
  return (a >> b)

See: exceptions

Either

The instance for the Either monad is simple; note that binding short-circuits as soon as a Left value occurs.

instance Monad (Either e) where
  return x = Right x

  (Left x)  >>= f = Left x
  (Right x) >>= f = f x

The silly example one always sees is writing a safe division function that fails with a Left value when division by zero happens and holds the resulting value in Right otherwise.

sdiv :: Double -> Double -> Either String Double
sdiv _ 0 = throwError "divide by zero"
sdiv i j = return $ i / j

example :: Double -> Double -> Either String Double
example n m = do
  a <- sdiv n m
  b <- sdiv 2 a
  c <- sdiv 2 b
  return c

throwError :: String -> Either String b
throwError a = Left a

main :: IO ()
main = do
  print $ example 1 5
  print $ example 1 0

This is admittedly pretty stupid but captures the essence of why Either/EitherT is a suitable monad for exception handling.

ErrorT

Another slightly clumsy method is to use the ErrorT transformer composed with an Identity and unrolling into an Either Exception a. This method is simple but doesn't compose well depending on the situation and interaction with IO.

import Control.Monad.Error
import Control.Monad.Identity

data Exception
  = Failure String
  | GenericFailure
  deriving Show

instance Error Exception where
  noMsg = GenericFailure

type ErrMonad a = ErrorT Exception Identity a

example :: Int -> Int -> ErrMonad Int
example x y = do
  case y of
    0 -> throwError $ Failure "division by zero"
    _ -> return $ x `div` y

runFail :: ErrMonad a -> Either Exception a
runFail = runIdentity . runErrorT

example1 :: Either Exception Int
example1 = runFail $ example 2 3

example2 :: Either Exception Int
example2 = runFail $ example 2 0

EitherT and errors

newtype EitherT e m a = EitherT {runEitherT :: m (Either e a)}
    -- Defined in `Control.Monad.Trans.Either'
runEitherT :: EitherT e m a -> m (Either e a)
tryIO :: MonadIO m => IO a -> EitherT IOException m a

throwT  :: Monad m => e -> EitherT e m r
catchT  :: Monad m => EitherT a m r -> (a -> EitherT b m r) -> EitherT b m r
handleT :: Monad m => (a -> EitherT b m r) -> EitherT a m r -> EitherT b m r

The ideal monad to use is simply the EitherT monad, which we'd like to be able to use with an API similar to ErrorT, and which should gracefully handle exceptions when the underlying monad is IO. Nothing yet quite gives us all of this out of the box.

For example, suppose we wanted to use read to attempt to read a positive integer from stdin. There are two failure modes here: one for a parse error, which fails with an error from Prelude.readIO, and one for a non-positive integer, which fails with a custom exception after a check. We'd like to unify both cases in the same transformer.

Enter the safe and errors packages, which through a little re-export magic make life with EitherT more pleasant. The safe library provides a variety of safer variants of the standard prelude functions that handle failures as Maybe values, explicitly passed default values, or more informative exception "notes". The errors library re-exports the safe Maybe functions and hoists them up into the EitherT monad, providing a family of try-prefixed functions that perform actions and fail with an exception argument.

Safe.headMay :: [a] -> Maybe a
hoistEither :: Monad m => Either e a -> EitherT e m a
hoistMaybe :: Monad m => Maybe b -> MaybeT m b
-- Exception handling equivalent of `read`
tryRead :: (Monad m, Read a) => e -> String -> EitherT e m a

-- Exception handling equivalent of `head`
tryHead :: Monad m => e -> [a] -> EitherT e m a

-- Exception handling equivalent of `(!!)`
tryAt :: Monad m => e -> [a] -> Int -> EitherT e m a

Putting this all together we have pretty close to the ideal monad for error handling.

import Control.Error
import Control.Monad.Trans

data Failure
  = NonPositive Int
  | ReadError String
  deriving Show

main :: IO ()
main = do
  e <- runEitherT $ do
      liftIO $ putStrLn "Enter a positive number."
      s <- liftIO getLine
      n <- tryRead (ReadError s) s
      if n > 0
        then return $ n + 1
        else throwT $ NonPositive n

  case e of
      Left  n -> putStrLn $ "Failed with: " ++ show n
      Right s -> putStrLn $ "Succeeded with: " ++ show s

See:

ST Monad

The ST monad models "threads" of stateful computations which can manipulate mutable references but are restricted to only return pure values when evaluated, and are statically confined to the ST monad of an s thread.

runST :: (forall s. ST s a) -> a
newSTRef :: a -> ST s (STRef s a)
readSTRef :: STRef s a -> ST s a
writeSTRef :: STRef s a -> a -> ST s ()

{-# LANGUAGE BangPatterns #-}

import Data.STRef
import Control.Monad
import Control.Monad.ST
import Control.Monad.State.Strict

example1 :: Int
example1 = runST $ do
  x <- newSTRef 0

  forM_ [1..1000] $ \j -> do
    writeSTRef x j

  readSTRef x

example2 :: Int
example2 = runST $ do
  count <- newSTRef 0
  replicateM_ (10^6) $ modifySTRef' count (+1)
  readSTRef count

example3 :: Int
example3 = flip evalState 0 $ do
  replicateM_ (10^6) $ modify' (+1)
  get

modify' :: MonadState a m => (a -> a) -> m ()
modify' f = get >>= (\x -> put $! f x)

Using the ST monad we can create a new class of efficient purely functional data structures that use mutable references.

Foldable / Traversable

If coming from an imperative background, retraining oneself to think about iteration in terms of maps, folds, and scans can be challenging, but the end result is better composability and code reuse.

-- pseudocode
foldr f z [a...] = f a (f b ( ... (f y z) ... )) 
foldl f z [a...] = f ... (f (f z a) b) ... y 
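
Concretely, on a small list the two folds associate in opposite directions:

foldr (+) 0 [1,2,3]  =  1 + (2 + (3 + 0))  =  6
foldl (+) 0 [1,2,3]  =  ((0 + 1) + 2) + 3  =  6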

A Foldable instance allows us to apply functions over a data structure, collapsing it into a monoidal value using some logic over mappend.

A Traversable instance allows us to walk a data structure left-to-right, applying functions within an applicative context.

class (Functor f, Foldable f) => Traversable f where
  traverse :: Applicative g => (a -> g b) -> f a -> g (f b)

class Foldable f where
  foldMap :: Monoid m => (a -> m) -> f a -> m

The names exported by Foldable quite often conflict with ones defined in the Prelude; either import them qualified or just hide the Prelude versions. The operations in Foldable all specialize to behave the same as the Prelude versions for list types.

import Data.Monoid
import Data.Foldable
import Data.Traversable

import Control.Applicative
import Control.Monad.Identity (runIdentity)
import Prelude hiding (mapM_, foldr)

-- Rose Tree
data Tree a = Node a [Tree a] deriving (Show)

instance Functor Tree where
  fmap f (Node x ts) = Node (f x) (fmap (fmap f) ts)

instance Traversable Tree where
  traverse f (Node x ts) = Node <$> f x <*> traverse (traverse f) ts

instance Foldable Tree where
  foldMap f (Node x ts) = f x `mappend` foldMap (foldMap f) ts


tree :: Tree Integer
tree = Node 1 [Node 1 [], Node 2 [] ,Node 3 []]


example1 :: IO ()
example1 = mapM_ print tree

example2 :: Integer
example2 = foldr (+) 0 tree

example3 :: Maybe (Tree Integer)
example3 = traverse (\x -> if x > 2 then Just x else Nothing) tree

example4 :: Tree Integer
example4 = runIdentity $ traverse (\x -> pure (x+1)) tree

The instances we defined above can also be automatically derived by GHC using several language extensions; the results are identical to the hand-written versions above.

{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveTraversable #-}

data Tree a = Node a [Tree a]
  deriving (Show, Functor, Foldable, Traversable)

Prelude Legacy

The instances of Foldable for the list type often conflict with the monomorphic versions in the Prelude, which are left in for historical reasons. So oftentimes it is desirable to explicitly mask these functions from implicit import and force the use of Foldable and Traversable instead:

import  Data.List hiding ( 
    all , and , any , concat , concatMap , elem , filter ,
    find , foldl , foldl' , foldl1 , foldr , foldr1 ,
    mapAccumL , mapAccumR , maximum , maximumBy , minimum , 
    minimumBy , notElem , or , product , sum )

import Control.Monad hiding ( 
    forM , forM_ , mapM , mapM_ , msum , sequence , sequence_ )

The nuclear option is to exclude the entire prelude except by explicit qualified use.

import qualified Prelude as P

Free Monads

Pure :: a -> Free f a
Free :: f (Free f a) -> Free f a

liftF :: (Functor f, MonadFree f m) => f a -> m a
retract :: Monad f => Free f a -> f a

Free monads are monads which, instead of having a join operation that combines computations, form composite computations from applications of a functor.

join :: Monad m => m (m a) -> m a
wrap :: MonadFree f m => f (m a) -> m a

One of the best examples is the Partiality monad, which models computations that can diverge. Haskell allows unbounded recursion, but we can create a free monad from the Maybe functor which can be used to bound the call-depth of, for example, the Ackermann function.

import Control.Monad.Fix
import Control.Monad.Free

type Partiality a = Free Maybe a

-- Non-termination.
never :: Partiality a
never = fix (Free . Just)

fromMaybe :: Maybe a -> Partiality a
fromMaybe (Just x) = Pure x
fromMaybe Nothing = Free Nothing

runPartiality :: Int -> Partiality a -> Maybe a
runPartiality 0 _ = Nothing
runPartiality _ (Pure a) = Just a
runPartiality _ (Free Nothing) = Nothing
runPartiality n (Free (Just a)) = runPartiality (n-1) a

ack :: Int -> Int -> Partiality Int
ack 0 n = Pure $ n + 1
ack m 0 = Free $ Just $ ack (m-1) 1
ack m n = Free $ Just $ ack m (n-1) >>= ack (m-1)

main :: IO ()
main = do
  let diverge = never :: Partiality ()
  print $ runPartiality 1000 diverge
  print $ runPartiality 1000 (ack 3 4)
  print $ runPartiality 5500 (ack 3 4)

The other common use for free monads is to build embedded domain-specific languages to describe computations. We can model a subset of the IO monad by building up a pure description of the computation inside of the IOFree monad and then using the free monad to encode the translation to an effectful IO computation.

{-# LANGUAGE DeriveFunctor #-}

import System.Exit
import Control.Monad.Free

data Interaction x
  = Puts String x
  | Gets (Char -> x)
  | Exit
  deriving Functor

type IOFree a = Free Interaction a

puts :: String -> IOFree ()
puts s = liftF $ Puts s ()

get :: IOFree Char
get = liftF $ Gets id

exit :: IOFree r
exit = liftF Exit

gets :: IOFree String
gets = do
  c <- get
  if c == '\n'
    then return ""
    else gets >>= \line -> return (c : line)

-- Collapse our IOFree DSL into IO monad actions.
interp :: IOFree a -> IO a
interp (Pure r) = return r
interp (Free x) = case x of
  Puts s t -> putStrLn s >> interp t
  Gets f   -> getChar >>= interp . f
  Exit     -> exitSuccess

echo :: IOFree ()
echo = do
  puts "Enter your name:"
  str <- gets
  puts str
  if length str > 10
    then puts "You have a long name."
    else puts "You have a short name."
  exit

main :: IO ()
main = interp echo

An implementation such as the one found in free might look like the following:

{-# LANGUAGE MultiParamTypeClasses #-}

import Control.Applicative

data Free f a
  = Pure a
  | Free (f (Free f a))

instance Functor f => Monad (Free f) where
  return a     = Pure a
  Pure a >>= f = f a
  Free f >>= g = Free (fmap (>>= g) f)

class Monad m => MonadFree f m  where
  wrap :: f (m a) -> m a

liftF :: (Functor f, MonadFree f m) => f a -> m a
liftF = wrap . fmap return

iter :: Functor f => (f a -> a) -> Free f a -> a
iter _ (Pure a) = a
iter phi (Free m) = phi (iter phi <$> m)

retract :: Monad f => Free f a -> f a
retract (Pure a) = return a
retract (Free as) = as >>= retract

See:

GADTs

GADTs are an extension to algebraic datatypes that allow us to qualify the constructors to datatypes with type equality constraints, allowing a class of types that are not expressible using vanilla ADTs.

Consider the data type below: the Succ constructor takes a Term parameterized by a, which ranges over all types. Problems arise when trying to write the evaluator, since there is a clash over whether (a ~ Bool) or (a ~ Int).

data Term a
  = Lit a
  | Succ (Term a)
  | IsZero (Term a)

eval (Lit i)      = i
eval (Succ t)     = 1 + eval t
eval (IsZero i)   = eval i == 0

And we have:

-- This is a valid type.
failure = Succ ( Lit True )

Using a GADT we can express the type invariants for our language (i.e. only type-safe expressions are representable). Pattern matching on this GADT then carries type equality constraints without the need for explicit tags.

{-# Language GADTs #-}

data Term a where
  Lit    :: a -> Term a
  Succ   :: Term Int -> Term Int
  IsZero :: Term Int -> Term Bool
  If     :: Term Bool -> Term a -> Term a -> Term a

eval :: Term a -> a
eval (Lit i)      = i                                   -- Term a
eval (Succ t)     = 1 + eval t                          -- Term (a ~ Int)
eval (IsZero i)   = eval i == 0                         -- Term (a ~ Int)
eval (If b e1 e2) = if eval b then eval e1 else eval e2 -- Term (a ~ Bool)

example :: Int
example = eval (Succ (Succ (Lit 3)))

This time around:

-- This is rejected at compile-time.
failure = Succ ( Lit True )

Explicit equality constraints can be added to a function's context with -XGADTs enabled.

f :: (a ~ b) => a -> b -> (a,b)
f x y = (x,y)

Kind Signatures

Recall that kinds are the "types of types" in Haskell's type system; the kind system consists of the single base kind * and the arrow kind ->.

κ : *
  | κ -> κ

Int :: *
Maybe :: * -> *
Either :: * -> * -> *

On top of the default GADT declaration we can also constrain the parameters of the GADT to specific kinds. For basic usage Haskell's kind inference can deduce these reasonably well, but combined with some other type system extensions kind signatures become essential.

{-# Language GADTs #-}
{-# LANGUAGE KindSignatures #-}

data Term a :: * where
  Lit    :: a -> Term a
  Succ   :: Term Int -> Term Int
  IsZero :: Term Int -> Term Bool
  If     :: Term Bool -> Term a -> Term a -> Term a

data Vec :: * -> * -> * where
  Nil :: Vec n a
  Cons :: a -> Vec n a -> Vec n a

data Fix :: (* -> *) -> * where
  In :: f (Fix f) -> Fix f

Equality

With a richer language for datatypes we can express terms that witness the relationship between terms in the constructors, for example equality:

{-# LANGUAGE GADTs #-}

data Eql a b where
  Refl :: Eql a a

sym :: Eql a b -> Eql b a
sym Refl = Refl

cong :: Eql a b -> Eql (f a) (f b)
cong Refl = Refl

trans :: Eql a b -> Eql b c -> Eql a c
trans Refl Refl = Refl

cast :: Eql a b -> a -> b
cast Refl = id
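
As a trivial usage sketch (the name three is hypothetical): since Refl can only be constructed when both indices are the same type, cast behaves as a type-safe identity:

three :: Int
three = cast (Refl :: Eql Int Int) (3 :: Int)
-- 3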

HOAS

Higher Order Abstract Syntax (HOAS) is a technique for encoding the lambda calculus that uses the function type of the host language (i.e. Haskell) to give us capture-avoiding substitution in our custom language for free, by reusing Haskell's own implementation.

{-# LANGUAGE GADTs #-}

data Expr a where
  Con :: a -> Expr a
  Lam :: (Expr a -> Expr b) -> Expr (a -> b)
  App :: Expr (a -> b) -> Expr a -> Expr b

id :: Expr (a -> a)
id = Lam (\x -> x)

tr :: Expr (a -> b -> a)
tr = Lam (\x -> (Lam (\y -> x)))

fl :: Expr (a -> b -> b)
fl = Lam (\x -> (Lam (\y -> y)))

eval :: Expr a -> a
eval (Con v) = v
eval (Lam f) = \x -> eval (f (Con x))
eval (App e1 e2) = (eval e1) (eval e2)

There is however no safeguard preventing us from lifting Haskell functions which do not encode meaningful lambda calculus expressions. For example:

Lam (\x -> let x = x in x )

Pretty printing HOAS encoded terms can also be quite complicated since the body of the function is under a Haskell lambda binder.

Final Interpreters

Using typeclasses we can implement a final interpreter which models a set of extensible terms using functions bound to typeclasses rather than data constructors. Instances of the typeclass form interpreters over these terms.

For example we can write a small language that includes basic arithmetic, and then retroactively extend our expression language with a multiplication operator without changing the base. At the same time our logic for our evaluator and pretty-printing interpreters remain invariant under the addition of new elements.

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

class Expr repr where
  lit :: Int -> repr
  neg :: repr -> repr
  add :: repr -> repr -> repr
  mul :: repr -> repr -> repr

instance Expr Int where
  lit n = n
  neg a = -a
  add a b = a + b
  mul a b = a * b

instance Expr String where
  lit n = show n
  neg a = "(-" ++ a ++ ")"
  add a b = "(" ++ a ++ " + " ++ b ++ ")"
  mul a b = "(" ++ a ++ " * " ++ b ++ ")"

class BoolExpr repr where
  eq :: repr -> repr -> repr
  tr :: repr
  fl :: repr

instance BoolExpr Int where
  eq a b = if a == b then tr else fl
  tr = 1
  fl = 0

instance BoolExpr String where
  eq a b = "(" ++ a ++ " == " ++ b ++ ")"
  tr = "true"
  fl = "false"

eval :: Int -> Int
eval = id

render :: String -> String
render = id

expr :: (BoolExpr repr, Expr repr) => repr
expr = eq (add (lit 1) (lit 2)) (lit 3)

result :: Int
result = eval expr
-- 1

string :: String
string = render expr
-- "((1 + 2) == 3)"

Typed Tagless Final Interpreters

Writing an evaluator for the lambda calculus can likewise be modeled with a final interpreter and an Identity functor.

{-# LANGUAGE NoMonomorphismRestriction #-}

import Prelude hiding (id)

class Expr rep where
  lam :: (rep a -> rep b) -> rep (a -> b)
  app :: rep (a -> b) -> (rep a -> rep b)
  lit :: a -> rep a

newtype Interpret a = R { reify :: a }

instance Expr Interpret where
  lam f   = R $ reify . f . R
  app f a = R $ reify f $ reify a
  lit     = R

eval :: Interpret a -> a
eval e = reify e


e1 :: Expr rep => rep Int
e1 = app (lam (\x -> x)) (lit 3)

e2 :: Expr rep => rep Int
e2 = app (lam (\x -> lit 4)) (lam $ \x -> lam $ \y -> y)

example1 :: Int
example1 = eval e1
-- 3

example2 :: Int
example2 = eval e2
-- 4

The esqueleto library uses this approach internally to build an embedded domain language for describing SQL queries.

See:

Initial Algebras

The initial algebra approach differs from the final interpreter approach in that we now represent our terms as algebraic datatypes and the interpreter implements recursion, with evaluation occurring through pattern matching.

Algebra of Datatypes

The usual hand-wavy way of describing algebraic datatypes is to indicate how a natural correspondence arises between sum types, product types, and polynomial expressions.

data Void                       0
data Unit     = Unit            1
data Sum a b  = Inl a | Inr b   a + b
data Prod a b = Prod a b        a * b
type (->) a b = a -> b          b ^ a

data Maybe a = Nothing | Just a
Maybe a = 1 + a
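
Counting inhabitants follows the same algebra: Bool has 2 values, so Maybe Bool has 1 + 2 = 3 and Either Bool Bool has 2 + 2 = 4. A small sketch enumerating them (the names are hypothetical):

maybeBools :: [Maybe Bool]
maybeBools = [Nothing, Just False, Just True]                    -- 3 inhabitants

eitherBools :: [Either Bool Bool]
eitherBools = [Left False, Left True, Right False, Right True]   -- 4 inhabitants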

Recursive types are modeled as the infinite series of these terms.

data List a = Nil | Cons a (List a)
List a = μ x. 1 + a * x
       = 1 + a + a^2 + a^3 + a^4 ...

See:

F-Algebras

type Algebra f a = f a -> a
type Coalgebra f a = a -> f a
newtype Fix f = Fix { unFix :: f (Fix f) }

cata :: Functor f => Algebra f a -> Fix f -> a
ana  :: Functor f => Coalgebra f a -> a -> Fix f
hylo :: Functor f => Algebra f b -> Coalgebra f a -> a -> b

In Haskell an F-algebra is a functor f a together with a function f a -> a. A coalgebra reverses the arrows. For a functor f we can form its recursive unrolling using the Fix newtype wrapper.

Fix f = f (f (f (f (f (f ( ... ))))))
newtype Fix f = Fix { unFix :: f (Fix f) }

Fix :: f (Fix f) -> Fix f
unFix :: Fix f -> f (Fix f)

In this form we can write down generalized fold and unfold functions that are datatype-generic and written purely in terms of recursion under the functor.

cata :: Functor f => Algebra f a -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix

ana :: Functor f => Coalgebra f a -> a -> Fix f
ana coalg = Fix . fmap (ana coalg) . coalg

We call these functions catamorphisms and anamorphisms. Notice especially that the types of these two functions simply reverse the direction of the arrows. Interpreted another way, they transform an algebra/coalgebra, which defines a flat structure-preserving mapping over Fix f, into a function which either rolls or unrolls the fixpoint. What is particularly nice about this approach is that the recursion is abstracted away inside the functor definition and we are free to just implement the flat transformation logic!

For example a construction of the natural numbers in this form:

{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}

type Algebra f a = f a -> a
type Coalgebra f a = a -> f a

newtype Fix f = Fix { unFix :: f (Fix f) }

cata :: Functor f => Algebra f a -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix

ana :: Functor f => Coalgebra f a -> a -> Fix f
ana coalg = Fix . fmap (ana coalg) . coalg

hylo :: Functor f => Algebra f b -> Coalgebra f a -> a -> b
hylo f g = cata f . ana g

type Nat = Fix NatF
data NatF a = S a | Z deriving (Eq,Show)

instance Functor NatF where
  fmap f Z     = Z
  fmap f (S x) = S (f x)

plus :: Nat -> Nat -> Nat
plus n = cata phi where
  phi Z     = n
  phi (S m) = s m

times :: Nat -> Nat -> Nat
times n = cata phi where
  phi Z     = z
  phi (S m) = plus n m

int :: Nat -> Int
int = cata phi where
  phi  Z    = 0
  phi (S f) = 1 + f

nat :: Integer -> Nat
nat = ana (psi Z S) where
  psi f _ 0 = f
  psi _ f n = f (n-1)

z :: Nat
z = Fix Z

s :: Nat -> Nat
s = Fix . S


type Str = Fix StrF
data StrF x = Cons Char x | Nil

instance Functor StrF where
  fmap f (Cons a as) = Cons a (f as)
  fmap f Nil = Nil

nil :: Str
nil = Fix Nil

cons :: Char -> Str -> Str
cons x xs = Fix (Cons x xs)

str :: Str -> String
str = cata phi where
  phi Nil         = []
  phi (Cons x xs) = x : xs

str' :: String -> Str
str' = ana (psi Nil Cons) where
  psi f _ [] = f
  psi _ f (a:as) = f a as

example1 :: Int
example1 = int (plus (nat 125) (nat 25))
-- 150

Or for example an interpreter for a small expression language that depends on a scoping dictionary.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}

import Control.Applicative
import qualified Data.Map as M

type Algebra f a = f a -> a
type Coalgebra f a = a -> f a

newtype Fix f = Fix { unFix :: f (Fix f) }

cata :: Functor f => Algebra f a -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix

ana :: Functor f => Coalgebra f a -> a -> Fix f
ana coalg = Fix . fmap (ana coalg) . coalg

hylo :: Functor f => Algebra f b -> Coalgebra f a -> a -> b
hylo f g = cata f . ana g

type Id = String
type Env = M.Map Id Int

type Expr = Fix ExprF
data ExprF a
  = Lit Int
  | Var Id
  | Add a a
  | Mul a a
  deriving (Show, Eq, Ord, Functor)

deriving instance Eq (f (Fix f)) => Eq (Fix f)
deriving instance Ord (f (Fix f)) => Ord (Fix f)
deriving instance Show (f (Fix f)) => Show (Fix f)

eval :: M.Map Id Int -> Fix ExprF -> Maybe Int
eval env = cata phi where
  phi ex = case ex of
    Lit c   -> pure c
    Var i   -> M.lookup i env
    Add x y -> liftA2 (+) x y
    Mul x y -> liftA2 (*) x y

expr :: Expr
expr = Fix (Mul n (Fix (Add x y)))
  where
    n = Fix (Lit 10)
    x = Fix (Var "x")
    y = Fix (Var "y")

env :: M.Map Id Int
env = M.fromList [("x", 1), ("y", 2)]

compose :: (f (Fix f) -> c) -> (a -> Fix f) -> a -> c
compose x y = x . unFix . y

example :: Maybe Int
example = eval env expr
-- Just 30

What's especially nice about this approach is how naturally catamorphisms compose into efficient transformations, lending themselves to writing extensible interpreters.

compose :: Functor f => (f (Fix f) -> c) -> (a -> Fix f) -> a -> c
compose f g = f . unFix . g

See:

QuickCheck

Probably the most famous Haskell library, QuickCheck is a testing framework for generating large random tests for arbitrary functions automatically based on the types of their arguments.

quickCheck :: Testable prop => prop -> IO ()
(==>) :: Testable prop => Bool -> prop -> Property
forAll :: (Show a, Testable prop) => Gen a -> (a -> prop) -> Property
choose :: Random a => (a, a) -> Gen a

import Test.QuickCheck

qsort :: [Int] -> [Int]
qsort []     = []
qsort (x:xs) = qsort lhs ++ [x] ++ qsort rhs
    where lhs = filter  (< x) xs
          rhs = filter (>= x) xs

prop_maximum ::  [Int] -> Property
prop_maximum xs = not (null xs) ==>
                  last (qsort xs) == maximum xs

main :: IO ()
main = quickCheck prop_maximum

$ runhaskell qcheck.hs
*** Failed! Falsifiable (after 3 tests and 4 shrinks):    
[0]
[1]

$ runhaskell qcheck.hs
+++ OK, passed 1000 tests.

The test data generator can be extended with custom types and refined with predicates that restrict the domain of cases to test.

import Test.QuickCheck

data Color = Red | Green | Blue deriving Show

instance Arbitrary Color where
  arbitrary = do
    n <- choose (0,2) :: Gen Int
    return $ case n of
      0 -> Red
      1 -> Green
      2 -> Blue

example1 :: IO [Color]
example1 = sample' arbitrary
-- [Red,Green,Red,Blue,Red,Red,Red,Blue,Green,Red,Red]

See:

SmallCheck

Like QuickCheck, SmallCheck is a property testing system, but instead of producing random arbitrary test data it enumerates a deterministic series of test data up to a fixed depth.

smallCheck :: Testable IO a => Depth -> a -> IO ()
list :: Depth -> Series Identity a -> [a]
sample' :: Gen a -> IO [a]

λ: list 3 series :: [Int]
[0,1,-1,2,-2,3,-3]

λ: list 3 series :: [Double]
[0.0,1.0,-1.0,2.0,0.5,-2.0,4.0,0.25,-0.5,-4.0,-0.25]

λ: list 3 series :: [(Int, String)]
[(0,""),(1,""),(0,"a"),(-1,""),(0,"b"),(1,"a"),(2,""),(1,"b"),(-1,"a"),(-2,""),(-1,"b"),(2,"a"),(-2,"a"),(2,"b"),(-2,"b")]

It is useful for exhaustively testing all possible inputs of a program up to some depth.

import Test.SmallCheck

distrib :: Int -> Int -> Int -> Bool
distrib a b c = a * (b + c) == a * b + a * c

cauchy :: [Double] -> [Double] -> Bool
cauchy xs ys = (abs (dot xs ys))^2 <= (dot xs xs) * (dot ys ys)

failure :: [Double] -> [Double] -> Bool
failure xs ys = abs (dot xs ys) <= (dot xs xs) * (dot ys ys)

dot :: Num a => [a] -> [a] -> a
dot xs ys = sum (zipWith (*) xs ys)

main :: IO ()
main = do
  putStrLn "Testing distributivity..."
  smallCheck 25 distrib

  putStrLn "Testing Cauchy-Schwarz..."
  smallCheck 4 cauchy

  putStrLn "Testing invalid Cauchy-Schwarz..."
  smallCheck 4 failure

$ runhaskell smallcheck.hs
Testing distributivity...
Completed 132651 tests without failure.

Testing Cauchy-Schwarz...
Completed 27556 tests without failure.

Testing invalid Cauchy-Schwarz...
Failed test no. 349.
there exist [1.0] [0.5] such that
  condition is false

Series

Just like with QuickCheck, we can implement Serial instances for our custom datatypes. For example, there is no default instance for Vector, so let's implement one:

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}

import Test.SmallCheck
import Test.SmallCheck.Series
import Control.Applicative

import qualified Data.Vector as V

dot :: Num a => V.Vector a -> V.Vector a -> a
dot xs ys = V.sum (V.zipWith (*) xs ys)

cauchy :: V.Vector Double -> V.Vector Double -> Bool
cauchy xs ys = (abs (dot xs ys))^2 <= (dot xs xs) * (dot ys ys)

instance (Serial m a, Monad m) => Serial m (V.Vector a) where
  series = V.fromList <$> series

main :: IO ()
main = smallCheck 4 cauchy

SmallCheck can also use GHC Generics to derive Serial instances; for example, to enumerate all trees up to a certain depth we might use:

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics
import Test.SmallCheck.Series

data Tree a = Null | Fork (Tree a) a (Tree a)
    deriving (Show, Generic)

instance Serial m a => Serial m (Tree a)

example :: [Tree ()]
example = list 3 series

main = print example

Printer Combinators

Pretty printer combinators compose small document values (Doc) into larger ones and then render the composed document to a string.

Combinator   Description
<>           Concatenation
<+>          Spaced concatenation
char         Renders a character as a Doc
text         Renders a string as a Doc
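
A minimal sketch of composing and rendering a Doc with these combinators (greeting is a hypothetical name):

import Text.PrettyPrint

greeting :: Doc
greeting = parens (text "hello" <+> text "world")

-- render greeting == "(hello world)"

The larger example below uses the same combinators to pretty print a small lambda calculus:
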
{-# LANGUAGE FlexibleInstances #-}

import Text.PrettyPrint
import Text.Show.Pretty (ppShow)

parensIf ::  Bool -> Doc -> Doc
parensIf True = parens
parensIf False = id

type Name = String

data Expr
  = Var String
  | Lit Ground
  | App Expr Expr
  | Lam Name Expr
  deriving (Eq, Show)

data Ground
  = LInt Int
  | LBool Bool
  deriving (Show, Eq, Ord)


class Pretty p where
  ppr :: Int -> p -> Doc

instance Pretty String where
  ppr _ x = text x

instance Pretty Expr where
  ppr _ (Var x)         = text x
  ppr _ (Lit (LInt a))  = text (show a)
  ppr _ (Lit (LBool b)) = text (show b)

  ppr p e@(App _ _) =
    let (f, xs) = viewApp e in
    let args = sep $ map (ppr (p+1)) xs in
    parensIf (p>0) $ ppr p f <+> args

  ppr p e@(Lam _ _) =
    let body = ppr (p+1) (viewBody e) in
    let vars = map (ppr 0) (viewVars e) in
    parensIf (p>0) $ char '\\' <> hsep vars <+> text "." <+> body

viewVars :: Expr -> [Name]
viewVars (Lam n a) = n : viewVars a
viewVars _ = []

viewBody :: Expr -> Expr
viewBody (Lam _ a) = viewBody a
viewBody x = x

viewApp :: Expr -> (Expr, [Expr])
viewApp (App e1 e2) = go e1 [e2]
  where
    go (App a b) xs = go a (b : xs)
    go f xs = (f, xs)

ppexpr :: Expr -> String
ppexpr = render . ppr 0


s, k, example :: Expr
s = Lam "f" (Lam "g" (Lam "x" (App (Var "f") (App (Var "g") (Var "x")))))
k = Lam "x" (Lam "y" (Var "x"))
example = App s k

main :: IO ()
main = do
  putStrLn $ ppexpr s
  putStrLn $ ppShow example

The pretty printed form of the s combinator:

\f g x . (f (g x))

The Text.Show.Pretty library can be used to pretty print nested data structures in a more human readable form for any type that implements Show. For example, a dump of the AST of the SK combinator with ppShow:

App
  (Lam
     "f" (Lam "g" (Lam "x" (App (Var "f") (App (Var "g") (Var "x"))))))
  (Lam "x" (Lam "y" (Var "x")))

Vector

Vectors are high performance single-dimensional arrays that come in six variants, a mutable and an immutable variant for each of the following:

  • Data.Vector
  • Data.Vector.Storable
  • Data.Vector.Unboxed

The most notable feature of vectors is constant time memory access with (!) as well as a variety of efficient map, fold and scan operations on top of a fusion framework that generates surprisingly optimal code.

fromList :: [a] -> Vector a
toList :: Vector a -> [a]
(!) :: Vector a -> Int -> a
map :: (a -> b) -> Vector a -> Vector b
foldl :: (a -> b -> a) -> a -> Vector b -> a
scanl :: (a -> b -> a) -> a -> Vector b -> Vector a
zipWith :: (a -> b -> c) -> Vector a -> Vector b -> Vector c
iterateN :: Int -> (a -> a) -> a -> Vector a
import Data.Vector.Unboxed as V

norm ::  Vector Double -> Double
norm = sqrt . V.sum . V.map (\x -> x*x)

example1 :: Double
example1 = norm $ V.iterateN 100000000 (+1) 0.0

Mutable

freeze :: MVector (PrimState m) a -> m (Vector a)
thaw :: Vector a -> MVector (PrimState m) a

Within the IO monad we can perform arbitrary reads and writes on the mutable vector in constant time. When needed, an immutable Vector can be created to/from the MVector using the freeze/thaw functions.

import GHC.Prim
import Control.Monad
import Data.Vector.Unboxed (freeze)
import Data.Vector.Unboxed.Mutable

example :: IO (MVector RealWorld Int)
example = do
  v <- new 10
  forM_ [0..9] $ \i ->
     write v i (2*i)
  return v

main :: IO ()
main = example >>= freeze >>= print

Type Families

Multi-parameter type classes

Resolution of vanilla Haskell 98 typeclasses proceeds via very simple context reduction that minimizes interdependency between predicates, resolves superclasses, and reduces the types to head normal form. For example:

(Eq [a], Ord [a]) => [a]
==> Ord a => [a]

If a single parameter typeclass expresses a property of a type ( i.e. it's in the class or not in the class ) then a multiparameter typeclass expresses relationships between types. For example, if we wanted to express the relation that a type can be converted to another type we might use a class like:

{-# LANGUAGE MultiParamTypeClasses #-}

import Data.Char

class Convertible a b where
  convert :: a -> b

instance Convertible Int Integer where
  convert = toInteger

instance Convertible Int Char where
  convert = chr

instance Convertible Char Int where
  convert = ord

Of course now our instances for Convertible Int are not unique anymore, so there no longer exists a nice procedure for determining the inferred type of b from just a. To remedy this let's add a functional dependency a -> b, which tells GHC that the type a uniquely determines the type b. So we'll see that our two instances relating Int to both Integer and Char conflict.

{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}


import Data.Char

class Convertible a b | a -> b where
  convert :: a -> b

instance Convertible Int Char where
  convert = chr

instance Convertible Char Int where
  convert = ord

Had we kept the earlier Convertible Int Integer instance alongside Convertible Int Char, GHC would reject the pair:

Functional dependencies conflict between instance declarations:
  instance Convertible Int Integer
  instance Convertible Int Char

Now there's a simpler procedure for determining instances uniquely and multiparameter typeclasses become more usable and inferable again.

λ: convert (42 :: Int)
'*'
λ: convert '*'
42

Now let's make things not so simple. Turning on UndecidableInstances loosens the restriction that context reduction may only produce constraints that are structurally smaller than the instance head. As a result implicit computation can now occur within the typeclass instance search. Combined with a type-level representation of Peano numbers we find that we can encode basic arithmetic at the type-level.

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE UndecidableInstances #-}

data Z
data S n

type Zero  = Z
type One   = S Zero
type Two   = S One
type Three = S Two
type Four  = S Three

zero :: Zero
zero = undefined

one :: One
one = undefined

two :: Two
two = undefined

three :: Three
three = undefined

four :: Four
four = undefined

class Eval a where
  eval :: a -> Int

instance Eval Zero where
  eval _ = 0

instance Eval n => Eval (S n) where
  eval m = 1 + eval (prev m)

class Pred a b | a -> b where
  prev :: a -> b

instance Pred Zero Zero where
  prev = undefined

instance Pred (S n) n where
  prev = undefined

class Add a b c | a b -> c where
  add :: a -> b -> c

instance Add Zero a a where
  add = undefined

instance Add a b c => Add (S a) b (S c) where
  add = undefined

f :: Three
f = add one two

g :: S (S (S (S Z)))
g = add two two

h :: Int
h = eval (add three four)

If the typeclass contexts look similar to Prolog you're not wrong, if you read the contexts qualifier (=>) backwards as backwards turnstiles :- then it's precisely the same equations.

add(0, A, A).
add(s(A), B, s(C)) :- add(A, B, C).

pred(0, 0).
pred(s(A), A).

This is kind of abusing typeclasses and if used carelessly it can fail to terminate or overflow at compile-time. UndecidableInstances shouldn't be turned on without careful forethought about what it implies.
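
For instance, here is a contrived sketch ( not from the original ) of an instance whose context refers back to itself: asking GHC to discharge the constraint at any concrete type recurses forever and produces an error like the one below.

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}

-- The only instance of Loop requires Loop again, so context
-- reduction never terminates once the constraint is demanded.
class Loop a where
  loop :: a -> a

instance Loop a => Loop a where
  loop = id

diverge :: Int
diverge = loop (1 :: Int)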

<interactive>:1:1:
    Context reduction stack overflow; size = 201

Type Families

Type families allow us to write functions in the type domain which take types as arguments and can yield either types or values indexed on their type arguments. Type families come in two varieties: data families and type synonym families.

  • "type family" are named function on types
  • "data family" are type-indexed data types

First let's look at type synonym families. There are two equivalent syntactic ways of constructing them: either as associated type families declared within a typeclass or as standalone declarations at the toplevel. The following are semantically equivalent:

type family Foo a
type instance Foo Int = ...

class C a where
   type Foo a

instance C Int where
   type Foo Int = ...

Using the same example we used for the multiparameter + functional dependencies illustration, we see that there is a direct translation between the type family approach and functional dependencies. These two approaches have the same expressive power.

{-# LANGUAGE TypeFamilies #-}

import Data.Char

type family Rep a
type instance Rep Int = Char
type instance Rep Char = Int

class Convertible a where
  convert :: a -> Rep a

instance Convertible Int where
  convert = chr

instance Convertible Char where
  convert = ord

An associated type family can be queried using the :kind! command in GHCi.

λ: :kind! Rep Int
Rep Int :: *
= Char
λ: :kind! Rep Char
Rep Char :: *
= Int

Data families on the other hand allow us to create new type-parameterized data constructors. Normally we can only define typeclass functions whose behavior is uniform and determined purely by the typeclass arguments. With data families we can allow specialized behavior indexed on the type. For example if we wanted to create more complicated vector structures ( bitmasked vectors, vectors of tuples, ... ) that exposed a uniform API but internally handled the differences in their data layout we can use data families to accomplish this:

{-# LANGUAGE TypeFamilies #-}

import qualified Data.Vector.Unboxed as V

data family Array a
data instance Array Int       = IArray (V.Vector Int)
data instance Array Bool      = BArray (V.Vector Bool)
data instance Array (a,b)     = PArray (Array a) (Array b)
data instance Array (Maybe a) = MArray (V.Vector Bool) (Array a)

class IArray a where
  index :: Array a -> Int -> a

instance IArray Int where
  index (IArray xs) i = xs V.! i

instance IArray Bool where
  index (BArray xs) i = xs V.! i

-- Vector of pairs
instance (IArray a, IArray b) => IArray (a, b) where
  index (PArray xs ys) i = (index xs i, index ys i)

-- Vector of missing values
instance (IArray a) => IArray (Maybe a) where
  index (MArray bm xs) i =
    case bm V.! i of
      True  -> Nothing
      False -> Just $ index xs i
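
As a small usage sketch ( the concrete values here are illustrative, not from the original ), indexing into a vector of pairs built from the primitive instances above:

pairArr :: Array (Int, Bool)
pairArr = PArray (IArray (V.fromList [1, 2, 3]))
                 (BArray (V.fromList [True, False, True]))

example :: (Int, Bool)
example = index pairArr 1
-- (2,False)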

Proofs

One of the deepest results in computer science, the Curry–Howard correspondence, is the relation that logical propositions can be modeled by types and instantiating those types constitutes proofs of these propositions. In dependently typed languages we can exploit this result to its full extent; in Haskell we don't have the strength that dependent types provide but can still prove ( for a sufficiently lax definition of the word "prove" ) trivial results. For example, we can model a type level function for addition and provide a small proof that zero is an additive identity.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE TypeOperators #-}

data Z
data S n

data Nat n where
  Zero :: Nat Z
  Succ :: Nat n -> Nat (S n)

data Eql a b where
  Refl :: Eql a a

type family Add m n
type instance Add Z n = n
type instance Add (S m) n = S (Add m n)

type family Pred n
type instance Pred Z = Z
type instance Pred (S n) = n

add :: Nat n -> Nat m -> Nat (Add n m)
add Zero     m = m
add (Succ n) m = Succ (add n m)

cong :: Eql a b -> Eql (f a) (f b)
cong Refl = Refl

plus_zero :: forall n. Nat n -> Eql (Add n Z) n
plus_zero Zero = Refl
plus_zero (Succ n) = cong (plus_zero n)

Using the TypeOperators extension we can also use infix notation at the type-level.

data a :=: b where
  Refl :: a :=: a

cong :: a :=: b -> (f a) :=: (f b)
cong Refl = Refl

type family (n :: Nat) :+ (m :: Nat) :: Nat
type instance Zero     :+ m = m
type instance (Succ n) :+ m = Succ (n :+ m)

assoc :: forall m n. Nat m -> Nat n -> Eql (m :+ (S Z) :+ n) (S Z :+ m :+ n)
assoc Zero n = Refl
assoc (Succ m) n = cong (assoc m n)

HLists

A heterogeneous list is a cons list whose type statically encodes the ordered types of its values.

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE KindSignatures #-}

infixr 5 :::

data HList (ts :: [ * ]) where
  Nil :: HList '[]
  (:::) :: t -> HList ts -> HList (t ': ts)

-- Take the head of a non-empty list with the first value as Bool type.
headBool :: HList (Bool ': xs) -> Bool
headBool hlist = case hlist of
  (a ::: _) -> a

hlength :: HList x -> Int
hlength Nil = 0
hlength (_ ::: b) = 1 + (hlength b)


example1 :: (Bool, (String, (Double, ())))
example1 = (True, ("foo", (3.14, ())))

example2 :: HList '[Bool, String , Double , ()]
example2 = True ::: "foo" ::: 3.14 ::: () ::: Nil

Constraint Kinds

The kind of a type family permits arbitrary kinds, but of particular interest is the Constraint kind, which is enabled with the -XConstraintKinds extension. Using this we can use typeclass constraints as first class values which can naturally be indexed with a type family.

For a contrived example if we wanted to create a generic Sized class that carried with it constraints on the elements of the container in question we could achieve this quite simply using type families.

{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE ConstraintKinds #-}

import GHC.Exts (Constraint)
import Data.Hashable
import Data.HashSet

type family Con a :: Constraint
type instance Con [a] = (Ord a, Eq a)
type instance Con (HashSet a) = (Hashable a)

class Sized a where
  gsize :: Con a => a -> Int

instance Sized [a] where
  gsize = length

instance Sized (HashSet a) where
  gsize = size

One use-case of this is to capture the typeclass dictionary required by a function and reify it as a value.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE KindSignatures #-}

import GHC.Exts (Constraint)

data Dict :: Constraint -> * where
  Dict :: (c) => Dict c

dShow :: Dict (Show a) -> a -> String
dShow Dict x = show x

dEqNum :: Dict (Eq a, Num a) -> a -> Bool
dEqNum Dict x = x == 0


fShow :: String
fShow = dShow Dict 10

fEqual :: Bool
fEqual = dEqNum Dict 0

The empty constraint set is denoted () :: Constraint.

DataKinds / PolyKinds

Kind Polymorphism

The regular value level function which takes a function and applies it to an argument is generalized in the usual Hindley-Milner way.

app :: forall a b. (a -> b) -> a -> b
app f a = f a

But when we do the same thing at the type-level we see we lose information about the polymorphism of the constructor applied.

-- TApp :: (* -> *) -> * -> *
data TApp f a = MkTApp (f a)

Turning on PolyKinds allows parametric polymorphism at the kind level as well.

-- Default:   (* -> *) -> * -> *
-- PolyKinds: (k -> *) -> k -> *
data TApp f a = MkTApp (f a)

-- Default:   ((* -> *) -> (* -> *)) -> (* -> *)
-- PolyKinds: ((k -> *) -> (k -> *)) -> (k -> *)
data Mu f a = Roll (f (Mu f) a)

-- Default:   * -> *
-- PolyKinds: k -> *
data Proxy a = Proxy

Using PolyKinds with the Proxy type allows us to write down type class functions which range over constructors of arbitrary kind arity.

{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

data Proxy a = Proxy
data Rep = Rep

class PolyClass a where
  foo :: Proxy a -> Rep
  foo = const Rep

-- () :: *
-- [] :: * -> *
-- Either :: * -> * -> *

instance PolyClass ()
instance PolyClass []
instance PolyClass Either

Data Kinds

The -XDataKinds extension allows us to refer to constructors at the value level and the type level. Consider a simple sum type:

data S a b = L a | R b

-- S :: * -> * -> *
-- L :: a -> S a b
-- R :: b -> S a b

With the extension enabled we see that our type constructors are now automatically promoted, so that L or R can be viewed both as a data constructor of the type S and as a type of kind S.

{-# LANGUAGE DataKinds #-}

data S a b = L a | R b

-- S :: * -> * -> *
-- L :: * -> S * *
-- R :: * -> S * *

Promoted data constructors can be referred to in type signatures by prefixing them with a single quote. Also of importance is that these promoted constructors are not exported with a module by default, but type synonyms can be created using this notation and exported instead.

data Foo = Bar | Baz
type Bar = 'Bar
type Baz = 'Baz

Of interest is that we have access to GHC's type level natural literals:

λ: :kind 3
3 :: Nat
λ: :kind (3 + 4)
(3 + 4) :: Nat
λ: :kind (3 <= 4)
(3 <= 4) :: Constraint

Using this new structure we can create a Vec type which is parameterized by its length as well as its element type, now that we have a kind language rich enough to encode the successor type in the kind signature of the generalized algebraic datatype.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}

data Nat = Z | S Nat deriving (Eq, Show)

type Zero  = Z
type One   = S Zero
type Two   = S One
type Three = S Two
type Four  = S Three
type Five  = S Four

data Vec :: Nat -> * -> * where
  Nil :: Vec Z a
  Cons :: a -> Vec n a -> Vec (S n) a

instance Show (Vec Z a) where
  show Nil = "'[]"

instance (Show a, Show (Vec n a)) => Show (Vec (S n) a) where
  show (Cons x xs) = show x ++ "'::" ++ show xs

class FromList n where
  fromList :: [a] -> Vec n a

instance FromList Z where
  fromList [] = Nil

instance FromList n => FromList (S n) where
  fromList (x:xs) = Cons x $ fromList xs


lengthVec :: Vec n a -> Nat
lengthVec Nil = Z
lengthVec (Cons x xs) = S (lengthVec xs)

zipVec :: Vec n a -> Vec n b -> Vec n (a,b)
zipVec Nil Nil = Nil
zipVec (Cons x xs) (Cons y ys) = Cons (x,y) (zipVec xs ys)


vec4 :: Vec Four Int
vec4 = fromList [0, 1, 2, 3]

vec5 :: Vec Five Int
vec5 = fromList [0, 1, 2, 3, 4]


example1 :: Nat
example1 = lengthVec vec4
-- S (S (S (S Z)))

example2 :: Vec Four (Int, Int)
example2 = zipVec vec4 vec4
-- (0,0)':(1,1)':(2,2)':(3,3)':'[]

So now if we try to zip two Vec types with the wrong shape then we get an error at compile-time about the off-by-one error. We've just made an entire class of invalid programs unrepresentable.

example2 = zipVec vec4 vec5
-- Couldn't match type 'S 'Z with 'Z
-- Expected type: Vec Four Int
--   Actual type: Vec Five Int

We can use the same technique to create a container which is statically indexed by an empty or non-empty flag, such that if we try to take the head of an empty list we'll get a compile-time error; or stated equivalently, we have an obligation to prove to the compiler that the argument we hand to the head function is non-empty.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}

data Size = None | Many

data List a b where
  Nil  :: List None a
  Cons :: a -> List b a -> List Many a

head' :: List Many a -> a
head' (Cons x _) = x

example1 :: Int
example1 = head' (1 `Cons` (2 `Cons` Nil))

Trying to take the head of an empty list instead fails to typecheck:

example2 :: Int
example2 = head' Nil

Couldn't match type None with Many
Expected type: List Many Int
  Actual type: List None Int

GHC's type literals can also be used in place of explicit Peano arithmetic, although GHC is very conservative about performing reduction.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}

import GHC.TypeLits

data Vec :: Nat -> * -> * where
  Nil :: Vec 0 a
  Cons :: a -> Vec n a -> Vec (1 + n) a

vec3 :: Vec (1 + (1 + (1 + 0))) Int
vec3 = 0 `Cons` (1 `Cons` (2 `Cons` Nil))
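
As an illustration of how conservative the reduction is ( a sketch, not from the original ), GHC cannot see on its own that (1 + n) + 1 and 1 + (n + 1) denote the same Nat, so the natural definition of a snoc function is rejected without extra equality evidence or a type checker plugin:

-- Rejected: GHC cannot deduce ((1 + n) + 1) ~ (1 + (n + 1)) by itself.
snoc :: a -> Vec n a -> Vec (n + 1) a
snoc x Nil         = Cons x Nil
snoc x (Cons y ys) = Cons y (snoc x ys)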

So we've just expressed a relationship between the type of a data structure and its values, and written functions which can be constrained at compile-time by these properties. Let that sink in for a moment, go take some mind-altering substances, and then go check out Agda.

Generics

Haskell has several techniques for automatic generation of type classes for a variety of tasks that consist largely of boilerplate code generation such as:

  • Pretty Printing
  • Equality
  • Serialization
  • Ordering
  • Traversal

Typeable

The Typeable class can be used to create runtime type information for arbitrary types.

typeOf :: Typeable a => a -> TypeRep
{-# LANGUAGE DeriveDataTypeable #-}

import Data.Typeable

data Animal = Cat | Dog deriving Typeable
data Zoo a = Zoo [a] deriving Typeable

equal :: (Typeable a, Typeable b) => a -> b -> Bool
equal a b = typeOf a == typeOf b

example1 :: TypeRep
example1 = typeOf Cat
-- Animal

example2 :: TypeRep
example2 = typeOf (Zoo [Cat, Dog])
-- Zoo Animal

example3 :: TypeRep
example3 = typeOf ((1, 6.636e-34, "foo") :: (Int, Double, String))
-- (Int,Double,[Char])

example4 :: Bool
example4 = equal False ()
-- False

Of historical note is that writing our own Typeable instances is currently possible as of GHC 7.6 but allows us to introduce dangerous behavior that can cause crashes, and shouldn't be done except by GHC itself.

Dynamic

Since we have a way of querying runtime type information we can use this machinery to implement a Dynamic type. This allows us to box up any monotype into a uniform type that can be passed to any function taking a Dynamic type which can then unpack the underlying value in a type-safe way.

toDyn :: Typeable a => a -> Dynamic
fromDyn :: Typeable a => Dynamic -> a -> a
fromDynamic :: Typeable a => Dynamic -> Maybe a
cast :: (Typeable a, Typeable b) => a -> Maybe b
import Data.Dynamic
import Data.Maybe

dynamicBox :: Dynamic
dynamicBox = toDyn (6.62 :: Double)

example1 :: Maybe Int
example1 = fromDynamic dynamicBox
-- Nothing

example2 :: Maybe Double
example2 = fromDynamic dynamicBox
-- Just 6.62

example3 :: Int
example3 = fromDyn dynamicBox 0
-- 0

example4 :: Double
example4 = fromDyn dynamicBox 0.0
-- 6.62

Data

Just as Typeable lets us create runtime type information where needed, the Data class allows us to reflect information about the structure of datatypes to runtime as needed.

class Typeable a => Data a where
  gfoldl  :: (forall d b. Data d => c (d -> b) -> d -> c b)
          -> (forall g. g -> c g)
          -> a
          -> c a

  gunfold :: (forall b r. Data b => c (b -> r) -> c r)
          -> (forall r. r -> c r)
          -> Constr
          -> c a

  toConstr :: a -> Constr
  dataTypeOf :: a -> DataType
  gmapQl :: (r -> r' -> r) -> r -> (forall d. Data d => d -> r') -> a -> r

The types for gfoldl and gunfold are a little intimidating ( and depend on Rank2Types ); the best way to understand them is to look at some examples. First the most trivial case, a simple sum type Animal, would produce the following code:

data Animal = Cat | Dog deriving Typeable
instance Data Animal where
  gfoldl k z Cat = z Cat
  gfoldl k z Dog = z Dog

  gunfold k z c
    = case constrIndex c of
        1 -> z Cat
        2 -> z Dog 

  toConstr Cat = cCat
  toConstr Dog = cDog

  dataTypeOf _ = tAnimal

tAnimal :: DataType
tAnimal = mkDataType "Main.Animal" [cCat, cDog]

cCat :: Constr
cCat = mkConstr tAnimal "Cat" [] Prefix

cDog :: Constr
cDog = mkConstr tAnimal "Dog" [] Prefix

For a type with non-empty containers we get something a little more interesting. Consider the list type:

instance Data a => Data [a] where
  gfoldl _ z []     = z []
  gfoldl k z (x:xs) = z (:) `k` x `k` xs

  toConstr []    = nilConstr
  toConstr (_:_) = consConstr

  gunfold k z c 
    = case constrIndex c of
        1 -> z []
        2 -> k (k (z (:)))

  dataTypeOf _ = listDataType

nilConstr :: Constr
nilConstr = mkConstr listDataType "[]" [] Prefix

consConstr :: Constr
consConstr = mkConstr listDataType "(:)" [] Infix

listDataType :: DataType
listDataType = mkDataType "Prelude.[]" [nilConstr,consConstr]

Looking at gfoldl we see that Data gives us a way to walk an applicative over the elements of the constructor by applying a function k over each element and applying z at the spine. For example look at the instance for a 2-tuple as well:

instance (Data a, Data b) => Data (a,b) where
  gfoldl k z (a,b) = z (,) `k` a `k` b

  toConstr (_,_) = tuple2Constr

  gunfold k z c  
    = case constrIndex c of
      1 -> k (k (z (,)))

  dataTypeOf _  = tuple2DataType

tuple2Constr :: Constr
tuple2Constr = mkConstr tuple2DataType "(,)" [] Infix

tuple2DataType :: DataType
tuple2DataType = mkDataType "Prelude.(,)" [tuple2Constr]

This is pretty cool: within the same typeclass we now have a generic way to introspect any Data instance and write logic that depends on the structure and types of its subterms. We can now write a function which allows us to traverse an arbitrary Data instance and twiddle values based on pattern matching on the runtime types. So let's write down an over function which transforms the Val values inside both n-tuples and lists.

{-# LANGUAGE DeriveDataTypeable #-}

import Data.Data
import Control.Monad.Identity
import Control.Applicative

data Animal = Cat | Dog deriving (Data, Typeable)

newtype Val = Val Int deriving (Show, Data, Typeable)

incr :: Typeable a => a -> a
incr = maybe id id (cast f)
  where f (Val x) = Val (x * 100)

over :: Data a => a -> a
over x = runIdentity $ gfoldl cont base (incr x)
  where
    cont k d = k <*> (pure $ over d)
    base = pure


example1 :: Constr
example1 = toConstr Dog
-- Dog

example2 :: DataType
example2 = dataTypeOf Cat
-- DataType {tycon = "Main.Animal", datarep = AlgRep [Cat,Dog]}

example3 :: [Val]
example3 = over [Val 1, Val 2, Val 3]
-- [Val 100,Val 200,Val 300]

example4 :: (Val, Val, Val)
example4 = over (Val 1, Val 2, Val 3)
-- (Val 100,Val 200,Val 300)

We can also write generic operations, for instance to count the number of parameters in a data type.

numHoles :: Data a => a -> Int
numHoles = gmapQl (+) 0 (const 1)

example1 :: Int
example1 = numHoles (1,2,3,4,5,6,7)
-- 7

example2 :: Int
example2 = numHoles (Just 3)
-- 1

This method adapts itself well to generic traversals but the types quickly become rather hairy when dealing with anything more complicated involving folds and unsafe coercions.

Generic

The most modern method of doing generic programming uses type families to achieve a better method of deriving the structural properties of arbitrary type classes. Generic implements a typeclass with an associated type Rep ( Representation ) together with a pair of functions that form a 2-sided inverse ( isomorphism ) for converting to and from the associated type and the derived type in question.

class Generic a where
  type Rep a
  from :: a -> Rep a
  to :: Rep a -> a

class Datatype d where
  datatypeName :: t d f a -> String
  moduleName :: t d f a -> String

class Constructor c where
  conName :: t c f a -> String

GHC.Generics defines a set of named types for modeling the various structural properties of types in available in Haskell.

-- | Sums: encode choice between constructors
infixr 5 :+:
data (:+:) f g p = L1 (f p) | R1 (g p)

-- | Products: encode multiple arguments to constructors
infixr 6 :*:
data (:*:) f g p = f p :*: g p

-- | Tag for M1: datatype
data D
-- | Tag for M1: constructor
data C

-- | Constants, additional parameters and recursion of kind *
newtype K1 i c p = K1 { unK1 :: c }

-- | Meta-information (constructor names, etc.)
newtype M1 i c f p = M1 { unM1 :: f p }

-- | Type synonym for encoding meta-information for datatypes
type D1 = M1 D

-- | Type synonym for encoding meta-information for constructors
type C1 = M1 C

Using the deriving mechanics GHC can generate this Generic instance for us mechanically, if we were to write it by hand for a simple type it might look like this:

{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE TypeFamilies #-}

import GHC.Generics

data Animal
  = Dog
  | Cat

instance Generic Animal where
  type Rep Animal = D1 T_Animal ((C1 C_Dog U1) :+: (C1 C_Cat U1))

  from Dog = M1 (L1 (M1 U1))
  from Cat = M1 (R1 (M1 U1))

  to (M1 (L1 (M1 U1))) = Dog
  to (M1 (R1 (M1 U1))) = Cat

data T_Animal
data C_Dog
data C_Cat

instance Datatype T_Animal where
  datatypeName _ = "Animal"
  moduleName _ = "Main"

instance Constructor C_Dog where
  conName _ = "Dog"

instance Constructor C_Cat where
  conName _ = "Cat"

Using :kind! in GHCi we can look at the type family Rep associated with a Generic instance.

λ: :kind! Rep Animal
Rep Animal :: * -> *
= M1 D T_Animal (M1 C C_Dog U1 :+: M1 C C_Cat U1)

λ: :kind! Rep ()
Rep () :: * -> *
= M1 D GHC.Generics.D1() (M1 C GHC.Generics.C1_0() U1)

λ: :kind! Rep [()]
Rep [()] :: * -> *
= M1
    D
    GHC.Generics.D1[]
    (M1 C GHC.Generics.C1_0[] U1
     :+: M1
           C
           GHC.Generics.C1_1[]
           (M1 S NoSelector (K1 R ()) :*: M1 S NoSelector (K1 R [()])))

Now the clever bit: instead of writing our generic function over the datatype we write it over the Rep and then reify the result using from. So for an equivalent version of Haskell's default Eq that instead uses generic deriving we could write:

class GEq' f where
  geq' :: f a -> f a -> Bool

instance GEq' U1 where
  geq' _ _ = True

instance (GEq c) => GEq' (K1 i c) where
  geq' (K1 a) (K1 b) = geq a b

instance (GEq' a) => GEq' (M1 i c a) where
  geq' (M1 a) (M1 b) = geq' a b

-- Equality for sums.
instance (GEq' a, GEq' b) => GEq' (a :+: b) where
  geq' (L1 a) (L1 b) = geq' a b
  geq' (R1 a) (R1 b) = geq' a b
  geq' _      _      = False

-- Equality for products.
instance (GEq' a, GEq' b) => GEq' (a :*: b) where
  geq' (a1 :*: b1) (a2 :*: b2) = geq' a1 a2 && geq' b1 b2

Now to accommodate the two methods of writing classes (generic-deriving or custom implementations) we can use the DefaultSignatures extension to allow the user to leave typeclass functions blank and defer to the Generic representation, or to define their own.

{-# LANGUAGE DefaultSignatures #-}

class GEq a where 
  geq :: a -> a -> Bool

  default geq :: (Generic a, GEq' (Rep a)) => a -> a -> Bool
  geq x y = geq' (from x) (from y)

Now anyone using our library need only derive Generic and create an empty instance of our typeclass without writing any boilerplate for GEq.
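
For example, a minimal sketch assuming the GEq class and GEq' instances defined above:

{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics

data Animal = Dog | Cat deriving (Generic)

-- An empty instance: geq falls through to the generic default.
instance GEq Animal

example :: Bool
example = geq Dog Cat
-- False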

Generic Deriving

Using GHC.Generics, we can get GHC to do lots of non-trivial code generation for us, and it works spectacularly well.

The hashable library allows us to derive hashing functions.

{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics (Generic)
import Data.Hashable

data Color = Red | Green | Blue deriving (Generic, Show)

instance Hashable Color where

example1 :: Int
example1 = hash Red
-- 839657738087498284

example2 :: Int
example2 = hashWithSalt 0xDEADBEEF Red
-- 62679985974121021

The cereal library allows us to automatically derive a binary representation.

{-# LANGUAGE DeriveGeneric #-}

import Data.Word
import Data.ByteString
import Data.Serialize

import GHC.Generics

data Foo = A [Foo] | B [(Foo, Foo)] | C
  deriving (Generic, Show)

instance Serialize Foo where

encoded :: ByteString
encoded = encode (A [B [(C, C)]])
-- "\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\STX\STX"

bytes :: [Word8]
bytes = unpack encoded
-- [0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,2,2]

decoded :: Either String Foo
decoded = decode encoded

The aeson library allows us to automatically derive FromJSON and ToJSON instances for our datatypes.

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
import GHC.Generics

data Point = Point { x :: Double, y :: Double }
   deriving (Show, Generic)

instance FromJSON Point
instance ToJSON Point

example1 :: Maybe Point
example1 = decode "{\"x\":3.0,\"y\":-1.0}"

example2 = encode $ Point 123.4 20

The derive library allows us to derive a variety of instances, include Arbitrary for QuickCheck.

{-# LANGUAGE TemplateHaskell #-}

import Data.DeriveTH
import Test.QuickCheck

data Color = Red | Green | Blue deriving Show

$(derive makeArbitrary ''Color)

example1 :: IO [Color]
example1 = sample' arbitrary
-- [Red,Green,Blue,Red,Blue,Green,Blue,Red,Blue,Blue,Red]

Sequence

The sequence data structure behaves structurally like a list but is optimized for efficient append and prepend operations.

import Data.Sequence

a :: Seq Int
a = fromList [1,2,3]

a0 :: Seq Int
a0 = a |> 4
-- [1,2,3,4]

a1 :: Seq Int
a1 = 0 <| a
-- [0,1,2,3]
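
Two sequences can also be appended with (><); a small sketch:

a2 :: Seq Int
a2 = a >< fromList [4,5,6]
-- [1,2,3,4,5,6]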

Unordered-Containers

fromList :: (Eq k, Hashable k) => [(k, v)] -> HashMap k v
lookup :: (Eq k, Hashable k) => k -> HashMap k v -> Maybe v
insert :: (Eq k, Hashable k) => k -> v -> HashMap k v -> HashMap k v

Both the HashMap and HashSet are purely functional data structures that are drop in replacements for the containers equivalents but with more efficient space and time performance. Additionally all stored elements must have a Hashable instance.

import qualified Data.HashSet as S
import qualified Data.HashMap.Lazy as M

example1 :: M.HashMap Int Char
example1 = M.fromList $ zip [1..10] ['a'..]

example2 :: S.HashSet Int
example2 = S.fromList [1..10]
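
Lookups and membership tests follow the familiar containers API; a small sketch using the values above:

example3 :: Maybe Char
example3 = M.lookup 1 example1
-- Just 'a'

example4 :: Bool
example4 = S.member 5 example2
-- True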

Hashtables

Hashtables provides hashtables with efficient lookup within the ST or IO monad.

import Prelude hiding (lookup)

import Control.Monad.ST
import Data.HashTable.ST.Basic

-- Hashtable parameterized by ST "thread"
type HT s = HashTable s String String

example1 :: ST s (HT s)
example1 = do
  ht <- new
  insert ht "key" "value1"
  return ht

example2 :: HT s -> ST s (Maybe String)
example2 ht = do
  val <- lookup ht "key"
  return val

example3 :: Maybe String
example3 = runST (example1 >>= example2)
new :: ST s (HashTable s k v)
insert :: (Eq k, Hashable k) => HashTable s k v -> k -> v -> ST s ()
lookup :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe v)

FFI

Pure Functions

Wrapping pure C functions with primitive types is trivial.

/* $(CC) -c simple.c -o simple.o */

int example(int a, int b)
{
  return a + b;
}
-- ghc simple.o simple_ffi.hs -o simple_ffi
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types

foreign import ccall safe "example" example
    :: CInt -> CInt -> CInt

main = print (example 42 27)

Storable Arrays

There exists a Storable typeclass that can be used to provide low-level access to the memory underlying Haskell values. The Prelude defines Storable interfaces for most of the basic types as well as types in the Foreign.C library.

class Storable a where
  sizeOf :: a -> Int
  alignment :: a -> Int
  peek :: Ptr a -> IO a
  poke :: Ptr a -> a -> IO ()
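
As a small illustration ( not from the original ), we can allocate a temporary buffer, poke a value into it and peek it back out using the Storable machinery:

import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (peek, poke)
import Foreign.C.Types (CInt)

roundTrip :: IO CInt
roundTrip = alloca $ \ptr -> do
  poke ptr (42 :: CInt)   -- write the value into the buffer
  peek ptr                -- read it back out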

To pass arrays from Haskell to C we can again use Storable Vector and several unsafe operations to grab a foreign pointer to the underlying data that can be handed off to C. Once we're in C land, nothing will protect us from doing evil things to memory!

/* $(CC) -c qsort.c -o qsort.o */
void swap(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

void sort(int *xs, int beg, int end)
{
    if (end > beg + 1) {
        int piv = xs[beg], l = beg + 1, r = end;

        while (l < r) {
            if (xs[l] <= piv) {
                l++;
            } else {
                swap(&xs[l], &xs[--r]);
            }
        }

        swap(&xs[--l], &xs[beg]);
        sort(xs, beg, l);
        sort(xs, r, end);
    }
}
-- ghc qsort.o ffi.hs -o ffi
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.Ptr
import Foreign.C.Types
import Foreign.ForeignPtr
import Foreign.ForeignPtr.Unsafe

import qualified Data.Vector.Storable as V
import qualified Data.Vector.Storable.Mutable as VM

foreign import ccall safe "sort" qsort
    :: Ptr a -> CInt -> CInt -> IO ()

vecPtr :: VM.MVector s CInt -> ForeignPtr CInt
vecPtr = fst . VM.unsafeToForeignPtr0

main :: IO ()
main = do
  let vs = V.fromList ([1,3,5,2,1,2,5,9,6] :: [CInt])
  v <- V.thaw vs
  withForeignPtr (vecPtr v) $ \ptr -> do
    qsort ptr 0 9
  out <- V.freeze v
  print out

The names of foreign functions from a C specific header file can be qualified.

foreign import ccall unsafe "stdlib.h malloc"
    malloc :: CSize -> IO (Ptr a)

Prefixing the function name with an ampersand (&) allows us to import a pointer to the function itself.

foreign import ccall unsafe "stdlib.h &malloc"
    malloc :: FunPtr a

Missing Functions

split

The split package provides a variety of missing functions for splitting list and string types.

import Data.List.Split

example1 = splitOn "." "foo.bar.baz"
-- ["foo","bar","baz"]

example2 = chunksOf 10 "To be or not to be that is the question."
-- ["To be or n","ot to be t","hat is the"," question."]

monad-loops

The monad-loops package provides a variety of missing functions for control logic in monadic contexts.

whileM :: Monad m => m Bool -> m a -> m [a]
untilM :: Monad m => m a -> m Bool -> m [a]
iterateUntilM :: Monad m => (a -> Bool) -> (a -> m a) -> a -> m a
whileJust :: Monad m => m (Maybe a) -> (a -> m b) -> m [b]
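
For example, a small sketch using whileM ( the IORef counter here is purely illustrative ):

import Control.Monad.Loops
import Data.IORef

-- Count a reference down to zero, collecting each value seen.
countdown :: IO [Int]
countdown = do
  ref <- newIORef (5 :: Int)
  whileM (fmap (> 0) (readIORef ref)) $ do
    n <- readIORef ref
    writeIORef ref (n - 1)
    return n
-- [5,4,3,2,1]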

Diagrams

A declarative domain specific language for generating vector images to SVG and a variety of other formats.

import Diagrams.Prelude
import Diagrams.Backend.SVG.CmdLine

sierpinski :: Int -> Diagram SVG R2
sierpinski 1 = eqTriangle 1
sierpinski n =
      s
     ===
  (s ||| s) # centerX
  where
    s = sierpinski (n - 1)

example :: Diagram SVG R2
example = sierpinski 5 # fc black

main :: IO ()
main = defaultMain example
$ runhaskell diagram1.hs -w 256 -h 256 -o diagram1.svg

Parsec

For parsing in Haskell it is quite common to use a family of libraries known as parser combinators which let us write code to generate parsers that themselves look very similar to the grammar being parsed!

Combinators
<|> The choice operator tries to parse the first argument before proceeding to the second. Can be chained sequentially to generate a sequence of options.
many Consumes an arbitrary number of patterns matching the given pattern and returns them as a list.
many1 Like many but requires at least one match.
optional Optionally parses a given pattern returning its value as a Maybe.
try Backtracking operator will let us parse ambiguous matching expressions and restart with a different pattern.

There are two styles of writing Parsec, one can choose to write with monads or with applicatives.

parseM :: Parser Expr
parseM = do
  a <- identifier
  char '+'
  b <- identifier
  return $ Add a b

The same code written with applicatives uses the (<*) to skip over unneeded tokens.

parseA :: Parser Expr
parseA = Add <$> identifier <* char '+' <*> identifier

For a full example, consider the lambda calculus:

module Parser (parseExpr) where

import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (haskellStyle)

import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok

type Id = String

data Expr
  = Lam Id Expr
  | App Expr Expr
  | Var Id
  | Num Int
  | Op  Binop Expr Expr
  deriving (Show)

data Binop = Add | Sub | Mul deriving Show

--

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where ops = ["->","\\","+","*","-","="]
        style = haskellStyle {Tok.reservedOpNames = ops }

reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer

identifier :: Parser String
identifier = Tok.identifier lexer

parens :: Parser a -> Parser a
parens = Tok.parens lexer

contents :: Parser a -> Parser a
contents p = do
  Tok.whiteSpace lexer
  r <- p
  eof
  return r

--

natural :: Parser Integer
natural = Tok.natural lexer

variable :: Parser Expr
variable = do
  x <- identifier
  return (Var x)

number :: Parser Expr
number = do
  n <- natural
  return (Num (fromIntegral n))

lambda :: Parser Expr
lambda = do
  reservedOp "\\"
  x <- identifier
  reservedOp "->"
  e <- expr
  return (Lam x e)

aexp :: Parser Expr
aexp =  parens expr
    <|> variable
    <|> number
    <|> lambda

term :: Parser Expr
term = Ex.buildExpressionParser table aexp
  where infixOp x f = Ex.Infix (reservedOp x >> return f)
        table = [[infixOp "*" (Op Mul) Ex.AssocLeft],
                 [infixOp "+" (Op Add) Ex.AssocLeft]]

expr :: Parser Expr
expr = do
  es <- many1 term
  return (foldl1 App es)

parseExpr :: String -> Expr
parseExpr input =
  case parse (contents expr) "<stdin>" input of
    Left err -> error (show err)
    Right ast -> ast

main :: IO ()
main = getLine >>= print . parseExpr >> main

Trying it out:

λ: main
1+2
Op Add (Num 1) (Num 2)

\i -> \x -> x
Lam "i" (Lam "x" (Var "x"))

\s -> \f -> \g -> \x -> f x (g x)
Lam "s" (Lam "f" (Lam "g" (Lam "x" (App (App (Var "f") (Var "x")) (App (Var "g") (Var "x"))))))

Custom Operators

It's easy to hack together parsers that are internally stateful, for example adding operators that can be defined at parse-time and dynamically added to the expressionParser table.

module Main where

import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok

import Text.Parsec.Language (haskellStyle)

import Data.List
import Data.Function

import Control.Monad.Identity (Identity)

import Text.Parsec
import qualified Text.Parsec as P

type Name = String

data Expr
  = Var Name
  | Lam Name Expr
  | App Expr Expr
  | Let Name Expr Expr
  | BinOp Name Expr Expr
  | UnOp Name Expr
  deriving (Show)

data Assoc
  = OpLeft
  | OpRight
  | OpNone
  | OpPrefix
  | OpPostfix
  deriving Show

data Decl
  = LetDecl Expr
  | OpDecl OperatorDef
  deriving (Show)


type Op x = Ex.Operator String ParseState Identity x

type Parser a = Parsec String ParseState a
data ParseState = ParseState [OperatorDef] deriving Show

data OperatorDef = OperatorDef {
    oassoc :: Assoc
  , oprec :: Integer
  , otok :: Name
  }
  deriving Show


lexer :: Tok.GenTokenParser String u Identity
lexer = Tok.makeTokenParser style
  where ops = ["->","\\","+","*","<","=","[","]","_"]
        names = ["let","in","infixl", "infixr", "infix", "postfix", "prefix"]
        style = haskellStyle { Tok.reservedOpNames = ops
                             , Tok.reservedNames = names
                             , Tok.identLetter = alphaNum <|> oneOf "#'_"
                             , Tok.commentLine = "--"
                             }

reserved = Tok.reserved lexer
reservedOp = Tok.reservedOp lexer
identifier = Tok.identifier lexer
parens = Tok.parens lexer
brackets = Tok.brackets lexer
braces = Tok.braces lexer
commaSep = Tok.commaSep lexer
semi = Tok.semi lexer
integer = Tok.integer lexer
chr = Tok.charLiteral lexer
str = Tok.stringLiteral lexer
operator = Tok.operator lexer

contents :: Parser a -> Parser a
contents p = do
  Tok.whiteSpace lexer
  r <- p
  eof
  return r

expr :: Parser Expr
expr = do
  es <- many1 term
  return (foldl1 App es)

lambda :: Parser Expr
lambda = do
  reservedOp "\\"
  args <- identifier
  reservedOp "->"
  body <- expr
  return $ Lam args body

letin :: Parser Expr
letin = do
  reserved "let"
  x <- identifier
  reservedOp "="
  e1 <- expr
  reserved "in"
  e2 <- expr
  return (Let x e1 e2)

variable :: Parser Expr
variable = do
  x <- identifier
  return (Var x)


addOperator :: OperatorDef -> Parser ()
addOperator a = P.modifyState $ \(ParseState ops) -> ParseState (a : ops)

mkTable :: ParseState -> [[Op Expr]]
mkTable (ParseState ops) =
  map (map toParser) $
    groupBy ((==) `on` oprec) $
      reverse $ sortBy (compare `on` oprec) $ ops

toParser :: OperatorDef -> Op Expr
toParser (OperatorDef ass _ tok) = case ass of
    OpLeft    -> infixOp tok (BinOp tok) (toAssoc ass)
    OpRight   -> infixOp tok (BinOp tok) (toAssoc ass)
    OpNone    -> infixOp tok (BinOp tok) (toAssoc ass)
    OpPrefix  -> prefixOp tok (UnOp tok)
    OpPostfix -> postfixOp tok (UnOp tok)
  where
    toAssoc OpLeft = Ex.AssocLeft
    toAssoc OpRight = Ex.AssocRight
    toAssoc OpNone = Ex.AssocNone
    toAssoc _ = error "no associativity"

infixOp :: String -> (a -> a -> a) -> Ex.Assoc -> Op a
infixOp x f = Ex.Infix (reservedOp x >> return f)

prefixOp :: String -> (a -> a) -> Ex.Operator String u Identity a
prefixOp name f = Ex.Prefix (reservedOp name >> return f)

postfixOp :: String -> (a -> a) -> Ex.Operator String u Identity a
postfixOp name f = Ex.Postfix (reservedOp name >> return f)

term :: Parser Expr
term = do
  tbl <- getState
  let table = mkTable tbl
  Ex.buildExpressionParser table aexp

aexp :: Parser Expr
aexp =  letin
    <|> lambda
    <|> variable
    <|> parens expr

letdecl :: Parser Decl
letdecl = do
  e <- expr
  return $ LetDecl e


opleft :: Parser Decl
opleft = do
  reserved "infixl"
  prec <- integer
  sym <- parens operator
  let op = (OperatorDef OpLeft prec sym)
  addOperator op
  return $ OpDecl op

opright :: Parser Decl
opright = do
  reserved "infixr"
  prec <- integer
  sym <- parens operator
  let op = (OperatorDef OpRight prec sym)
  addOperator op
  return $ OpDecl op

opnone :: Parser Decl
opnone = do
  reserved "infix"
  prec <- integer
  sym <- parens operator
  let op = (OperatorDef OpNone prec sym)
  addOperator op
  return $ OpDecl op

opprefix :: Parser Decl
opprefix = do
  reserved "prefix"
  prec <- integer
  sym <- parens operator
  let op = OperatorDef OpPrefix prec sym
  addOperator op
  return $ OpDecl op

oppostfix :: Parser Decl
oppostfix = do
  reserved "postfix"
  prec <- integer
  sym <- parens operator
  let op = OperatorDef OpPostfix prec sym
  addOperator op
  return $ OpDecl op

decl :: Parser Decl
decl =
    try letdecl
    <|> opleft
    <|> opright
    <|> opnone
    <|> opprefix
    <|> oppostfix

top :: Parser Decl
top = do
  x <- decl
  P.optional semi
  return x


modl :: Parser [Decl]
modl = many top

parseModule :: SourceName -> String -> Either ParseError [Decl]
parseModule filePath = P.runParser (contents modl) (ParseState []) filePath

main :: IO ()
main = do
  input <- readFile "test.in"
  let res = parseModule "<stdin>" input
  case res of
    Left err -> print err
    Right ast -> mapM_ print ast

For example input try:

infixl 3 ($);
infixr 4 (#);

infix 4 (.);

prefix 10 (-);
postfix 10 (!);

let z = y in a $ a $ (-a)!;
let z = y in a # a # a $ b; let z = y in a # a # a # b;

Attoparsec

Attoparsec is a parser combinator library like Parsec but more suited for bulk parsing of large text and binary files instead of parsing language syntax to ASTs. When written properly Attoparsec parsers can be extremely efficient.

{-# LANGUAGE OverloadedStrings #-}

import Control.Monad

import Data.Attoparsec
import Data.Attoparsec.Char8 as A
import Data.ByteString.Char8

data Action
  = Success
  | KeepAlive
  | NoResource
  | Hangup
  | NewLeader
  | Election
  deriving Show

type Sender = ByteString
type Payload = ByteString

data Message = Message
  { action :: Action
  , sender :: Sender
  , payload :: Payload
  } deriving Show

pprotocol :: Parser Message
pprotocol = do
  act  <- paction
  send <- A.takeTill (== '.')
  body <- A.takeTill (A.isSpace)
  endOfLine
  return $ Message act send body

paction :: Parser Action
paction = do
  c <- anyWord8
  case c of
    1  -> return Success
    2  -> return KeepAlive
    3  -> return NoResource
    4  -> return Hangup
    5  -> return NewLeader
    6  -> return Election
    _  -> mzero

main :: IO ()
main = do
  let msgtext = "\x01\x6c\x61\x70\x74\x6f\x70\x2e\x33\x2e\x31\x34\x31\x35\x39\x32\x36\x35\x33\x35\x0A"
  let msg = parseOnly pprotocol msgtext
  print msg

Uniplate

Uniplate is a generics library for writing traversals and transformations over arbitrary data structures. It is extremely useful for writing AST transformations and rewrite systems.

plate :: from -> Type from to
(|*)  :: Type (to -> from) to -> to -> Type from to
(|-)  :: Type (item -> from) to -> item -> Type from to

descend   :: Uniplate on => (on -> on) -> on -> on
transform :: Uniplate on => (on -> on) -> on -> on
rewrite   :: Uniplate on => (on -> Maybe on) -> on -> on

The descend function will apply a function to each immediate descendent of an expression and then combines them up into the parent expression.

The transform function will perform a single pass bottom-up transformation of all terms in the expression.

The rewrite function will perform an exhaustive transformation of all terms in the expression to a fixed point, using Maybe to signify termination.

import Data.Generics.Uniplate.Direct

data Expr a
  = Fls
  | Tru
  | Var a
  | Not (Expr a)
  | And (Expr a) (Expr a)
  | Or  (Expr a) (Expr a)
  deriving (Show, Eq)

instance Uniplate (Expr a) where
  uniplate (Not f)     = plate Not |* f
  uniplate (And f1 f2) = plate And |* f1 |* f2
  uniplate (Or f1 f2)  = plate Or |* f1 |* f2
  uniplate x           = plate x

simplify :: Expr a -> Expr a
simplify = transform simp
 where
   simp (Not (Not f)) = f
   simp (Not Fls) = Tru
   simp (Not Tru) = Fls
   simp x = x

reduce :: Show a => Expr a -> Expr a
reduce = rewrite cnf
  where
    -- double negation
    cnf (Not (Not p)) = Just p

    -- de Morgan
    cnf (Not (p `Or` q))  = Just $ (Not p) `And` (Not q)
    cnf (Not (p `And` q)) = Just $ (Not p) `Or` (Not q)

    -- distribute conjunctions
    cnf (p `Or` (q `And` r)) = Just $ (p `Or` q) `And` (p `Or` r)
    cnf ((p `And` q) `Or` r) = Just $ (p `Or` r) `And` (q `Or` r)
    cnf _ = Nothing


example1 :: Expr String
example1 = simplify (Not (Not (Not (Not (Var "a")))))
-- Var "a"

example2 :: [String]
example2 = [a | Var a <- universe ex]
  where
    ex = Or (And (Var "a") (Var "b")) (Not (And (Var "c") (Var "d")))
-- ["a","b","c","d"]

example3 :: Expr String
example3 = reduce $ ((a `And` b) `Or` (c `And` d)) `Or` e
  where
    a = Var "a"
    b = Var "b"
    c = Var "c"
    d = Var "d"
    e = Var "e"

Alternatively Uniplate instances can be derived automatically from instances of Data without the need to explicitly write a Uniplate instance.

import Data.Data
import Data.Typeable
import Data.Generics.Uniplate.Data

data Expr a
  = Fls
  | Tru
  | Lit a
  | Not (Expr a)
  | And (Expr a) (Expr a)
  | Or (Expr a) (Expr a)
  deriving (Data, Typeable, Show, Eq)
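
With the Data-derived instances the same traversal functions work unchanged; a small sketch reusing the double negation simplification from above:

simplify :: Expr Int -> Expr Int
simplify = transform simp
  where
    simp (Not (Not f)) = f
    simp x = x

example :: Expr Int
example = simplify (Not (Not (Lit 1)))
-- Lit 1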

Criterion

Criterion is a statistically aware benchmarking tool.

whnf :: (a -> b) -> a -> Pure
nf :: NFData b => (a -> b) -> a -> Pure
bench :: Benchmarkable b => String -> b -> Benchmark
import Criterion.Main
import Criterion.Config

-- Naive recursion for fibonacci numbers.
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

-- Use the De Moivre closed form for fibonacci numbers.
fib2 :: Int -> Int
fib2 x = truncate $ ( 1 / sqrt 5 ) * ( phi ^ x - psi ^ x )
  where
      phi = ( 1 + sqrt 5 ) / 2
      psi = ( 1 - sqrt 5 ) / 2

suite :: [Benchmark]
suite = [
    bgroup "naive" [
      bench "fib 10" $ whnf fib 5
    , bench "fib 20" $ whnf fib 10
    ],
    bgroup "de moivre" [
      bench "fib 10" $ whnf fib2 5
    , bench "fib 20" $ whnf fib2 10
    ]
  ]

main :: IO ()
main = defaultMain suite
$ runhaskell criterion.hs
warming up
estimating clock resolution...
mean is 2.349801 us (320001 iterations)
found 1788 outliers among 319999 samples (0.6%)
  1373 (0.4%) high severe
estimating cost of a clock call...
mean is 65.52118 ns (23 iterations)
found 1 outliers among 23 samples (4.3%)
  1 (4.3%) high severe

benchmarking naive/fib 10
mean: 9.903067 us, lb 9.885143 us, ub 9.924404 us, ci 0.950
std dev: 100.4508 ns, lb 85.04638 ns, ub 123.1707 ns, ci 0.950

benchmarking naive/fib 20
mean: 120.7269 us, lb 120.5470 us, ub 120.9459 us, ci 0.950
std dev: 1.014556 us, lb 858.6037 ns, ub 1.296920 us, ci 0.950

benchmarking de moivre/fib 10
mean: 7.699219 us, lb 7.671107 us, ub 7.802116 us, ci 0.950
std dev: 247.3021 ns, lb 61.66586 ns, ub 572.1260 ns, ci 0.950
found 4 outliers among 100 samples (4.0%)
  2 (2.0%) high mild
  2 (2.0%) high severe
variance introduced by outliers: 27.726%
variance is moderately inflated by outliers

benchmarking de moivre/fib 20
mean: 8.082639 us, lb 8.018560 us, ub 8.350159 us, ci 0.950
std dev: 595.2161 ns, lb 77.46251 ns, ub 1.408784 us, ci 0.950
found 8 outliers among 100 samples (8.0%)
  4 (4.0%) high mild
  4 (4.0%) high severe
variance introduced by outliers: 67.628%
variance is severely inflated by outliers

Optparse-Applicative

Optparse-applicative is a library for parsing command line options with an interface similar to parsec, and it also makes heavy use of monoids to combine operations.

import Data.List
import Data.Monoid
import Options.Applicative

data Opts = Opts
  { _files :: [String]
  , _quiet :: Bool
  , _fast :: Speed
  }

data Speed = Slow | Fast

options :: Parser Opts
options = Opts <$> filename <*> quiet <*> fast
  where
    filename :: Parser [String]
    filename = many $ argument str $
         metavar "filename..."
      <> help "Input files"

    fast :: Parser Speed
    fast = flag Slow Fast $
         long "cheetah"
      <> help "Perform task quickly."

    quiet :: Parser Bool
    quiet = switch $
         long "quiet"
      <> help "Whether to shut up."

greet :: Opts -> IO ()
greet (Opts files quiet fast) = do
  putStrLn "reading these files:"
  mapM_ print files

  case fast of
    Fast -> putStrLn "quickly"
    Slow -> putStrLn "slowly"

  case quiet of
    True  -> putStrLn "quietly"
    False -> putStrLn "loudly"

opts :: ParserInfo Opts
opts = info (helper <*> options) fullDesc

main :: IO ()
main = execParser opts >>= greet
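
Assuming the program were compiled to a binary called greet ( a hypothetical name ), an invocation might look like:

$ ./greet --cheetah --quiet foo.txt bar.txt
reading these files:
"foo.txt"
"bar.txt"
quickly
quietly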

Haskeline

Haskeline is cross-platform readline support which plays nice with GHCi as well.

runInputT :: Settings IO -> InputT IO a -> IO a
getInputLine :: String -> InputT IO (Maybe String)
import Control.Monad.Trans
import System.Console.Haskeline

type Repl a = InputT IO a

process :: String -> IO ()
process = putStrLn

repl :: Repl ()
repl = do
  minput <- getInputLine "Repl> "
  case minput of
    Nothing -> outputStrLn "Goodbye."
    Just input -> (liftIO $ process input) >> repl

main :: IO ()
main = runInputT defaultSettings repl

Conduits / Pipes

Pipes

await :: Monad m => Proxy () a y' y m a
yield :: Monad m => a -> Proxy x' x () a m ()

runEffect :: Monad m => Effect m r -> m r
toListM :: Monad m => Producer a m () -> m [a]

Pipes is a stream processing library with a strong emphasis on the static semantics of composition. The simplest usage is to connect "pipe" functions with a (>->) composition operator, where each component can await and yield to push and pull values along the stream. Most notably pipes can be used to manage the life cycle of lazy IO resources and can safely handle failures and resource termination gracefully.

import Pipes
import Pipes.Prelude
import Control.Monad

a = forM_ [1..10] yield

b =  forever $ do
  x <- await
  yield (x*2)
  yield (x*3)
  yield (x*4)

c = forever $ do
  x <- await
  if (x `mod` 2) == 0
    then yield x
    else return ()

result = toList $ a >-> b >-> c

For example we could construct a "FizzBuzz" pipe.

{-# LANGUAGE MultiWayIf #-}

import Pipes
import qualified Pipes.Prelude as P

import Control.Monad

a :: Producer Integer IO ()
a = each [1..100]

b :: Pipe Integer String IO ()
b = forever $ do
  n <- await
  if | n `mod` 15 == 0 -> yield "FizzBuzz"
     | n `mod` 5  == 0 -> yield "Buzz"
     | n `mod` 3  == 0 -> yield "Fizz"
     | otherwise       -> return ()

main = runEffect $ a >-> b >-> P.stdoutLn

Conduits

await :: Monad m => ConduitM i o m (Maybe i)
yield :: Monad m => o -> ConduitM i o m ()
($$) :: Monad m => Source m a -> Sink a m b -> m b
(=$) :: Monad m => Conduit a m b -> Sink b m c -> Sink a m c

type Sink i = ConduitM i Void
type Source m o = ConduitM () o m ()
type Conduit i m o = ConduitM i o m ()

Conduits are a conceptually similar though philosophically different approach to the same problem of constant space deterministic resource handling for IO resources.

The first difference is that the await function now returns a Maybe, which allows different handling of termination. The composition operators are also split into a connecting operator ($$) and a fusing operator (=$) for combining a Source with a Sink and a Conduit with a Sink respectively.

{-# LANGUAGE MultiWayIf #-}

import Data.Conduit
import Control.Monad.Trans
import qualified Data.Conduit.List as CL

source :: Source IO Int
source = CL.sourceList [1..100]

conduit :: Conduit Int IO String
conduit = do
  val <- await
  liftIO $ print val
  case val of
    Nothing -> return ()
    Just n -> do
      if | n `mod` 15 == 0 -> yield "FizzBuzz"
         | n `mod` 5  == 0 -> yield "Buzz"
         | n `mod` 3  == 0 -> yield "Fizz"
         | otherwise       -> return ()
      conduit

sink :: Sink String IO ()
sink = CL.mapM_ putStrLn

main :: IO ()
main = source $$ conduit =$ sink

Aeson

Aeson is a library for efficient parsing and generation of JSON.

decode :: FromJSON a => ByteString -> Maybe a
encode :: ToJSON a => a -> ByteString
eitherDecode :: FromJSON a => ByteString -> Either String a

fromJSON :: FromJSON a => Value -> Result a
toJSON :: ToJSON a => a -> Value

We'll work with this contrived example:

{
    "id": 1,
    "name": "A green door",
    "price": 12.50,
    "tags": ["home", "green"],
    "refs": {
      "a": "red",
      "b": "blue"
    }
}

Aeson uses several high performance data structures (Vector, Text, HashMap) by default instead of the naive versions, so typically using Aeson will require that we import them and use OverloadedStrings when indexing into objects.

type Object = HashMap Text Value

type Array = Vector Value

-- | A JSON value represented as a Haskell value.
data Value = Object !Object
           | Array !Array
           | String !Text
           | Number !Scientific
           | Bool !Bool
           | Null

Unstructured

In dynamic scripting languages it's common to parse amorphous blobs of JSON without any a priori structure and then handle validation problems by throwing exceptions while traversing it. We can do the same using Aeson and the Maybe monad.

{-# LANGUAGE OverloadedStrings #-}

import Data.Text
import Data.Aeson
import Data.Vector
import qualified Data.HashMap.Strict as M
import qualified Data.ByteString.Lazy as BL

-- Pull a key out of an JSON object.
(^?) :: Value -> Text -> Maybe Value
(^?) (Object obj) k = M.lookup k obj
(^?) _ _ = Nothing

-- Pull the ith value out of a JSON list.
ix :: Value -> Int -> Maybe Value
ix (Array arr) i = arr !? i
ix _ _ = Nothing

readJSON str = do
  obj <- decode str
  price <- obj ^? "price"
  refs  <- obj ^? "refs"
  tags  <- obj ^? "tags"
  aref  <- refs ^? "a"
  tag1  <- tags `ix` 0
  return (price, aref, tag1)

main :: IO ()
main = do
  contents <- BL.readFile "example.json"
  print $ readJSON contents

Structured

This isn't ideal since we've just smeared the validation logic across our traversal logic instead of separating the concerns and handling validation on its own. We'd like to describe the structure beforehand and handle the invalid case separately.

{-# LANGUAGE DeriveGeneric #-}

import Data.Text
import Data.Aeson
import GHC.Generics
import qualified Data.ByteString.Lazy as BL

import Control.Applicative

data Refs = Refs
  { a :: String
  , b :: String
  } deriving (Show,Generic)

data Data = Data
  { id    :: Int
  , name  :: Text
  , price :: Int
  , tags  :: [String]
  , refs  :: Refs
  } deriving (Show,Generic)

instance FromJSON Data
instance FromJSON Refs
instance ToJSON Data
instance ToJSON Refs

main :: IO ()
main = do
  contents <- BL.readFile "example.json"
  let Just dat = decode contents
  print $ name dat
  print $ a (refs dat)

Now we get our validated JSON wrapped up into a nicely typed Haskell ADT.

Data
  { id = 1
  , name = "A green door"
  , price = 12
  , tags = [ "home" , "green" ]
  , refs = Refs { a = "red" , b = "blue" }
  }

The functions fromJSON and toJSON can be used to convert between this Value sum type and regular Haskell types.

data Result a = Error String | Success a

λ: fromJSON (Bool True) :: Result Bool
Success True

λ: fromJSON (Bool True) :: Result Double
Error "when expecting a Double, encountered Boolean instead"

Cassava

Cassava is an efficient CSV parser library. We'll work with this tiny snippet from the iris dataset:

sepal_length,sepal_width,petal_length,petal_width,plant_class
5.1,3.5,1.4,0.2,Iris-setosa
5.0,2.0,3.5,1.0,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
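
The core decoding functions from Data.Csv roughly mirror Aeson's interface (a sketch; check the installed cassava version for the exact signatures):

decode       :: FromRecord a => HasHeader -> ByteString -> Either String (Vector a)
decodeByName :: FromNamedRecord a => ByteString -> Either String (Header, Vector a)
encode       :: ToRecord a => [a] -> ByteString
encodeByName :: ToNamedRecord a => Header -> [a] -> ByteString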

Unstructured

Just like with Aeson, if we really want to work with unstructured data the library accommodates this.

import Data.Csv

import Text.Show.Pretty

import qualified Data.Vector as V
import qualified Data.ByteString.Lazy as BL

type ErrorMsg = String
type CsvData = V.Vector (V.Vector BL.ByteString)

example :: FilePath -> IO (Either ErrorMsg CsvData)
example fname = do
  contents <- BL.readFile fname
  return $ decode NoHeader contents

We see we get the nested set of stringy vectors:

[ [ "sepal_length"
  , "sepal_width"
  , "petal_length"
  , "petal_width"
  , "plant_class"
  ]
, [ "5.1" , "3.5" , "1.4" , "0.2" , "Iris-setosa" ]
, [ "5.0" , "2.0" , "3.5" , "1.0" , "Iris-versicolor" ]
, [ "6.3" , "3.3" , "6.0" , "2.5" , "Iris-virginica" ]
]

Structured

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}

import Data.Csv
import GHC.Generics
import qualified Data.Vector as V
import qualified Data.ByteString.Lazy as BL

data Plant = Plant
  { sepal_length :: Double
  , sepal_width  :: Double
  , petal_length :: Double
  , petal_width  :: Double
  , plant_class :: String
  } deriving (Generic, Show)

instance FromNamedRecord Plant
instance ToNamedRecord Plant

type ErrorMsg = String
type CsvData = (Header, V.Vector Plant)

parseCSV :: FilePath -> IO (Either ErrorMsg CsvData)
parseCSV fname = do
  contents <- BL.readFile fname
  return $ decodeByName contents

main = parseCSV "iris.csv" >>= print

And again we get a nice typed ADT as a result.

[ Plant
    { sepal_length = 5.1
    , sepal_width = 3.5
    , petal_length = 1.4
    , petal_width = 0.2
    , plant_class = "Iris-setosa"
    }
, Plant
    { sepal_length = 5.0
    , sepal_width = 2.0
    , petal_length = 3.5
    , petal_width = 1.0
    , plant_class = "Iris-versicolor"
    }
, Plant
    { sepal_length = 6.3
    , sepal_width = 3.3
    , petal_length = 6.0
    , petal_width = 2.5
    , plant_class = "Iris-virginica"
    }
]

Warp

Warp is a web server; it writes data to sockets quickly.

{-# LANGUAGE OverloadedStrings #-}

import Network.Wai.Handler.Warp (run)
import Network.Wai.Application.Static

server :: Int -> IO ()
server port = do
  let staticPath = "."
  let app = staticApp $ defaultFileServerSettings staticPath
  run port app

main :: IO ()
main = server 8000

See:

Scotty

Continuing with our trek through web libraries, Scotty is a web microframework similar in principle to Flask in Python or Sinatra in Ruby.

{-# LANGUAGE OverloadedStrings #-}

import Web.Scotty

import qualified Text.Blaze.Html5 as H
import Text.Blaze.Html5 (toHtml, Html)
import Text.Blaze.Html.Renderer.Text (renderHtml)

greet :: String -> Html
greet user = H.html $ do
  H.head $
    H.title "Welcome!"
  H.body $ do
    H.h1 "Greetings!"
    H.p ("Hello " >> toHtml user >> "!")

app = do
  get "/" $
    text "Home Page"

  get "/greet/:name" $ do
    name <- param "name"
    html $ renderHtml (greet name)

main :: IO ()
main = scotty 8000 app

Of note is that the Blaze library used here overloads do-notation but is not itself a monad.

See:

Acid State

Acid-state allows us to build a "database on demand" for arbitrary Haskell datatypes that guarantees atomic transactions. For example, we can build a simple key-value store wrapped around the Map type.

{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE TemplateHaskell #-}

import Data.Acid
import Data.Typeable
import Data.SafeCopy
import Control.Monad.Reader (ask)

import qualified Data.Map as Map
import qualified Control.Monad.State as S

type Key = String
type Value = String

data Database = Database !(Map.Map Key Value)
    deriving (Show, Ord, Eq, Typeable)

$(deriveSafeCopy 0 'base ''Database)

insertKey :: Key -> Value -> Update Database ()
insertKey key value
    = do Database m <- S.get
         S.put (Database (Map.insert key value m))

lookupKey :: Key -> Query Database (Maybe Value)
lookupKey key
    = do Database m <- ask
         return (Map.lookup key m)

deleteKey :: Key -> Update Database ()
deleteKey key
    = do Database m <- S.get
         S.put (Database (Map.delete key m))

allKeys :: Int -> Query Database [(Key, Value)]
allKeys limit
    = do Database m <- ask
         return $ take limit (Map.toList m)

$(makeAcidic ''Database ['insertKey, 'lookupKey, 'allKeys, 'deleteKey])

fixtures :: Map.Map String String
fixtures = Map.empty

test ::  Key -> Value -> IO ()
test key val = do
    database <- openLocalStateFrom "db/" (Database fixtures)
    _ <- update database (InsertKey key val)
    result <- query database (AllKeys 10)
    print result

Safe Haskell

The Safe Haskell language extensions allow us to restrict the use of unsafe language features using -XSafe, or at least to claim on our honor that we've proved our code is referentially transparent using -XTrustworthy.

{-# LANGUAGE Safe #-}

import Unsafe.Coerce
import System.IO.Unsafe

sin :: String
sin = unsafePerformIO $ getLine

mortalsin :: a
mortalsin = unsafeCoerce 3.14 ()

Unsafe.Coerce: Can't be safely imported!
The module itself isn't safe.

GHC Core

To inspect the core from GHCi we can invoke it using the following flags and the alias:

alias ghci-core="ghci -ddump-simpl -dsuppress-idinfo \
-dsuppress-coercions -dsuppress-type-applications \ 
-dsuppress-uniques -dsuppress-module-prefixes"

At the interactive prompt we can then explore the core representation interactively:

$ ghci-core
λ: let f x = x + 2 ; f :: Int -> Int

==================== Simplified expression ====================
returnIO
  (: ((\ (x :: Int) -> + $fNumInt x (I# 2)) `cast` ...) ([]))

λ: let f x = (x, x)

==================== Simplified expression ====================
returnIO (: ((\ (@ t) (x :: t) -> (x, x)) `cast` ...) ([]))

ghc-core is also very useful for looking at GHC's compilation artifacts.

$ ghc-core --no-cast --no-asm

Core from GHC is roughly human readable, but it's helpful to look at simple hand-written examples to get the hang of what's going on. Of particular note is that the Λ and λ for type-level and value-level lambda abstraction are represented by the same symbol (\) in Core, which is a simplifying detail of GHC's implementation but a source of some confusion when starting out.

id :: a -> a
id x = x

id :: forall a. a -> a
id = \ (@ a) (x :: a) -> x

compose :: (b -> c) -> (a -> b) -> a -> c
compose f g x = f (g x)

compose :: forall b c a. (b -> c) -> (a -> b) -> a -> c
compose = \ (@ b) (@ c) (@ a) (f1 :: b -> c) (g :: a -> b) (x1 :: a) -> f1 (g x1)

map :: (a -> b) -> [a] -> [b]
map f []     = []
map f (x:xs) = f x : map f xs

map :: forall a b. (a -> b) -> [a] -> [b]
map =
  \ (@ a) (@ b) (f :: a -> b) (xs :: [a]) ->
    case xs of _ {
      []     -> GHC.Types.[] @ b;
      : y ys -> GHC.Types.: @ b (f y) (map @ a @ b f ys)
    }

x `seq` y

case x of _ {
  __DEFAULT -> y
}

Unboxed Values

The usual integer type in Haskell can be considered to be a regular algebraic datatype with a special constructor.

data Int = I# Int#

The function for integer arithmetic used in the Num typeclass for Int just pattern matches on this type to reveal the underlying unboxed value, performs the builtin arithmetic, and then packs the result back up into an Int.

(+#) :: Int# -> Int# -> Int#

plusInt :: Int -> Int -> Int
(I# x) `plusInt` (I# y) = I# (x +# y)

Where (+#) is a low level function built into GHC that maps to unboxed integer arithmetic directly.

plusInt a b = case a of {
    (I# a_) -> case b of {
      (I# b_) -> I# (+# a_ b_);
    };
};

Since the Int type we'd write down for normal logic is itself boxed, we'd sometimes like to inform GHC that our value is just a fixed unboxed value and should be referred to by value instead of by reference. In C the rewrite would look like the following:

/* Boxed: the field is a pointer to an int. */
struct A {
  int *a;
};

/* Unboxed: the field is stored by value. */
struct A {
  int a;
};

Effectively we'd like to be able to define our constructor to be stored as:

data A = A Int#

But maintain all our logic as if it were written against Int, with the boxing and unboxing performed where needed.

data A = A !Int

To do this we can use the UNPACK pragma or the -funbox-strict-fields flag to inform GHC to perform the rewrite we want.

data A = A {-# UNPACK #-} !Int

See:

LLVM General

llvm-general is so useful I've written an entire tutorial on it.

See:

van Laarhoven Lenses

There are two implementations of note that are mostly compatible but differ in scope:

  • lens - The kitchen sink library with a wide variety of instances for many common libraries.
  • lens-family-core - The core abstractions in a standalone library with minimal dependencies.

lens

At its core a lens is a type of the form:

type Lens a b = forall f. Functor f => (b -> f b) -> (a -> f a)

Using this type and some related machinery we get a framework for building a very general set of combinators for working with datatypes of arbitrary structure and the targets within their substructure. Some of the combinators are:

Combinator Description
set Replace target with a value and return updated structure.
over Update targets with a function and return updated structure.
view View a single target or fold the targets of a monoidal quantity.
to Construct a retrieval function from an arbitrary Haskell function.
traverse Map each element of a structure to an action and collect results.
ix Target the given index of a generic indexable structure.
toListOf Return a list of the targets.

Constructing the lens field types from an arbitrary datatype involves a bit of boilerplate code generation, but Template Haskell can take care of this construction for us at compile time. See the derivation in the links for more details.
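
For intuition, here is roughly what such a generated lens amounts to when written by hand against the Lens type above (a sketch for a hypothetical record; the actual makeLenses output differs in detail):

data Foo = Foo { _fooX :: Int, _fooY :: Int } deriving Show

-- A van Laarhoven lens focusing on the _fooX field.
fooX :: Functor f => (Int -> f Int) -> Foo -> f Foo
fooX k (Foo x y) = fmap (\x' -> Foo x' y) (k x)

-- view fooX (Foo 1 2)    == 1
-- set  fooX 10 (Foo 1 2) == Foo 10 2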

The simplest usage of lens is as a more compositional way of dealing with record access and updates.

{-# LANGUAGE TemplateHaskell #-}

import Control.Lens

data Rec = MkRec { _foo :: Int , _bar :: Int } deriving Show
makeLenses ''Rec

x :: Rec
x = MkRec { _foo = 1024, _bar = 1024 }

get1 :: Int
get1 = (_foo x) + (_bar x)

get2 :: Int
get2 = (x ^. foo) + (x ^. bar)

get3 :: Int
get3 = (view foo x) + (view bar x)


set1 :: Rec
set1 = x { _foo = 1, _bar = 2 }

set2 :: Rec
set2 = x & (foo .~ 1) . (bar .~ 2)

set3 :: Rec
set3 = x & (set foo 1) . (set bar 2)

Of course this is just the surface; the real strength comes when dealing with complex and deeply nested structures:

{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

import Control.Lens
import Control.Lens.TH

data Record1 a = Record1
  { _a :: Int
  , _b :: Maybe a
  } deriving Show

data Record2 = Record2
  { _c :: String
  , _d :: [Int]
  } deriving Show

$(makeLenses ''Record1)
$(makeLenses ''Record2)

records = [
    Record1 {
      _a = 1,
      _b = Nothing
    },
    Record1 {
      _a = 2,
      _b = Just $ Record2 {
        _c = "Picard",
        _d = [1,2,3]
      }
    },
    Record1 {
      _a = 3,
      _b = Just $ Record2 {
        _c = "Riker",
        _d = [4,5,6]
      }
    },
    Record1 {
      _a = 4,
      _b = Just $ Record2 {
        _c = "Data",
        _d = [7,8,9]
      }
    }
  ]

-- Some abstract traversals.
ids   = traverse.a
names = traverse.b._Just.c
nums  = traverse.b._Just.d
list2 = traverse.b._Just.d.ix 2

-- Modify/read/extract in terms of generic traversals.

-- Modify to set all 'id' fields to 0
ex1 = set ids 0 records

-- Return a view of the concatenated 'd' fields for all nested records.
ex2 = view nums records
-- [1,2,3,4,5,6,7,8,9]

-- Increment all 'id' fields by 1
ex3 = over ids (+1) records

-- Return a list of all 'c' fields.
ex4 = toListOf names records
-- ["Picard","Riker","Data"]

-- Return the third element of all 'd' fields.
ex5 = toListOf list2 records
-- [3,6,9]

Lens also provides us with an optional dense slurry of operators that expand into combinations of the core combinators. Many of the operators do have a consistent naming scheme.
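
A rough sketch of how a few of the common operators desugar into the core combinators:

x ^. l      ≡ view l x
x & l .~ v  ≡ set l v x
x & l %~ f  ≡ over l f x
x ^.. l     ≡ toListOf l x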

Of course lenses generalize to arbitrary data structures and computations, not just nested records:

{-# LANGUAGE NoMonomorphismRestriction #-}

import Control.Lens
import Numeric.Lens
import Data.Complex.Lens

import Data.Complex
import qualified Data.Map as Map

l :: Num t => t
l = view _1 (100, 200)
-- 100

m :: (Num t) => (t, t, t)
m = (100,200,200) & _3 %~ (+100)
-- (100,200,300)

n :: Num a => [a]
n = [100,200,300] & traverse %~ (+1)
-- [101,201,301]

o :: Char
o = "frodo" ^?! ix 3
-- 'd'

p :: Num a => [a]
p = [[1,2,3], [4,5,6]] ^. traverse
-- [1,2,3,4,5,6]

q :: Num a => [a]
q = [1,2,3,4,5] ^. _tail
-- [2,3,4,5]

r :: Maybe String
r = Map.fromList [("foo", "bar")] ^. at "foo"
-- Just "bar"

s :: Integral a => Maybe a
s = "1010110" ^? binary
-- Just 86

t :: RealFloat a => Complex a
t = mkPolar 1 (pi/2) & _phase +~ pi

u :: IO [String]
u = ["first","second","third"] ^!! folded.act ((>> getLine) . putStrLn)
-- first
-- a
-- second
-- b
-- third
-- c
-- ["a","b","c"]

See:

State and Zooms

Within the context of the state monad there are a particularly useful set of lens patterns.

  • use - View a target from the state of the State monad.
  • assign - Replace a target within the state of the State monad.
  • zoom - Run a stateful action against a target within the state, lifting it to an action over the whole state.
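
A minimal sketch of use and assign over a hypothetical single-field state (Counter and bump are illustrative names, not from the original); the larger example below then shows zoom in action:

{-# LANGUAGE TemplateHaskell #-}

import Control.Lens
import Control.Monad.State

data Counter = Counter { _count :: Int } deriving Show

$(makeLenses ''Counter)

-- Read the current count, then write back the incremented value.
bump :: State Counter Int
bump = do
  n <- use count
  assign count (n + 1)
  return n

-- runState bump (Counter 0) == (0, Counter {_count = 1})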

For example, suppose we wanted to write a little physics simulation of the random motion of particles in a box. We can use the zoom function to modify the state of each particle in every step of the simulation.

{-# LANGUAGE TemplateHaskell #-}

import Control.Lens
import Control.Monad.State
import System.Random

data Vector = Vector
    { _x :: Double
    , _y :: Double
    } deriving (Show)

data Box = Box
    { _particles :: [Particle]
    } deriving (Show)

data Particle = Particle
    { _pos :: Vector
    , _vel :: Vector
    } deriving (Show)

$(makeLenses ''Box)
$(makeLenses ''Particle)
$(makeLenses ''Vector)

step :: StateT Box IO ()
step = zoom (particles.traverse) $ do
    dx <- use (vel.x)
    dy <- use (vel.y)
    pos.x += dx
    pos.y += dy

particle :: IO Particle
particle = do
  vx <- randomIO
  vy <- randomIO
  return $ Particle (Vector 0 0) (Vector vx vy)

simulate :: IO Box
simulate = do
  particles <- replicateM 5 particle
  let simulation = replicateM 5 step
  let box = Box particles
  execStateT simulation box

main :: IO ()
main = do
  final_state <- simulate
  print final_state

This results in a final state like the following.

Box
  { _particles =
      [ Particle
          { _pos = Vector { _x = 3.268546939011934 , _y = 4.356638656040016 }
          , _vel =
              Vector { _x = 0.6537093878023869 , _y = 0.8713277312080032 }
          }
      , Particle
          { _pos =
              Vector { _x = 0.5492296641559635 , _y = 0.27244422070641594 }
          , _vel =
              Vector { _x = 0.1098459328311927 , _y = 5.448884414128319e-2 }
          }
      , Particle
          { _pos =
              Vector { _x = 3.961168796078436 , _y = 4.9317543172941765 }
          , _vel =
              Vector { _x = 0.7922337592156872 , _y = 0.9863508634588353 }
          }
      , Particle
          { _pos =
              Vector { _x = 4.821390854065674 , _y = 1.6601909953629823 }
          , _vel =
              Vector { _x = 0.9642781708131349 , _y = 0.33203819907259646 }
          }
      , Particle
          { _pos =
              Vector { _x = 2.6468253761062943 , _y = 2.161403445396069 }
          , _vel =
              Vector { _x = 0.5293650752212589 , _y = 0.4322806890792138 }
          }
      ]
  }

Lens + Aeson

One of the best showcases for lens is writing transformations over arbitrary JSON structures. For example consider some sample data related to Kiva loans.

{
   "loans":[
      {
         "id":2930,
         "terms":{
            "local_payments":[
               {
                  "due_date":"2007-02-08T08:00:00Z",
                  "amount":13.75
               },
               {
                  "due_date":"2007-03-08T08:00:00Z",
                  "amount":93.75
               },
               {
                  "due_date":"2007-04-08T07:00:00Z",
                  "amount":43.75
               },
               {
                  "due_date":"2007-05-08T07:00:00Z",
                  "amount":63.75
               },
               {
                  "due_date":"2007-06-08T07:00:00Z",
                  "amount":93.75
               },
               {
                  "due_date":"2007-07-08T05:00:00Z",
                  "amount": null
               },
               {
                  "due_date":"2007-07-08T07:00:00Z",
                  "amount":93.75
               },
               {
                  "due_date":"2007-08-08T07:00:00Z",
                  "amount":93.75
               },
               {
                  "due_date":"2007-09-08T07:00:00Z",
                  "amount":93.75
               }
            ]
          }
      }
   ]
}

Then using Data.Aeson.Lens we can traverse the structure using our lens combinators.

{-# LANGUAGE OverloadedStrings #-}

import Control.Lens

import Data.Aeson.Lens
import Data.Aeson (decode, Value)
import Data.ByteString.Lazy as BL

main :: IO ()
main = do
  contents <- BL.readFile "kiva.json"
  let Just json = decode contents :: Maybe Value

  let vals :: [Double]
      vals = json ^.. key "loans"
                    . values
                    . key "terms"
                    . key "local_payments"
                    . values
                    . key "amount"
                    . _Double
  print vals

[13.75,93.75,43.75,63.75,93.75,93.75,93.75,93.75]

lens-family

The interface for lens-family is very similar to lens but with a smaller API and core.

{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

import Lens.Family
import Lens.Family.TH
import Lens.Family.Stock
import Data.Traversable

data Record1 a = Record1
  { _a :: Int
  , _b :: Maybe a
  } deriving Show

data Record2 = Record2
  { _c :: String
  , _d :: [Int]
  } deriving Show

$(mkLenses ''Record1)
$(mkLenses ''Record2)

records = [
    Record1 {
      _a = 1,
      _b = Nothing
    },
    Record1 {
      _a = 2,
      _b = Just $ Record2 {
        _c = "Picard",
        _d = [1,2,3]
      }
    },
    Record1 {
      _a = 3,
      _b = Just $ Record2 {
        _c = "Riker",
        _d = [4,5,6]
      }
    },
    Record1 {
      _a = 4,
      _b = Just $ Record2 {
        _c = "Data",
        _d = [7,8,9]
      }
    }
  ]

ids   = traverse.a
names = traverse.b._Just.c
nums  = traverse.b._Just.d

ex1 = set ids 0 records
ex2 = view nums records
ex3 = over ids (+1) records
ex4 = toListOf names records

Categories

Categories

The most basic structure is a category, which is an algebraic structure of objects (Obj) and morphisms (Hom) such that morphisms compose associatively and every object has an identity morphism.

class Category c where
  id  :: c x x
  (.) :: c y z -> c x y -> c x z
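
The function arrow is the canonical instance (a sketch; this assumes the Prelude's id and (.) are hidden, as in base's Control.Category):

instance Category (->) where
  id    = \x -> x
  f . g = \x -> f (g x)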

Isomorphisms

Two objects of a category are said to be isomorphic if there exists a morphism with a two-sided inverse.

f  :: a -> b
f' :: b -> a

Such that:

f . f' = id
f'. f  = id
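
A concrete sketch: with a = (x, y) -> z and b = x -> y -> z, curry and uncurry witness such an isomorphism.

f  :: ((x, y) -> z) -> (x -> y -> z)
f  = curry

f' :: (x -> y -> z) -> ((x, y) -> z)
f' = uncurry

-- f . f' = id
-- f' . f = id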

Duality

One of the central ideas is the notion of duality: reversing some internal structure yields a new structure with a "mirror" set of theorems. The dual of a category reverses the direction of its morphisms, forming the category Cᵒᵖ.

newtype Op a b = Op { unOp :: b -> a}

instance Category Op where
  id = Op Prelude.id
  Op f . Op g = Op (g Prelude.. f)

See:

Functors

Functors are mappings between the objects and morphisms of categories that preserve identities and composition.

newtype FComp f g a = C { unC :: f (g a) }

instance (Functor f, Functor g) => Functor (FComp f g) where
  fmap f (C x) = C (fmap (fmap f) x)

-- Functor laws: identities and composition are preserved.
fmap id = id
fmap (a . b) = (fmap a) . (fmap b)

Natural Transformations

Natural transformations are mappings between functors that preserve morphism composition.

type Nat f g = forall a. f a -> g a

Such that for a natural transformation h:

fmap f . h ≡ h . fmap f

The simplest example is between (f = List) and (g = Maybe) types.

headMay :: forall a. [a] -> Maybe a
headMay []     = Nothing
headMay (x:xs) = Just x

Regardless of how we chase headMay, we end up with the same result.

fmap f (headMay xs) ≡ headMay (fmap f xs)

fmap f (headMay [])
= fmap f Nothing
= Nothing

headMay (fmap f [])
= headMay []
= Nothing

fmap f (headMay (x:xs))
= fmap f (Just x)
= Just (f x)

headMay (fmap f (x:xs))
= headMay (f x : fmap f xs)
= Just (f x)

Yoneda Lemma

The Yoneda lemma is an elementary but deep result in category theory. It states that for any functor F, the types F a and ∀ b. (a -> b) -> F b are isomorphic.

{-# LANGUAGE RankNTypes #-}

embed :: Functor f => f a -> (forall b . (a -> b) -> f b)
embed x f = fmap f x

unembed :: Functor f => (forall b . (a -> b) -> f b) -> f a
unembed f = f id

So that we have:

embed . unembed ≡ id
unembed . embed ≡ id

The broadest hand-wavy statement of the theorem is that an object in a category can be represented by the set of morphisms into it, and that the information about these morphisms alone sufficiently determines all the properties of the object itself.

In terms of Haskell types: given a fixed type a and a functor f, if we have a higher-order polymorphic function g that, when given a function of type a -> b, yields f b, then the behavior of g is entirely determined by a -> b and can be written purely in terms of f a.
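
For instance, with f = Maybe, a small sketch of the isomorphism in action (demo1 and demo2 are illustrative names):

demo1 :: Maybe String
demo1 = embed (Just 3) show
-- Just "3"

demo2 :: Maybe Int
demo2 = unembed (\g -> fmap g (Just 3))
-- Just 3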

See:

Kleisli Category

Kleisli composition (i.e. Kleisli Fish) is defined to be:

(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c
f >=> g ≡ \x -> f x >>= g

newtype K m a b = K { unKleisli :: a -> m b }
 
instance (Monad m) => Category (K m) where
  id = K return 
  (K g) . (K f) = K (f >=> g)

The monad laws stated in terms of the Kleisli category of a monad m are:

(f >=> g) >=> h ≡ f >=> (g >=> h)
return >=> f ≡ f
f >=> return ≡  f

Stated simply, the monad laws above are just the category laws in the Kleisli category.

For example, Just is just the identity morphism in the Kleisli category of the Maybe monad.

Just >=> a ≡ a
a >=> Just ≡ a
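
A concrete sketch of Kleisli composition in the Maybe monad (safeDiv and halveTwice are illustrative names, not from the original):

import Control.Monad ((>=>))

safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Compose two Maybe-returning functions in the Kleisli category of Maybe.
halveTwice :: Int -> Maybe Int
halveTwice = (`safeDiv` 2) >=> (`safeDiv` 2)

-- halveTwice 12 == Just 3
-- halveTwice 5  == Just 1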

Mathematics

Just as in Haskell we try to unify the common ideas from distinct structures, we can ask a simple question like what the fundamental notion of a group is for different mathematical categories:

Category  Description                                                                                                  Group
Set       The category of sets, with sets as objects and functions between them as morphisms.                          Abelian group
Man       The category of manifolds, with manifolds as objects and differentiable functions between them as morphisms. Lie group
Top       The category of topological spaces, with spaces as objects and continuous functions between them as morphisms. Topological group
Grp       The category of Abelian groups, with groups as objects and group homomorphisms between them as morphisms.    Category objects

Some deep results in algebraic topology about the homology groups of topological spaces turn out to be stated very concisely as relationships between functors and natural isomorphisms of these four categories!

Resources