What I Wish I Knew When Learning Haskell 2.1

Stephen Diehl (@smdiehl )

Since I wrote these slides for a little user group talk I gave two years ago they have become a surprisingly popular reference. I decided to actually turn them into a proper skimmable reference for intermediate and advanced level Haskell topics that don't necessarily have great coverage or that tend be somewhat opaque as to where to get going, and then aggregate a bunch of the best external resources for diving into those subjects with more depth. Hopefully it still captures the "no bullshit brain dump" style that seemed to be liked.

The source for all code is available here. If there are any errors or you think of a more illustrative example feel free to submit a pull request.

This is the second draft of this document.

License

This code and text are dedicated to the public domain. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Basics

Cabal

Cabal is the build system for Haskell, it also doubles as a package manager.

For example to install the parsec package from Hackage to our system invoke the install command:

$ cabal install parsec           # latest version
$ cabal install parsec==3.1.5    # exact version

The usual build invocation for Haskell packages is the following:

$ cabal get parsec    # fetch source
$ cd parsec-3.1.5

$ cabal configure
$ cabal build
$ cabal install

To update the package index from Hackage run:

$ cabal update

To start a new Haskell project run

$ cabal init
$ cabal configure

A .cabal file will be created with the configuration options for our new project.

The latest feature of Cabal is the addition of Sandboxes ( in cabal > 1.18 ) which are self contained environments of Haskell packages separate from the global package index stored in the ./.cabal-sandbox of our project's root. To create a new sandbox for our cabal project run.

$ cabal sandbox init

In addition the sandbox can be torn down.

$ cabal sandbox delete

Invoking the cabal commands when in the working directory of a project with a sandbox configuration set up alters the behavior of cabal itself. For example the cabal install command will only alter the install to the local package index and will not touch the global configuration.

To install the dependencies from the cabal file into the newly created sandbox run:

$ cabal install --only-dependencies

Dependencies can also be built in parallel by passing -j<n> where n is the number of concurrent builds.

$ cabal install -j4 --only-dependencies

Let's look at an example cabal file, there are two main entry points that any package may provide: a library and an executable. Multiple executables can be defined, but only one library. In addition there is a special form of executable entry point Test-Suite which defines an interface for unit tests to be invoked from cabal.

For a library the exposed-modules field in the cabal file indicates which modules within the package structure will be publicly visible when the package is installed, these are the user-facing APIs that we wish to exposes to downstream consumers.

For an executable the main-is field indicates the Main module for the project that exports the main function to run for the executable logic of the application.

name:               mylibrary
version:            0.1
cabal-version:      >= 1.10
author:             Paul Atreides 
license:            MIT
license-file:       LICENSE
synopsis:           The code must flow.
category:           Math
tested-with:        GHC
build-type:         Simple
  
library                    
    exposed-modules:
      Library.ExampleModule1
      Library.ExampleModule2

    build-depends: 
      base >= 4 && < 5

    default-language: Haskell2010

    ghc-options: -O2 -Wall -fwarn-tabs

executable "example"
    build-depends: 
        base >= 4 && < 5,
        mylibrary == 0.1
    default-language: Haskell2010
    main-is: Main.hs    

Test-Suite test
  type: exitcode-stdio-1.0
  main-is: Test.hs
  default-language: Haskell2010
  build-depends:
      base >= 4 && < 5,
      mylibrary == 0.1

To run the "executable" for a library under the cabal sandbox:

$ cabal run
$ cabal run <name>

To load the "library" into a GHCi shell under the cabal sandbox:

$ cabal repl
$ cabal repl <name>

The <name> metavariable is either one of the executable or library declarations in the cabal file, and can optionally be disambiguated by the prefix exe:<name> or lib:<name> respectively.

To build the package locally into the ./dist/build folder execute the build command.

$ cabal build 

To run the tests, our package must itself be reconfigured with the --enable-tests and the build-depends from the Test-Suite must be manually installed if not already.

$ cabal configure --enable-tests
$ cabal install --only-dependencies --enable-tests
$ cabal test
$ cabal test <name>

In addition arbitrary shell commands can also be invoked with the GHC environmental variables set up for the sandbox. Quite common is to invoke a new shell with this command such that the ghc and ghci commands use the sandbox ( they don't by default, which is a common source of frustration ).

$ cabal exec 
$ cabal exec sh # launch a shell with GHC sandbox path set.

The haddock documentation can be built for the local project by executing the haddock command, it will be built to the ./dist folder.

$ cabal haddock

When we're finally ready to upload to Hackage ( presuming we have a Hackage account set up ), then we can build the tarball and upload with the following commands:

$ cabal sdist
$ cabal upload dist/mylibrary-0.1.tar.gz

Using the cabal repl and cabal run commands are preferable but sometimes we'd like to manually perform their equivalents at the shell, there are several useful aliases that rely on shell directory expansion to find the package database in the current working directory and launch GHC with the appropriate flags:

alias ghc-sandbox="ghc -no-user-package-db -package-db .cabal-sandbox/*-packages.conf.d"
alias ghci-sandbox="ghci -no-user-package-db -package-db .cabal-sandbox/*-packages.conf.d"
alias runhaskell-sandbox="runhaskell -no-user-package-db -package-db .cabal-sandbox/*-packages.conf.d"

Courtesy of Brian McKenna there is also a zsh script to show the sandbox status of the current working directory in our shell.

function cabal_sandbox_info() {
    cabal_files=(*.cabal(N))
    if [ $#cabal_files -gt 0 ]; then
        if [ -f cabal.sandbox.config ]; then
            echo "%{$fg[green]%}sandboxed%{$reset_color%}"
        else
            echo "%{$fg[red]%}not sandboxed%{$reset_color%}"
        fi
    fi
}
 
RPROMPT="\$(cabal_sandbox_info) $RPROMPT"

The cabal configuration is stored in $HOME/.cabal/config and contains various options including credential information for Hackage upload. One addition to configuration is to completely disallow the installation of packages outside of sandboxes to prevent accidental collisions.

-- Don't allow global install of packages.
require-sandbox: True

Another common flag to enable is the documentation which forces the local build of Haddock documentation, which can be useful for offline reference. On a Linux filesystem these are built to the /usr/share/doc/ghc/html/libraries/ directory.

documentation: True

If GHC is currently installed the documentation for the Prelude and Base libraries should be available at this local link:

/usr/share/doc/ghc/html/libraries/index.html

See:

GHCi

GHCi is the interactive shell for the GHC compiler. GHCi is where we will spend most of our time.

Command Shortcut Action
:reload :r Code reload
:type :t Type inspection
:kind :k Kind inspection
:info :i Information
:print :p Print the expression
:edit :e Load file in system editor.

The introspection commands are an essential part of debugging and interacting with Haskell code:

λ: :type 3
3 :: Num a => a
λ: :kind Either
Either :: * -> * -> *
λ: :info Functor
class Functor f where
  fmap :: (a -> b) -> f a -> f b
  (<$) :: a -> f b -> f a
        -- Defined in `GHC.Base'
  ...
λ: :i (:)
data [] a = ... | a : [a]       -- Defined in `GHC.Types'
infixr 5 :

The current state of the global environment in the shell can also be queried. Such as module-level bindings and types:

λ: :browse
λ: :show bindings

Or module level imports:

λ: :show imports
import Prelude -- implicit
import Data.Eq
import Control.Monad

Or compiler-level flags and pragmas:

λ: :set
options currently set: none.
base language is: Haskell2010
with the following modifiers:
  -XNoDatatypeContexts
  -XNondecreasingIndentation
GHCi-specific dynamic flag settings:
other dynamic, non-language, flag settings:
  -fimplicit-import-qualified
warning settings:

λ: :showi language
base language is: Haskell2010
with the following modifiers:
  -XNoDatatypeContexts
  -XNondecreasingIndentation
  -XExtendedDefaultRules

Language extensions and compiler pragmas can be set at the prompt. See the Flag Reference for the vast set of compiler flag options. For example several common ones are:

:set -XNoMonomorphismRestriction
:set -fno-warn-unused-do-bind

Several commands for interactive options have shortcuts:

Function
+t Show types of evaluated expressions
+s Show timing and memory usage
+m Enable multi-line expression delimited by :{ and :}.
λ: set +t
λ: []
[]
it :: [a]
λ: set +s
λ: foldr (+) 0 [1..25]
325
it :: Prelude.Integer
(0.02 secs, 4900952 bytes)
λ: :{ 
λ:| let foo = do
λ:|           putStrLn "hello ghci"
λ:| :}
λ: foo
"hello ghci"

The configuration for the GHCi shell can be customized globally by defining a ghci.conf in $HOME/.ghc/ or in the in current working directory as ./.ghci.conf.

For example we can add a command to use the Hoogle type search from within GHCi.

cabal install hoogle

We can use it by adding a command to our ghci.conf.

:set prompt "λ: "

:def hlint const . return $ ":! hlint \"src\""
:def hoogle \s -> return $ ":! hoogle --count=15 \"" ++ s ++ "\""
λ: :hoogle (a -> b) -> f a -> f b
Data.Traversable fmapDefault :: Traversable t => (a -> b) -> t a -> t b
Prelude fmap :: Functor f => (a -> b) -> f a -> f b

For reasons of sexiness it is desirable to set your GHC prompt to a λ or a ΠΣ if you're into that lifestyle.

:set prompt "λ: "
:set prompt "ΠΣ: "

For editor integration with vim and emacs:

cabal install hdevtools
cabal install ghc-mod
cabal install hlint

Bottoms

error :: String -> a
undefined :: a

The bottom is a singular value that inhabits every type. When evaluated the semantics of Haskell no longer yield a meaningful value. It's usually written as the symbol ⊥ (i.e. the compiler flipping you off ).

An example of a infinite looping term:

f :: a
f = let x = x in x

The undefined function is nevertheless extremely practical to accommodate writing incomplete programs and for debugging.

f :: a -> Complicated Type
f = undefined -- write tomorrow, typecheck today!

Partial functions from non-exhaustive pattern matching is probably the most common introduction of bottoms.

data F = A | B
case x of 
  A -> ()

The above is translated into the following GHC Core with the exception inserted for the non-exhaustive patterns. GHC can be made more vocal about incomplete patterns using the -fwarn-incomplete-patterns and -fwarn-incomplete-uni-patterns flags.

case x of _ {
  A -> ();
  B -> patError "<interactive>:3:11-31|case"
}

The same holds with record construction with missing fields, although there's almost never a good reason to construct a record with missing fields and GHC will warn us by default.

data Foo = Foo { example1 :: Int }
f = Foo {}

Again this has an error term put in place by the compiler:

Foo (recConError "<interactive>:4:9-12|a")

What's not immediately apparent is that they are used extensively throughout the Prelude, some for practical reasons others for historical reasons. The canonical example is the head function which as written [a] -> a could not be well-typed without the bottom.

import GHC.Err
import Prelude hiding (head, (!!), undefined)

-- degenerate functions

undefined :: a
undefined =  error "Prelude.undefined"

head :: [a] -> a
head (x:_) =  x
head []    =  error "Prelude.head: empty list"

(!!) :: [a] -> Int -> a
xs     !! n | n < 0 =  error "Prelude.!!: negative index"
[]     !! _         =  error "Prelude.!!: index too large"
(x:_)  !! 0         =  x
(_:xs) !! n         =  xs !! (n-1)

It's rare to see these partial functions thrown around carelessly in production code and the preferred method is instead to use the safe variants provided in Data.Maybe combined with the usual fold functions maybe and either or to use pattern matching.

listToMaybe :: [a] -> Maybe a
listToMaybe []     =  Nothing
listToMaybe (a:_)  =  Just a

When a bottom define in terms of error is invoked it typically will not generate any position information, but the function used to provide assertions assert can be short circuited to generate position information in the place of either undefined or error call.

import GHC.Base

foo :: a
foo = undefined
-- *** Exception: Prelude.undefined

bar :: a
bar = assert False undefined
-- *** Exception: src/fail.hs:8:7-12: Assertion failed

See: Avoiding Partial Functions

Debugger

Although it's use is somewhat rare, GHCi has a builtin debugger. Debugging uncaught exceptions from bottoms or asynchronous exceptions is in similar style to debugging segfaults with gdb.

λ: :set -fbreak-on-exception
λ: :trace main
λ: :hist
λ: :back

Trace

Haskell being pure has the unique property that most code is introspectable on it's own, as such the "printf" style of debugging is often unnecessary when we can simply open GHCi and test the function. Nevertheless Haskell does come with a unsafe trace function which can be used to perform arbitrary print statements outside of the IO monad.

import Debug.Trace

example1 :: Int
example1 = trace "impure print" 1

example2 :: Int
example2 = traceShow "tracing" 2

example3 :: [Int]
example3 = [trace "will not be called" 3]

main :: IO ()
main = do
  print example1
  print example2
  print $ length example3
-- impure print
-- 1
-- "tracing"
-- 2
-- 1

The function itself is impure ( it uses unsafePerformIO under the hood ) and shouldn't be used in stable code.

Type Holes

Since GHC 7.8 we have a new tool for debugging incomplete programs by means of type holes. By placing a underscore on any value on the right hand-side of a declaration GHC will throw an error during type-checker that reflects the possible values that could placed at this point in the program to make to make the program type-check.

instance Functor [] where
  fmap f (x:xs) = f x : fmap f _
[1 of 1] Compiling Main             ( src/typehole.hs, interpreted )

src/typehole.hs:7:32:
    Found hole ‘_’ with type: [a]
    Where: ‘a’ is a rigid type variable bound by
               the type signature for fmap :: (a -> b) -> [a] -> [b]
               at src/typehole.hs:7:3
    Relevant bindings include
      xs :: [a] (bound at src/typehole.hs:7:13)
      x :: a (bound at src/typehole.hs:7:11)
      f :: a -> b (bound at src/typehole.hs:7:8)
      fmap :: (a -> b) -> [a] -> [b] (bound at src/typehole.hs:7:3)
    In the second argument of ‘fmap’, namely ‘_’
    In the second argument of ‘(:)’, namely ‘fmap f _’
    In the expression: f x : fmap f _
Failed, modules loaded: none.

GHC has rightly suggested that the expression needed to finish the program is xs : [a].

Monads

Eightfold Path to Monad Satori

Much ink has been spilled waxing lyrical about the supposed mystique of monads. Instead I suggest a path to enlightenment:

  1. Don't read the monad tutorials.
  2. No really, don't read the monad tutorials.
  3. Learn about Haskell types.
  4. Learn what a typeclass is.
  5. Read the Typeclassopedia.
  6. Read the monad definitions.
  7. Use monads in real code.
  8. Don't write monad-analogy tutorials.

In other words, the only path to understanding monads is to read the fine source, fire up GHC and write some code. Analogies and metaphors will not lead to understanding.

See: Monad Tutorial Fallacy

Monadic Myths

The following are all false:

  • Monads are impure.
  • Monads are about effects.
  • Monads are about state.
  • Monads are about sequencing.
  • Monads are about IO.
  • Monads are dependent on laziness.
  • Monads are a "back-door" in the language to perform side-effects.
  • Monads are an embedded imperative language inside Haskell.
  • Monads require knowing abstract mathematics.

See: What a Monad Is Not

Laws

Monads are not complicated, the implementation is a typeclass with two functions, (>>=) pronounced "bind" and return. Any preconceptions one might have for the word "return" should be discarded, it has an entirely different meaning.

class Monad m where
  (>>=)  :: m a -> (a -> m b) -> m b
  return :: a -> m a

Together with three laws that all monad instances must satisfy.

Law 1

return a >>= f ≡ f a

Law 2

m >>= return ≡ m

Law 3

(m >>= f) >>= g ≡ m >>= (\x -> f x >>= g)

There is an auxiliary function ((>>)) defined in terms of the bind operation that discards its argument.

(>>) :: Monad m => m a -> m b -> m b
m >> k = m >>= \_ -> k

See: Monad Laws

Do Notation

Monads syntax in Haskell is written in sugared form that is entirely equivalent to just applications of the monad operations. The desugaring is defined recursively by the rules:

do { a <- f ; m } ≡ f >>= \a -> do { m }
do { f ; m } ≡ f >> do { m }
do { m } ≡ m

So for example:

do {
  a <- f ;
  b <- g ;
  c <- h ;
  return (a, b, c)
}

f >>= \a ->
  g >>= \b ->
    h >>= \c ->
      return (a, b, c)

In the do-notation the monad laws from above are equivalently written:

Law 1

  do x <- m
     return x

= do m

Law 2

  do y <- return x
     f y

= do f x

Law 3

  do b <- do a <- m
             f a
     g b

= do a <- m
     b <- f a
     g b

= do a <- m
     do b <- f a
        g b

See: Haskell 2010: Do Expressions

Maybe

The Maybe monad is the simplest first example of a monad instance. The Maybe monad models computations which fail to yield a value at any point during computation.

data Maybe a = Just a | Nothing
instance Monad Maybe where
  (Just x) >>= k = k x
  Nothing  >>= k = Nothing

  return = Just
(Just 3) >>= (\x -> return (x + 1))
-- Just 4

Nothing >>= (\x -> return (x + 1))
-- Nothing

return 4 :: Maybe Int
-- Just 4
example1 :: Maybe Int
example1 = do
  a <- Just 3
  b <- Just 4
  return $ a + b
-- Just 7

example2 :: Maybe Int
example2 = do
  a <- Just 3
  b <- Nothing
  return $ a + b
-- Nothing

List

The List monad is the second simplest example of a monad instance.

instance Monad [] where
  m >>= f   =  concat (map f m)
  return x  =  [x]

So for example with:

m = [1,2,3,4]
f = \x -> [1,0]

The reduction is straightforward:

m >>= f
==> [1,2,3,4] >>= \x -> [1,0]
==> concat (map (\x -> [1,0]) [1,2,3,4])
==> concat ([[1,0],[1,0],[1,0],[1,0]])
==> [1,0,1,0,1,0,1,0]

The list comprehension syntax in Haskell can be implemented in terms of the list monad.

[(x,y) | x <- xs, y <- ys]
example :: [(Int, Int, Int)]
example = do
  a <- [1,2]
  b <- [10,20]
  c <- [100,200]
  return (a,b,c)
-- [(1,10,100),(1,10,200),(1,20,100),(1,20,200),(2,10,100),(2,10,200),(2,20,100),(2,20,200)]

IO

A value of type IO a is a computation which, when performed, does some I/O before returning a value of type a. Desugaring the IO monad:

main :: IO ()
main = do putStrLn "What is your name: "
          name <- getLine
          putStrLn name
main :: IO ()
main = putStrLn "What is your name:" >>=
       \_    -> getLine >>=
       \name -> putStrLn name
main :: IO ()
main = putStrLn "What is your name: " >> (getLine >>= (\name -> putStrLn name))

See: Haskell 2010: Basic/Input Output

Whats the point?

Consider the non-intuitive fact that we now have a uniform interface for talking about three very different but foundational ideas for programming: Failure, Collections, and Effects.

Let's write down a new function called sequence which folds a function mcons, which we can think of as analogues to the list constructor (i.e. (a : b : [])) except it pulls the two list elements out of of two monadic values (p,q) using bind.

sequence :: Monad m => [m a] -> m [a] 
sequence = foldr mcons (return [])

mcons :: Monad m => m t -> m [t] -> m [t]
mcons p q = do
  x <- p
  y <- q
  return (x:y)

What does this function mean in terms of each of the monads discussed above?

Maybe

Sequencing a list of a Maybe values allows us to collect the results of a series of computations which can possibly fail and yield the aggregated values only if they all succeeded.

sequence :: [Maybe a] -> Maybe [a]
sequence [Just 3, Just 4]
-- Just [3,4]
sequence [Just 3, Just 4, Nothing]
-- Nothing

List

Since the bind operation for the list monad forms the pairwise list of elements from the two operands, folding the bind over a list of lists with sequence implements the general Cartesian product for an arbitrary number of lists.

sequence :: [[a]] -> [[a]]
sequence [[1,2,3],[10,20,30]]
-- [[1,10],[1,20],[1,30],[2,10],[2,20],[2,30],[3,10],[3,20],[3,30]]

IO

Sequence takes a list of IO actions, performs them sequentially, and returns the list of resulting values in the order sequenced.

sequence :: [IO a] -> IO [a]
sequence [getLine, getLine]
-- a
-- b
-- ["a","b"]

So there we have it, three fundamental concepts of computation that are normally defined independently of each other actually all share this similar structure that can be abstracted out and reused to build higher abstractions that work for all current and future implementations. If you want a motivating reason for understanding monads, this is it! This is the essence of what I wish I knew about monads looking back.

See: Control.Monad

Reader Monad

The reader monad let's us access shared immutable state within a monadic context.

ask :: Reader r a -> a
asks :: (r -> a) -> Reader r a
local :: (r -> b) -> Reader b a -> Reader r a
runReader :: Reader r a -> r -> a
import Control.Monad.Reader

data MyContext = MyContext
  { foo :: String
  , bar :: Int
  } deriving (Show)

computation :: Reader MyContext (Maybe String)
computation = do
  n <- asks bar
  x <- asks foo
  if n > 0
    then return (Just x)
    else return Nothing

ex1 :: Maybe String
ex1 = runReader computation $ MyContext "hello" 1

ex2 :: Maybe String
ex2 = runReader computation $ MyContext "haskell" 0

A simple implementation of the Reader monad:

newtype Reader r a = Reader { runReader :: r -> a }

instance Monad (Reader r) where
  return a = Reader $ \_ -> a
  m >>= k  = Reader $ \r -> runReader (k (runReader m r)) r

ask :: Reader a a
ask = Reader id

asks :: (r -> a) -> Reader r a
asks f = Reader f

local :: (r -> b) -> Reader b a -> Reader r a
local f m = Reader $ runReader m . f

Writer Monad

The reader monad let's us emit a lazy stream of values from within a monadic context.

tell :: w -> Writer w ()
execWriter :: Writer w a -> w
runWriter :: Writer w a -> (a, w)
import Control.Monad.Writer

type MyWriter = Writer [Int] String

example :: MyWriter
example  = do
  tell [1..5]
  tell [5..10]
  return "foo"

output :: (String, [Int])
output = runWriter example

An simple implementation of the Writer monad:

import Data.Monoid

newtype Writer w a = Writer { runWriter :: (a, w) }

instance Monoid w => Monad (Writer w) where
  return a = Writer (a, mempty)
  m >>= k  = Writer $ let
      (a, w)  = runWriter m
      (b, w') = runWriter (k a)
      in (b, w `mappend` w')

execWriter :: Writer w a -> w
execWriter m = snd (runWriter m)

tell :: w -> Writer w ()
tell w = Writer ((), w)

This implementation is lazy so some care must be taken that one actually wants only generate a stream of thunks. Often this it is desirable to produce a computation which requires a stream of thunks that can pulled lazily out of the runWriter, but often times the requirement is to produce a finite stream of values that are forced at the invocation of runWriter. Undesired laziness from Writer is a common source of grief, but is very remediable.

State Monad

The state monad allows functions within a stateful monadic context to access and modify shared state.

runState  :: State s a -> s -> (a, s)
evalState :: State s a -> s -> a
execState :: State s a -> s -> s
import Control.Monad.State

test :: State Int Int
test = do
  put 3
  modify (+1)
  get

main :: IO ()
main = print $ execState test 0

The state monad is often mistakingly described as being impure, but it is in fact entirely pure and the same effect could be achieved by explicitly passing state. An simple implementation of the State monad is only a few lines:

newtype State s a = State { runState :: s -> (a,s) }

instance Monad (State s) where
  return a = State $ \s -> (a, s)

  State act >>= k = State $ \s ->
    let (a, s') = act s
    in runState (k a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

modify :: (s -> s) -> State s ()
modify f = get >>= \x -> put (f x)

evalState :: State s a -> s -> a
evalState act = fst . runState act

execState :: State s a -> s -> s
execState act = snd . runState act

Monad Transformers

mtl / transformers

So the descriptions of Monads in the previous chapter are a bit of a white lie. Modern Haskell monad libraries typically use a more general form of the written in terms of monad transformers which allow us to compose monads together to form composite monads. The monads mentioned previously are subsumed by the special case of the transformer form composed with the Identity monad.

Monad Transformer Type Transformed Type
Maybe MaybeT Maybe a m (Maybe a)
Reader ReaderT r -> a r -> m a
Writer WriterT (a,w) m (a,w)
State StateT s -> (a,s) s -> m (a,s)
type State  s = StateT  s Identity
type Writer w = WriterT w Identity
type Reader r = ReaderT r Identity

instance Monad m => MonadState s (StateT s m)
instance Monad m => MonadReader r (ReaderT r m)
instance (Monoid w, Monad m) => MonadWriter w (WriterT w m)

In terms of generality the mtl library is the most common general interface for these monads, which itself depends on the transformers library which generalizes the "basic" monads described above into transformers.

See: transformers

Transformers

At their core monad transformers allow us to nest monadic computations in a stack with an interface to exchange values between the levels, called lift.

lift :: (Monad m, MonadTrans t) => m a -> t m a
liftIO :: MonadIO m => IO a -> m a
class MonadTrans t where
    lift :: Monad m => m a -> t m a

class (Monad m) => MonadIO m where
    liftIO :: IO a -> m a

instance MonadIO IO where
    liftIO = id

Just as the base monad class has laws, monad transformers also have several laws:

Law #1

lift . return = return

Law #2

lift (m >>= f) = lift m >>= (lift . f)

Or equivalently written in do notation we have:

Law #1

  do x <- lift m
     x

= do m

Law #2

  do x <- lift m 
     lift (f x)

= lift $ do x <- m
            f x

It's useful to remember that transformers compose outside-in but are unrolled inside out.

See: Monad Transformers: Step-By-Step

ReaderT

For example there exist three possible forms of Reader monad. The first is the Haskell 98 version that no longer exists but is useful for pedagogy. Together with the transformers variant and the mtl variants.

Reader

newtype Reader r a = Reader { runReader :: r -> a }

instance MonadReader r (Reader r) where
  ask       = Reader id
  local f m = Reader $ runReader m . f

ReaderT

newtype ReaderT r m a = ReaderT { runReaderT :: r -> m a }

instance (Monad m) => Monad (ReaderT r m) where
  return a = ReaderT $ \_ -> return a
  m >>= k  = ReaderT $ \r -> do
      a <- runReaderT m r
      runReaderT (k a) r

instance MonadTrans (ReaderT r) where
    lift m = ReaderT $ \_ -> m

MonadReader

class (Monad m) => MonadReader r m | m -> r where
  ask   :: m r
  local :: (r -> r) -> m a -> m a

instance (Monad m) => MonadReader r (ReaderT r m) where
  ask       = ReaderT return
  local f m = ReaderT $ \r -> runReaderT m (f r)

So hypothetically the three variants of ask would be:

ask :: Reader r a -> a
ask :: Monad m => ReaderT r m r
ask :: MonadReader r m => m r

In practice only the last one is used in modern Haskell.

Basics

The most basic use requires us to use the T-variants of the each of the monad transformers for the outer layers and to explicit lift and return values between each the layers. Monads have kind (* -> *) so monad transformers which take monads to monads have ((* -> *) -> * -> *):

Monad (m :: * -> *)
MonadTrans (t :: (* -> *) -> * -> *)

So for example if we wanted to form a composite computation using both the Reader and Maybe monads we can now could the Maybe inside of a ReaderT to form ReaderT t Maybe a.

import Control.Monad.Reader

type Env = [(String, Int)]
type Eval a = ReaderT Env Maybe a

data Expr
  = Val Int
  | Add Expr Expr
  | Var String
  deriving (Show)

eval :: Expr -> Eval Int
eval ex = case ex of

  Val n -> return n

  Add x y -> do
    a <- eval x
    b <- eval y
    return (a+b)

  Var x -> do
    env <- ask
    val <- lift (lookup x env)
    return val

env :: Env
env = [("x", 2), ("y", 5)]

ex1 :: Eval Int
ex1 = eval (Add (Val 2) (Add (Val 1) (Var "x")))

example1, example2 :: Maybe Int
example1 = runReaderT ex1 env
example2 = runReaderT ex1 []

The fundamental limitation of this approach is that we find ourselves lift.lift.lifting and return.return.returning a lot.

Newtype Deriving

Newtypes let us reference a date type with a single constructor as a new distinct type, with no runtime overhead from boxing, unlike a algebraic datatype with single constructor. Newtype wrappers around strings and numeric types can often drastically reduce accidental errors. Using -XGeneralizedNewtypeDeriving we can recover the functionality of instances of the underlying type.

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

newtype Velocity = Velocity { unVelocity :: Double }
  deriving (Eq, Ord)

v :: Velocity
v = Velocity 2.718

x :: Double
x = 6.636

-- Type error is caught at compile time even though they are the same value at runtime!
err = v + x

newtype Quantity v a = Quantity a
  deriving (Eq, Ord, Num, Show)

data Haskeller
type Haskellers = Quantity Haskeller Int

a = Quantity 2 :: Haskellers
b = Quantity 6 :: Haskellers

totalHaskellers :: Haskellers
totalHaskellers = a + b
Couldn't match type `Double' with `Velocity'
Expected type: Velocity
  Actual type: Double
In the second argument of `(+)', namely `x'
In the expression: v + x

Using newtype deriving with the mtl library typeclasses we can produce flattened transformer types that don't require explicit lifting in the transform stack. For example a little stack machine the Reader Writer and State monads.

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.Reader
import Control.Monad.Writer
import Control.Monad.State

type Stack   = [Int]
type Output  = [Int]
type Program = [Instr]

type VM a = ReaderT Program (WriterT Output (State Stack)) a

newtype Comp a = Comp { unComp :: VM a }
  deriving (Monad, MonadReader Program, MonadWriter Output, MonadState Stack)

data Instr = Push Int | Pop | Puts

evalInstr :: Instr -> Comp ()
evalInstr instr = case instr of
  Pop    -> modify tail
  Push n -> modify (n:)
  Puts   -> do
    tos <- gets head
    tell [tos]

eval :: Comp ()
eval = do
  instr <- ask
  case instr of
    []     -> return ()
    (i:is) -> evalInstr i >> local (const is) eval

execVM :: Program -> Output
execVM = flip evalState [] . execWriterT . runReaderT (unComp eval)

program :: Program
program = [
     Push 42,
     Push 27,
     Puts,
     Pop,
     Puts,
     Pop
  ]

main :: IO ()
main = mapM_ print $ execVM program

Efficiency

The second monad transformer law guarantees that sequencing consecutive lift operations is semantically equivalent to lifting the results into the outer monad.

do x <- lift m  ==  lift $ do x <- m
   lift (f x)                 f x

Although they are guaranteed to yield the same result the operation of lifting the results between the monad levels is not without cost and crops up frequently when working with the monad traversal and looping functions. For example all three of the functions on the left below are less efficient than the right hand side which performs the bind in the base monad instead of lifting on each iteration.

-- Less Efficient      More Efficient
forever (lift m)    == lift (forever m)
mapM_ xs (lift . f) == lift (mapM_ xs f)
forM_ xs (lift . f) == lift (forM_ xs f)

Language Extensions

It's important to distinguish the categories of language extensions fall into:

The inherent problem with classifying the extensions into General and Specialized category is that it's a subjective classification. Haskellers who do type astronautics will have a very different interpretation of Haskell then people who do database programming. As such this is a conservative assessment, as an arbitrary baseline let's consider FlexibleInstances and OverloadedStrings "everyday" while GADTs and TypeFamilies are "specialized".

Key

  • Benign implies that importing the extension won't change the semantics of the module if not used.
  • Historical implies that one shouldn't use this extension, it's in GHC purely for backwards compatibility. Sometimes these are dangerous to enable.
Benign Historical Extends Syntax Use Use GHC Reference
AllowAmbiguousTypes Specialized Typelevel Programming Ref
Arrows Specialized Syntax Extension Ref
AutoDeriveTypeable Specialized Metaprogramming Ref
BangPatterns General Strictness Annotation Ref
CApiFFI Specialized FFI Ref
ConstrainedClassMethods Specialized Typelevel Programming Ref
ConstraintKinds Specialized Typelevel Programming Ref
CPP General Preprocessor Ref
DataKinds Specialized Typelevel Programming Ref
DatatypeContexts Deprecated Deprecated Ref
DefaultSignatures Specialized Generic Programming Ref
DeriveDataTypeable General Generic Programming Ref
DeriveFoldable General Generic Programming Ref
DeriveFunctor General Generic Programming Ref
DeriveGeneric General Generic Programming Ref
DeriveTraversable General Generic Programming Ref
DisambiguateRecordFields Specialized Syntax Extension Ref
DoRec Specialized Syntax Extension Ref
EmptyCase Specialized Syntax Extension Ref
EmptyDataDecls General Syntax Extension Ref
ExistentialQuantification Specialized Typelevel Programming Ref
ExplicitForAll Specialized Typelevel Programming Ref
ExplicitNamespaces Specialized Syntax Disambiguation Ref
ExtendedDefaultRules Specialized Generic Programming Ref
FlexibleContexts General Typeclass Extension Ref
FlexibleInstances General Typeclass Extension Ref
ForeignFunctionInterface General FFI Ref
FunctionalDependencies General Typeclass Extension Ref
GADTs General Typelevel Programming Ref
GADTSyntax General Syntax Extension Ref
GeneralizedNewtypeDeriving General Typeclass Extension Ref
GHCForeignImportPrim Specialized FFI Ref
ImplicitParams Specialized Typelevel Programming Ref
ImpredicativeTypes Specialized Typelevel Programming Ref
IncoherentInstances Specialized Typelevel Programming Ref
InstanceSigs Specialized Typelevel Programming Ref
InterruptibleFFI Specialized FFI Ref
KindSignatures Specialized Typelevel Programming Ref
LambdaCase General Syntax Extension Ref
LiberalTypeSynonyms Specialized Typeclass Extension Ref
MagicHash Specialized GHC Internals Ref
MonadComprehensions Specialized Syntax Extension Ref
MonoPatBinds Specialized Type Disambiguation Ref
MultiParamTypeClasses General Typeclass Extension Ref
MultiWayIf Specialized Syntax Extension Ref
NamedFieldPuns Specialized Syntax Extension Ref
NegativeLiterals General Type Disambiguation Ref
NoImplicitPrelude Specialized Import Disambiguation Ref
NoMonoLocalBinds General Type Disambiguation Ref
NoMonomorphismRestriction General Type Disambiguation Ref
NPlusKPatterns Deprecated Deprecated Ref
NullaryTypeClasses Specialized Typeclass Extension Ref
NumDecimals General Type Disambiguation Ref
OverlappingInstances Specialized Typeclass Extension Ref
OverloadedLists General Syntax Extension Ref
OverloadedStrings General Syntax Extension Ref
PackageImports General Import Disambiguation Ref
ParallelArrays Specialized Data Parallel Haskell Ref
ParallelListComp General Syntax Extension Ref
PatternGuards General Syntax Extension Ref
PatternSynonyms General Syntax Extension Ref
PolyKinds Specialized Typelevel Programming Ref
PolymorphicComponents Specialized Deprecated Ref
PostfixOperators Specialized Syntax Extension Ref
QuasiQuotes Specialized Metaprogramming Ref
Rank2Types Specialized Historical Artificat Ref
RankNTypes Specialized Typelevel Programming Ref
RebindableSyntax Specialized Metaprogramming Ref
RecordPuns General Syntax Extension Ref
RecordWildCards General Syntax Extension Ref
RecursiveDo Specialized Syntax Extension Ref
RelaxedPolyRec Specialized Type Disambiguation Ref
RoleAnnotations Specialized Type Disambiguation Ref
Safe Specialized Security Auditing Ref
SafeImports Specialized Security Auditing Ref
ScopedTypeVariables Specialized Typelevel Programming Ref
StandaloneDeriving General Typeclass Extension Ref
TemplateHaskell Specialized Metaprogramming Ref
TraditionalRecordSyntax Specialized Historical Artificat Ref
TransformListComp Specialized Syntax Extension Ref
Trustworthy Specialized Security Auditing Ref
TupleSections General Syntax Extension Ref
TypeFamilies Specialized Typelevel Programming Ref
TypeHoles General Interactive Typing Ref
TypeOperators Specialized Typelevel Programming Ref
TypeSynonymInstances General Typeclass Extension Ref
UnboxedTuples Specialized FFI Ref
UndecidableInstances Specialized Typelevel Programming Ref
UnicodeSyntax Specialized Syntax Extension Ref
UnliftedFFITypes Specialized FFI Ref
Unsafe Specialized Security Auditing Ref
ViewPatterns General Syntax Extension Ref

See: GHC Extension Reference

The Dangerous

GHC's typechecker sometimes just casually tell us to enable language extensions when it can't solve certain problems. These include:

  • OverlappingInstances
  • IncoherentInstances
  • ImpredicativeTypes

These almost always these indicate a design flaw and shouldn't be turned on to remedy the error at hand, as much as GHC might suggest otherwise!

Inference

Inference in Haskell is generally quite accurate, although there are several boundary cases that tend to cause problems. Consider the two functions

Mututally Recursive Binding Groups

f x = const x g
g y = f 'A'

The inferred type signatures are correct in their usage, but don't represent the most general signatures. When GHC analyzes the module it analyzes the dependencies of expressions on each other, groups them together, and applies substitutions from unification across mutually defined groups. As such the inferred types may not be the most general types possible, and an explicit signature may be desired.

-- Inferred types
f :: Char -> Char
g :: t -> Char

-- Most general types
f :: a -> a
g :: a -> Char

Polymorphic recursion

data Tree a = Leaf | Bin a (Tree (a, a))

size Leaf = 0
size (Bin _ t) = 1 + 2 * size t

The problem with this expression is that the inferred type variable a in size spans two possible types (a and (a,a)), the recursion is polymorphic. These two types won't pass the occurs-check of typechecker and yield to an incorrect inferred type.

    Occurs check: cannot construct the infinite type: t0 = (t0, t0)
    Expected type: Tree t0
      Actual type: Tree (t0, t0)
    In the first argument of `size', namely `t'
    In the second argument of `(*)', namely `size t'
    In the second argument of `(+)', namely `2 * size t'

Simply adding an explicit type signature corrects this. Type inference using polymorphic recursion is undecidable in the general case.

size :: Tree a -> Int
size Leaf = 0
size (Bin _ t) = 1 + 2 * size t

See: Static Semantics of Function and Pattern Bindings

Monomorphism Restriction

The most common edge case of the inference is known as the dreaded monomorphic restriction.

When the toplevel declarations of a module are generalized the monomorphism restricts that toplevel values (i.e. expressions not under a lambda ) whose type contains the subclass of the Num type from the Prelude are not generalized and instead are instantiated with a monotype tried sequentially from the list specified by the default which is normally Integer then Double.

-- Double is inferred by type inferencer.
example1 :: Double
example1 = 3.14

-- In the presense of a lambda, a different type is inferred!
example2 :: Fractional a => t -> a
example2 _ = 3.14

default (Integer, Double)

As of GHC 7.8 the monomorphism restriction is switched off by default in GHCi.

λ: set +t

λ: 3
3 
it :: Num a => a

λ: default (Double)

λ: 3
3.0
it :: Num a => a

Safe Haskell

As everyone eventually finds out there are several functions within implementation of GHC ( not the Haskell language ) that can be used to subvert the type-system, they are marked with the prefix unsafe. These functions exist only for when one can manually prove the soundness of an expression but can't express this property in the type-system. Using these functions without fulfilling the proof obligations will cause all measure of undefined behavior with unimaginable pain and suffering, and are strongly discouraged. When initially starting out with Haskell there are no legitimate reason to use these functions at all, period.

unsafeCoerce :: a -> b
unsafePerformIO :: IO a -> a

The Safe Haskell language extensions allow us to restrict the use of unsafe language features using -XSafe which restricts the import of modules which are themselves marked as Safe. It also forbids the use of certain language extensions (-XTemplateHaskell) which can be used to produce unsafe code. The primary use case of these extensions is security auditing.

{-# LANGUAGE Safe #-}
{-# LANGUAGE Trustworthy #-}
{-# LANGUAGE Safe #-}

import Unsafe.Coerce
import System.IO.Unsafe

bad1 :: String
bad1 = unsafePerformIO getLine

bad2 :: a
bad2 = unsafeCoerce 3.14 ()
Unsafe.Coerce: Can't be safely imported!
The module itself isn't safe.

See: Safe Haskell

Pattern Guards

{-# LANGUAGE PatternGuards #-}

combine env x y
   | Just a <- lookup env x
   , Just b <- lookup env y
   = Just a + b

   | otherwise = Nothing

View Patterns

{-# LANGUAGE ViewPatterns #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

import Safe

lookupDefault :: Eq a => a -> b -> [(a,b)] -> b
lookupDefault k _ (lookup k -> Just s) = s
lookupDefault _ d _ = d

headTup :: (a, [t]) -> [t]
headTup (headMay . snd -> Just n) = [n]
headTup _ = []

headNil :: [a] -> [a]
headNil (headMay -> Just x) = [x]
headNil _ = []

Misc Syntax Extensions

Tuple Sections

{-# LANGUAGE TupleSections #-}

fst' :: a -> (a, Bool)
fst' = (,True)

snd' :: a -> (a, Bool)
snd' = (True,)

example :: (Bool, Bool)
example = fst' False

Multi-way if-expressions

{-# LANGUAGE MultiWayIf #-}

operation x =
  if | x > 100   = 3  
     | x > 10    = 2  
     | x > 1     = 1  
     | otherwise = 0

Lambda Case

{-# LANGUAGE LambdaCase #-}

data Exp a
  = Lam a (Exp a)
  | Var a
  | App (Exp a) (Exp a)

example :: Exp a -> a
example = \case
  Lam a b -> a
  Var a   -> a
  App a b -> example a

Package Imports

import qualified "mtl" Control.Monad.Error as Error
import qualified "mtl" Control.Monad.State as State
import qualified "mtl" Control.Monad.Reader as Reader

Pattern Synonyms

Suppose we were writing a typechecker, it would very to common to include a distinct TArr term ease the telescoping of function signatures, this is what GHC does in it's Core language. Even though technically it could be written in terms of more basic application of the (->) constructor.

data Type
  = TVar TVar
  | TCon TyCon
  | TApp Type Type
  | TArr Type Type
  deriving (Show, Eq, Ord)

With pattern synonyms we can eliminate the extraneous constructor without loosing the convenience of pattern matching on arrow types.

{-# LANGUAGE PatternSynonyms #-}

pattern TArr t1 t2 = TApp (TApp (TCon "(->)") t1) t2

So now we can write an eliminator and constructor for arrow type very naturally.

{-# LANGUAGE PatternSynonyms #-}

import Data.List (foldl1')

type Name  = String
type TVar  = String
type TyCon = String

data Type
  = TVar TVar
  | TCon TyCon
  | TApp Type Type
  deriving (Show, Eq, Ord)


pattern TArr t1 t2 = TApp (TApp (TCon "(->)") t1) t2

tapp :: TyCon -> [Type] -> Type
tapp tcon args = foldl TApp (TCon tcon) args

arr :: [Type] -> Type
arr ts = foldl1' (\t1 t2 -> tapp "(->)" [t1, t2]) ts

elimTArr :: Type -> [Type]
elimTArr (TArr (TArr t1 t2) t3) = t1 : t2 : elimTArr t3
elimTArr (TArr t1 t2) = t1 : elimTArr t2
elimTArr t = [t]

-- (->) a ((->) b a)
-- a -> b -> a
to :: Type
to = arr [TVar "a", TVar "b", TVar "a"]

from :: [Type]
from = elimTArr to

Laziness

Again, a subject on which much ink has been spilled. There is an ongoing discussion in the land of Haskell about the compromises between lazy and strict evaluation, and there are nuanced arguments for having either paradigm be the default. Haskell takes a hybrid approach and allows strict evaluation when needed and uses laziness by default. Needless to say, we can always find examples where lazy evaluation exhibits worse behavior than strict evaluation and vice versa. They both have flaws, and as of yet there isn't a method that combines only the best of both worlds.

See:

Seq and WHNF

In Haskell evaluation only occurs at outer constructor of case-statements in Core. If we pattern match on a list we don't implicitly force all values in the list. A element in a data structure is only evaluated up to the most outer constructor. For example, to evaluate the length of a list we need only scrutinize the outer Cons constructors without regard for their inner values.

λ: length [undefined, 1]
2

λ: head [undefined, 1]
Prelude.undefined

λ: snd (undefined, 1)
1

λ: fst (undefined, 1)
Prelude.undefined

The command :sprintf can be usded to introspect the state of unevaluated thunks inside an expression without forcing evaluation. For instance:

λ: let a = [1..]
λ: let b = map (+ 1) a

λ: :sprint a
a = _
λ: :sprint b
b = _
λ: a !! 4
5
λ: :sprint a
a = 1 : 2 : 3 : 4 : 5 : _
λ: b !! 10
12
λ: :sprint a
a = 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10 : 11 : _
λ: :sprint b
b = _ : _ : _ : _ : _ : _ : _ : _ : _ : _ : 12 : _

A term is said to be in weak head normal-form if the outermost constructor or lambda cannot be reduced further.

The seq function introduces an artificial dependence on the evaluation of order of two terms by requiring that the first argument be evaluated to WHNF before the evaluation of the second. The implementation of the seq function is an implementation detail of GHC.

seq :: a -> b -> b

⊥ `seq` a = ⊥
a `seq` b = b

The infamous foldl is well-known to leak space when used carelessly and without several compiler optimizations applied. The strict foldl' variant uses seq to overcome this.

foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
foldl' :: (a -> b -> a) -> a -> [b] -> a
foldl' _ z [] = z
foldl' f z (x:xs) = let z' = f z x in z' `seq` foldl' f z' xs

The extension BangPatterns allows an alternative syntax to force arguments to functions to be wrapped in seq.

{-# LANGUAGE BangPatterns #-}

sum :: Num a => [a] -> a
sum = go 0
  where
    go !acc (x:xs) = go (acc + x) (go xs)
    go  acc []     = acc

This is desugared into code semantically equivalent to the following:

sum :: Num [a] => [a] -> a
sum = go 0
  where
    go acc _ | acc `seq` False = undefined
    go acc (x:xs)              = go (acc + x) (go xs)
    go acc []                  = acc

Function application to seq'd arguments often enough that is has a special operator.

($!) :: (a -> b) -> a -> b
f $! x  = let !vx = x in f vx

Deepseq

There are often times when for performance reasons we need to deeply evaluate a data structure to normal form leaving no terms unevaluated. The deepseq library performs this task.

class NFData a where
  rnf :: a -> ()
  rnf a = a `seq` ()

deepseq :: NFData a => a -> b -> a
($!!) :: (NFData a) => (a -> b) -> a -> b
instance NFData Int
instance NFData (a -> b)

instance NFData a => NFData (Maybe a) where
    rnf Nothing  = ()
    rnf (Just x) = rnf x

instance NFData a => NFData [a] where
    rnf [] = ()
    rnf (x:xs) = rnf x `seq` rnf xs
[1, undefined] `seq` ()
-- ()

[1, undefined] `deepseq` ()
-- Prelude.undefined

To force a data structure itself to be fully evaluated we share the same argument in both positions of deepseq.

force :: NFData a => a
force x = x `deepseq` x

Prelude

What to Avoid?

Haskell being a 25 year old language has witnessed several revolutions in the way we structure and compose functional programs. Yet as a result several portions of the Prelude still reflect old schools of thought that simply can't be removed without breaking significant parts of the ecosystem.

Currently it really only exists in folklore which parts to use and which not to use, although this is a topic that almost all introductory books don't mention and instead make extensive use of the Prelude for simplicity's sake.

The short version of the advice on the Prelude is:

  • Use fmap instead of map.
  • Use Foldable and Traversable instead of the Control.Monad, and Data.List versions of traversals.
  • Avoid partial functions like head and read or use their total variants.
  • Avoid asynchronous exceptions.
  • Avoid boolean blind functions.

The instances of Foldable for the list type often conflict with the monomorphic versions in the Prelude which are left in for historical reasons. So often times it is desirable to explicitly mask these functions from implicit import and force the use of Foldable and Traversable instead:

import  Data.List hiding ( 
    all , and , any , concat , concatMap , elem , filter ,
    find , foldl , foldl' , foldl1 , foldr , foldr1 ,
    mapAccumL , mapAccumR , maximum , maximumBy , minimum , 
    minimumBy , notElem , or , product , sum )

import Control.Monad hiding ( 
    forM , forM_ , mapM , mapM_ , msum , sequence , sequence_ )

The nuclear option is to exclude the entire prelude except by explicit qualified use or by the -XNoImplicitPrelude pragma.

import qualified Prelude as P

Partial Functions

A partial function is a function which doesn't terminate and yield a value for all given inputs. Conversely a total function terminates and is always defined for all inputs. As mentioned previously, certain historical parts of the Prelude are full of partial functions.

The difference between partial and total functions is the compiler can't reason about the runtime safety of partial functions purely from the information specified in the language and as such the proof of safety is left to the user to to guarantee. They are safe to use in the case where the user can guarantee that invalid inputs cannot occur, but like any unchecked property it's safety or not-safety is going to depend on the diligence of the programmer. This very much goes against the overall philosophy of Haskell and as such they are discouraged when not necessary.

head :: [a] -> a
read :: Read a => String -> a
(!!) :: [a] -> Int -> a

Safe

The Prelude has total variants of the historical partial functions (i.e. Text.Read.readMaybe)in some cases, but often these are found in the various utility libraries like safe.

The total versions provided fall into three cases:

  • May - return Nothing when the function is not defined for the inputs
  • Def - provide a default value when the function is not defined for the inputs
  • Note - call error with a custom error message when the function is not defined for the inputs. This is not safe, but slightly easier to debug!
-- Total
headMay :: [a] -> Maybe a
readMay :: Read a => String -> Maybea
atMay :: [a] -> Int -> Maybe a

-- Total
headDef :: a -> [a] -> a
readDef :: Read a => a -> String -> Maybea
atDef   :: a -> [a] -> Int -> a

-- Partial
headNote :: String -> [a] -> a
readNote :: Read a => String -> String -> Maybea
atNote   :: String -> [a] -> Int -> Maybe a

Boolean Blindness

data Bool = True | False

isJust :: Maybe a -> Bool
isJust (Just x) = True
isJust Nothing  = False

The problem with the boolean type is that there is effectively no difference between True and False at the type level. A proposition taking a value to a Bool takes any information given and destroys it. To reason about the behavior We have to trace the provenance of the proposition we're getting the boolean answer from, and this introduces whole slew of possibilities for misinterpretation. In the worst case, the only way to reason about safe and unsafe use of a function is by trusting that that a predicates lexical name reflects it's provenance!

For instance testing some proposition over a value which simply returns a Bool value representing whether the branch performs can perform the computation safely in the presence of a null is subject to accidental interchange. Consider that in a language like C or Python testing whether a value is null is indistinguishable to the language from testing whether the language is not null. Which of these programs encodes safe usage and which segfaults?

# This one?
if p(x):
    # use x
elif not p(x):
    # dont use x

# Or this one?
if p(x):
    # don't use x
elif not p(x):
    # use x

For inspection we can't tell without knowing how p is defined, the compiler doesn't can't distinguish the two and thus the language won't save us if we happen to mix them up. Instead of making invalid states unrepresentable we've made the invalid state indistinguishable from the valid one!

The more desirable practice is to match match on terms which explicitly witness the proposition as a type ( often in a sum type ) and won't typecheck otherwise.

case x of
  Just a  -> use x
  Nothing -> dont use x

-- not ideal
case p x of
  True  -> use x
  False -> dont use x

-- not ideal
if p x
  then use x
  else don't use x

To be fair though, many popular languages completely lack the notion of sum types ( the source of many woes in my opinion ) and only have product types, so this type of reasoning sometimes has no direct equivalence for those not familiar with ML family languages.

In Haskell, the Prelude provides functions like isJust and fromJust both of which can be used to subvert this kind of reasoning and make it easy to introduce bugs and should often be avoided.

See: Boolean Blindness

Foldable / Traversable

If coming from an imperative background retraining one's self to think about iteration over lists in terms of maps, folds, and scans can be challenging.

-- pseudocode
Prelude.foldl :: (a -> b -> a) -> a -> [b] -> a
Prelude.foldr :: (a -> b -> b) -> b -> [a] -> b

foldr f z [a...] = f a (f b ( ... (f y z) ... )) 
foldl f z [a...] = f ... (f (f z a) b) ... y 

Foldable and Traversable are the general interface for all traversable and folds of any data structures which is parameterized over it's element type ( List, Map, Set, Maybe, ...). These are two classes are used everywhere in modern Haskell and are extremely important.

A foldable instance allows us to apply functions to data types of monoidal values that collapse the structure using some logic over mappend.

A traversable instance allows us to apply functions to data types that walk the structure left-to-right within an applicative context.

class (Functor f, Foldable f) => Traversable f where
  traverse :: Applicative g => f (g a) -> g (f a)

class Foldable f where
  foldMap :: Monoid m => (a -> m) -> f a -> m
Data.Foldable.foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b
Data.Foldable.foldl :: Foldable t => (a -> b -> a) -> a -> t b -> a
Data.Traversable.traverse :: (Applicative f, Traversable t) => (a -> f b) -> t a -> f (t b)

Most of the operations over lists can be generalized in terms in combinations of traverse and foldMap to derive more generation functions that work over all data structures implementing Foldable.

Data.Foldable.elem    :: (Eq a, Foldable t) => a -> t a -> Bool
Data.Foldable.sum     :: (Num a, Foldable t) => t a -> a
Data.Foldable.minimum :: (Ord a, Foldable t) => t a -> a
Data.Traversable.mapM :: (Monad m, Traversable t) => (a -> m b) -> t a -> m (t b)

Unfortunately for historical reasons the names exported by foldable quite often conflict with ones defined in the Prelude, either import them qualified or just disable the Prelude. The operations in the Foldable all specialize to the same behave the same as the ones Prelude for List types.

import Data.Monoid
import Data.Foldable
import Data.Traversable

import Control.Applicative
import Control.Monad.Identity (runIdentity)
import Prelude hiding (mapM_, foldr)

-- Rose Tree
data Tree a = Node a [Tree a] deriving (Show)

instance Functor Tree where
  fmap f (Node x ts) = Node (f x) (fmap (fmap f) ts)

instance Traversable Tree where
  traverse f (Node x ts) = Node <$> f x <*> traverse (traverse f) ts

instance Foldable Tree where
  foldMap f (Node x ts) = f x `mappend` foldMap (foldMap f) ts


tree :: Tree Integer
tree = Node 1 [Node 1 [], Node 2 [] ,Node 3 []]


example1 :: IO ()
example1 = mapM_ print tree

example2 :: Integer
example2 = foldr (+) 0 tree

example3 :: Maybe (Tree Integer)
example3 = traverse (\x -> if x > 2 then Just x else Nothing) tree

example4 :: Tree Integer
example4 = runIdentity $ traverse (\x -> pure (x+1)) tree

The instances we defined above can also be automatically derived by GHC using several language extensions. The automatic instances are identical to the hand-written versions above.

{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveTraversable #-}

data Tree a = Node a [Tree a]
  deriving (Show, Functor, Foldable, Traversable)

See: Typeclassopedia

Split

The split package provides a variety of missing functions for splitting list and string types.

import Data.List.Split

example1 :: [String]
example1 = splitOn "." "foo.bar.baz"
-- ["foo","bar","baz"]

example2 :: [String]
example2 = chunksOf 10 "To be or not to be that is the question."
-- ["To be or n","ot to be t","hat is the"," question."]

Monad-loops

The monad-loops package provides a variety of missing functions for control logic in monadic contexts.

whileM :: Monad m => m Bool -> m a -> m [a]
untilM :: Monad m => m a -> m Bool -> m [a]
iterateUntilM :: Monad m => (a -> Bool) -> (a -> m a) -> a -> m a
whileJust :: Monad m => m (Maybe a) -> (a -> m b) -> m [b]

Text / ByteString

The default Haskell string type is the rather naive linked list of characters, that while perfectly fine for small identifiers is not well-suited for bulk processing.

type String = [Char]

For more performance sensitive cases there are two libraries for processing textual data: text and bytestring. With the -XOverloadedStrings extension string literals can be overloaded without the need for explicit packing and can be written as string literals in the Haskell source and overloaded via a typeclass IsString.

class IsString a where
  fromString :: String -> a

For instance:

λ: :type "foo"
"foo" :: [Char]

λ: :set -XOverloadedStrings

λ: :type "foo"
"foo" :: IsString a => a

Text

A Text type is a packed blob of Unicode characters.

pack :: String -> Text
unpack :: Text -> String
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T

-- From pack
myTStr1 :: T.Text
myTStr1 = T.pack ("foo" :: String)

-- From overloaded string literal.
myTStr2 :: T.Text
myTStr2 = "bar"

See: Text

ByteString

ByteStrings are arrays of unboxed characters with either strict or lazy evaluation.

pack :: String -> ByteString
unpack :: ByteString -> String
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8

-- From pack
bstr1 :: S.ByteString
bstr1 = S.pack ("foo" :: String)

-- From overloaded string literal.
bstr2 :: S.ByteString
bstr2 = "bar"

See: * Bytestring: Bits and Pieces * ByteString

Printf

import Data.Text
import Text.Printf

a :: Int
a = 3

b :: Double
b = 3.14159

c :: String
c = "haskell"

example :: String
example = printf "(%i, %f, %s)" a b c
-- "(3, 3.14159, haskell)"

Overloaded Lists

It is ubiquitous for data structure libraries to expose toList and fromList functions to construct various structures out of lists. As of GHC 7.8 we now have the ability to overload the list syntax in the surface language with a typeclass IsList.

class IsList l where
  type Item l
  fromList  :: [Item l] -> l
  toList    :: l -> [Item l]

instance IsList [a] where
  type Item [a] = a
  fromList = id
  toList   = id
λ: :type [1,2,3]
[1,2,3] :: (Num (Item l), IsList l) => l
{-# LANGUAGE OverloadedLists #-}
{-# LANGUAGE TypeFamilies #-}

import qualified Data.Map as Map
import GHC.Exts (IsList(..))

instance (Ord k) => IsList (Map.Map k v) where
  type Item (Map.Map k v) = (k,v)
  fromList = Map.fromList
  toList = Map.toList

example1 :: Map.Map String Int
example1 = [("a", 1), ("b", 2)]

Applicatives

Like monads Applicatives are an abstract structure for a wide class of computations that sit between functors and monads in terms of generality.

pure :: Applicative f => a -> f a
(<$>) :: Functor f => (a -> b) -> f a -> f b
(<*>) :: f (a -> b) -> f a -> f b

As of GHC 7.6, Applicative is defined as:

class Functor f => Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

(<$>) :: Functor f => (a -> b) -> f a -> f b
(<$>) = fmap

With the following laws:

pure id <*> v = v
pure f <*> pure x = pure (f x)
u <*> pure y = pure ($ y) <*> u
u <*> (v <*> w) = pure (.) <*> u <*> v <*> w

As an example, consider the instance for Maybe:

instance Applicative Maybe where
  pure              = Just
  Nothing <*> _     = Nothing
  _ <*> Nothing     = Nothing
  Just f <*> Just x = Just (f x)

As a rule of thumb, whenever we would use m >>= return . f what we probably want is an applicative functor, and not a monad.

import Network.HTTP
import Control.Applicative ((<$>),(<*>))

example1 :: Maybe Integer
example1 = (+) <$> m1 <*> m2
  where
    m1 = Just 3
    m2 = Nothing
-- Nothing

example2 :: [(Int, Int, Int)]
example2 = (,,) <$> m1 <*> m2 <*> m3
  where
    m1 = [1,2]
    m2 = [10,20]
    m3 = [100,200]
-- [(1,10,100),(1,10,200),(1,20,100),(1,20,200),(2,10,100),(2,10,200),(2,20,100),(2,20,200)]

example3 :: IO String
example3 = (++) <$> fetch1 <*> fetch2
  where
    fetch1 = simpleHTTP (getRequest "http://www.fpcomplete.com/") >>= getResponseBody
    fetch2 = simpleHTTP (getRequest "http://www.haskell.org/") >>= getResponseBody

The pattern f <$> a <*> b ... shows us so frequently that there are a family of functions to lift applicatives of a fixed number arguments. This pattern also shows up frequently with monads (liftM, liftM2, liftM3).

liftA :: Applicative f => (a -> b) -> f a -> f b
liftA f a = pure f <*> a

liftA2 :: Applicative f => (a -> b -> c) -> f a -> f b -> f c
liftA2 f a b = f <$> a <*> b

liftA3 :: Applicative f => (a -> b -> c -> d) -> f a -> f b -> f c -> f d
liftA3 f a b c = f <$> a <*> b <*> c

See: Applicative Programming with Effects

Typeclass Hierarchy

In principle every monad arises out of an applicative functor (and by corollary a functor) but due to historical reasons Applicative isn't a superclass of the Monad typeclass. A hypothetical fixed Prelude might have:

class Functor f where
  fmap :: (a -> b) -> f a -> f b
 
class Functor f => Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b
 
class Applicative m => Monad m where
  (>>=) :: m a -> (a -> m b) -> m b
  ma >>= f = join (fmap f ma)

return :: Applicative m => a -> m a
return = pure

join :: Monad m => m (m a) -> m a
join x = x >>= id

See: Functor-Applicative-Monad Proposal

Alternative

Alternative is an extension of the Applicative class with a zero element and an associative binary operation respecting the zero.

class Applicative f => Alternative f where
  -- | The identity of '<|>'
  empty :: f a
  -- | An associative binary operation
  (<|>) :: f a -> f a -> f a
  -- | One or more.
  some :: f a -> f [a]
  -- | Zero or more.
  many :: f a -> f [a]

optional :: Alternative f => f a -> f (Maybe a)
instance Alternative Maybe where
    empty = Nothing
    Nothing <|> r = r
    l       <|> _ = l

instance Alternative [] where
    empty = []
    (<|>) = (++)
λ: foldl1 (<|>) [Nothing, Just 5, Just 3]
Just 5

These instances show up very frequently in parsers where the alternative operator can model alternative parse branches.

Polyvariadic Functions

One surprising application of typeclasses is the ability to construct functions which take an arbitrary number of arguments by defining instances over function types. The arguments may be of arbitrary type, but the resulting collected arguments must either converted into a single type or unpacked into a sum type.

{-# LANGUAGE FlexibleInstances #-}

class Arg a where
  collect' :: [String] -> a

-- extract to IO
instance Arg (IO ()) where
  collect' acc = mapM_ putStrLn acc

-- extract to [String]
instance Arg [String] where
  collect' acc = acc

instance (Show a, Arg r) => Arg (a -> r) where
  collect' acc = \x -> collect' (acc ++ [show x])

collect :: Arg t => t
collect = collect' []

example1 :: [String]
example1 = collect 'a' 2 3.0

example2 :: IO ()
example2 = collect () "foo" [1,2,3]

See: Polyvariadic functions

Error Handling

Control.Exception

The low-level (and most dangerous) way to handle errors is to use the throw and catch functions which allow us to throw extensible extensions in pure code but catch the resulting exception within IO. Of specific note is that return value of the throw inhabits all types. There's no reason to use this for custom code that doesn't use low-level system operations.

throw :: Exception e => e -> a
catch :: Exception e => IO a -> (e -> IO a) -> IO a
try :: Exception e => IO a -> IO (Either e a)
evaluate :: a -> IO a
{-# LANGUAGE DeriveDataTypeable #-}

import Data.Typeable
import Control.Exception

data MyException = MyException
    deriving (Show, Typeable)

instance Exception MyException

evil :: [Int]
evil = [throw MyException]

example1 :: Int
example1 = head evil

example2 :: Int
example2 = length evil

main :: IO ()
main = do
  a <- try (evaluate example1) :: IO (Either MyException Int)
  print a

  b <- try (return example2) :: IO (Either MyException Int)
  print b

Exceptions

The problem with the previous approach is having to rely on GHC's asynchronous exception handling inside of IO to handle basic operations. The exceptions provides the same API as Control.Exception but loosens the dependency on IO.

{-# LANGUAGE DeriveDataTypeable #-}

import Data.Typeable
import Control.Monad.Catch
import Control.Monad.Identity

data MyException = MyException
    deriving (Show, Typeable)

instance Exception MyException

example :: MonadCatch m => Int -> Int -> m Int
example x y | y == 0 = throwM MyException
            | otherwise = return $ x `div` y

pure :: MonadCatch m => m (Either MyException Int)
pure = do
  a <- try (example 1 2)
  b <- try (example 1 0)
  return (a >> b)

See: exceptions

Either

The instance of the Either monad is simple, note the bias toward Left when binding.

instance Monad (Either e) where
  return x = Right x

  (Left x)  >>= f = Left x
  (Right x) >>= f = f x

The silly example one always sees is writing safe division function that fails out with a Left value when a division by zero happens and holds the resulting value in Right otherwise.

sdiv :: Double -> Double -> Either String Double
sdiv _ 0 = throwError "divide by zero"
sdiv i j = return $ i / j

example :: Double -> Double -> Either String Double
example n m = do
  a <- sdiv n m
  b <- sdiv 2 a
  c <- sdiv 2 b
  return c

throwError :: String -> Either String b
throwError a = Left a

main :: IO ()
main = do
  print $ example 1 5
  print $ example 1 0

This is admittedly pretty stupid but captures the essence of why Either/EitherT is an suitable monad for exception handling.

ErrorT

Another slightly clumsy method is to use the ErrorT transformer composed with an Identity and unrolling into an Either Exception a. This method is simple but doesn't compose well depending on the situation and interaction with IO.

import Control.Monad.Error
import Control.Monad.Identity

data Exception
  = Failure String
  | GenericFailure
  deriving Show

instance Error Exception where
  noMsg = GenericFailure

type ErrMonad a = ErrorT Exception Identity a

example :: Int -> Int -> ErrMonad Int
example x y = do
  case y of
    0 -> throwError $ Failure "division by zero"
    x -> return $ x `div` y

runFail :: ErrMonad a -> Either Exception a
runFail = runIdentity . runErrorT

example1 :: Either Exception Int
example1 = runFail $ example 2 3

example2 :: Either Exception Int
example2 = runFail $ example 2 0

EitherT

newtype EitherT e m a = EitherT {runEitherT :: m (Either e a)}
        -- Defined in `Control.Monad.Trans.Either'
runEitherT :: EitherT e m a -> m (Either e a)
tryIO :: MonadIO m => IO a -> EitherT IOException m a

throwT  :: Monad m => e -> EitherT e m r
catchT  :: Monad m => EitherT a m r -> (a -> EitherT b m r) -> EitherT b m r
handleT :: Monad m => (a -> EitherT b m r) -> EitherT a m r -> EitherT b m

The ideal monad to use is simply the EitherT monad which we'd like to be able to use an with an API similar to ErrorT. For example suppose we wanted to use read to attempt to read a positive integer from stdin. There are two failure modes and two failure cases here, one for a parse error which fails with an error from Prelude.readIO and one for a non-positive integer which fails with a custom exception after a check. We'd like to be unify both cases in the same transformer.

Combined, the safe and errors make life with EitherT more pleasant. The safe library provides a variety of safer variants of the standard prelude functions that handle failures as Maybe values, explicitly passed default values, or more informative exception "notes". While the errors library reexports the safe Maybe functions and hoists them up into the EitherT monad providing a family of try prefixed functions that perform actions and can fail with an exception.

-- Exception handling equivalent of `read`
tryRead :: (Monad m, Read a) => e -> String -> EitherT e m a

-- Exception handling equivelent of `head`
tryHead :: Monad m => e -> [a] -> EitherT e m a

-- Exception handling equivelent of `(!!)`
tryAt :: Monad m => e -> [a] -> Int -> EitherT e m a
import Control.Error
import Control.Monad.Trans

data Failure
  = NonPositive Int
  | ReadError String
  deriving Show

main :: IO ()
main = do
  putStrLn "Enter a positive number."
  s <- getLine

  e <- runEitherT $ do
      n <- tryRead (ReadError s) s
      if n > 0
        then return $ n + 1
        else throwT $ NonPositive n

  case e of
      Left  n -> putStrLn $ "Failed with: " ++ show n
      Right s -> putStrLn $ "Succeeded with: " ++ show s

See:

Advanced Monads

Function Monad

If one writes Haskell long enough one might eventually encounter the curious beast that is the ((->) r) monad instance. It generally tends to be non-intuitive to work with, but is quite simple when one considers it as an unwrapped Reader monad.

instance Functor ((->) r) where
  fmap = (.)

instance Monad ((->) r) where
  return = const
  f >>= k = \r -> k (f r) r

This just uses a prefix form of the arrow type operator.

import Control.Monad

id' :: (->) a a
id' = id

const' :: (->) a ((->) b a)
const' = const

-- Monad m => a -> m a
fret :: a -> b -> a
fret = return

-- Monad m => m a -> (a -> m b) -> m b
fbind :: (r -> a) -> (a -> (r -> b)) -> (r -> b)
fbind f k = f >>= k

-- Monad m => m (m a) -> m a
fjoin :: (r -> (r -> a)) -> (r -> a)
fjoin = join

fid :: a -> a
fid = const >>= id

-- Functor f => (a -> b) -> f a -> f b
fcompose :: (a -> b) -> (r -> a) -> (r -> b)
fcompose = (.)
type Reader r = (->) r -- pseudocode

instance Monad (Reader r) where
  return a = \_ -> a
  f >>= k = \ r -> k (f r) r

ask' :: r -> r
ask' = id

asks' :: (r -> a) -> (r -> a)
asks' f = id . f

runReader' :: (r -> a) -> r -> a
runReader' = id

RWS Monad

The RWS monad is a combines the functionality of the three monads discussed above, the Reader, Writer, and State. There is also a RWST transformer.

runReader :: Reader r a -> r -> a
runWriter :: Writer w a -> (a, w)
runState  :: State s a -> s -> (a, s)

These three eval functions are now combined into the following functions:

runRWS  :: RWS r w s a -> r -> s -> (a, s, w)
execRWS :: RWS r w s a -> r -> s -> (s, w)
evalRWS :: RWS r w s a -> r -> s -> (a, w)
import Control.Monad.RWS

type R = Int
type W = [Int]
type S = Int

computation :: RWS R W S ()
computation = do
  e <- ask
  a <- get
  let b = a + e
  put b
  tell [b]

example = runRWS computation 2 3

The usual caveat about Writer laziness also applies to RWS.

Cont

runCont :: Cont r a -> (a -> r) -> r
callCC :: MonadCont m => ((a -> m b) -> m a) -> m a
cont :: ((a -> r) -> r) -> Cont r a

In continuation passing style, composite computations are built up from sequences of nested computations which are terminated by a final continuation which yields the result of the full computation by passing a function into the continuation chain.

add :: Int -> Int -> Int
add x y = x + y

add :: Int -> Int -> (Int -> r) -> r
add x y k = k (x + y)
import Control.Monad
import Control.Monad.Cont

add :: Int -> Int -> Cont k Int
add x y = return $ x + y

mult :: Int -> Int -> Cont k Int
mult x y = return $ x * y

contt :: ContT () IO ()
contt = do
    k <- do
      callCC $ \exit -> do
        lift $ putStrLn "Entry"
        exit $ \_ -> do
          putStrLn "Exit"
    lift $ putStrLn "Inside"
    lift $ k ()

callcc :: Cont String Integer
callcc = do
  a <- return 1
  b <- callCC (\k -> k 2)
  return $ a+b

ex1 :: IO ()
ex1 = print $ runCont (f >>= g) id
  where
    f = add 1 2
    g = mult 3
-- 9

ex2 :: IO ()
ex2 = print $ runCont callcc show
-- "3"

ex3 :: IO ()
ex3 = runContT contt print
-- Entry
-- Inside
-- Exit

main :: IO ()
main = do
  ex1
  ex2
  ex3

Using continuations and especially callCC can inadvertently create very convoluted control flow so some care must taken.

newtype Cont r a = Cont { runCont :: ((a -> r) -> r) }

instance Monad (Cont r) where
  return a       = Cont $ \k -> k a
  (Cont c) >>= f = Cont $ \k -> c (\a -> runCont (f a) k)

class (Monad m) => MonadCont m where
  callCC :: ((a -> m b) -> m a) -> m a

instance MonadCont (Cont r) where
  callCC f = Cont $ \k -> runCont (f (\a -> Cont $ \_ -> k a)) k

MonadPlus

Choice and failure.

class Monad m => MonadPlus m where
   mzero :: m a 
   mplus :: m a -> m a -> m a

instance MonadPlus [] where
   mzero = []
   mplus = (++)

instance MonadPlus Maybe where
   mzero = Nothing

   Nothing `mplus` ys  = ys
   xs      `mplus` _ys = xs

MonadPlus forms a monoid with

mzero `mplus` a = a
a `mplus` mzero = a
(a `mplus` b) `mplus` c = a `mplus` (b `mplus` c)
when :: (Monad m) => Bool -> m () -> m ()
when p s =  if p then s else return ()

guard :: MonadPlus m => Bool -> m ()
guard True  = return ()
guard False = mzero

msum :: MonadPlus m => [m a] -> m a
msum =  foldr mplus mzero
import Safe
import Control.Monad

list1 :: [(Int,Int)]
list1 = [(a,b) | a <- [1..25], b <- [1..25], a < b]

list2 :: [(Int,Int)]
list2 = do
  a <- [1..25]
  b <- [1..25]
  guard (a < b)
  return $ (a,b)

maybe1 :: String -> String -> Maybe Double
maybe1 a b = do
  a' <- readMay a
  b' <- readMay b
  guard (b' /= 0.0)
  return $ a'/b'

maybe2 :: Maybe Int
maybe2 = msum [Nothing, Nothing, Just 3, Just 4]
import Control.Monad

range :: MonadPlus m => [a] -> m a
range [] = mzero
range (x:xs) = range xs `mplus` return x

pyth :: Integer -> [(Integer,Integer,Integer)]
pyth n = do
  x <- range [1..n]
  y <- range [1..n]
  z <- range [1..n]
  if x*x + y*y == z*z then return (x,y,z) else mzero

main :: IO ()
main = print $ pyth 15
{-
[ ( 12 , 9 , 15 )
, ( 12 , 5 , 13 )
, ( 9 , 12 , 15 )
, ( 8 , 6 , 10 )
, ( 6 , 8 , 10 )
, ( 5 , 12 , 13 )
, ( 4 , 3 , 5 )
, ( 3 , 4 , 5 )
]
-}

MonadFix

The fixed point of a monadic computation. mfix f executes the action f only once, with the eventual output fed back as the input.

fix :: (a -> a) -> a
fix f = let x = f x in x

mfix :: (a -> m a) -> m a
class Monad m => MonadFix m where
   mfix :: (a -> m a) -> m a

instance MonadFix Maybe where
   mfix f = let a = f (unJust a) in a
            where unJust (Just x) = x
                  unJust Nothing  = error "mfix Maybe: Nothing"

The regular do-notation can also be extended with -XRecursiveDo to accomodate recursive monaidc bindings.

{-# LANGUAGE RecursiveDo #-}

import Control.Applicative
import Control.Monad.Fix

stream1 :: Maybe [Int]
stream1 = do
  rec xs <- Just (1:xs)
  return (map negate xs)

stream2 :: Maybe [Int]
stream2 = mfix $ \xs -> do
  xs' <- Just (1:xs)
  return (map negate xs')

ST Monad

The ST monad models "threads" of stateful computations which can manipulate mutable references but are restricted to only return pure values when evaluated and are statically confined to the ST monad of a s thread.

runST :: (forall s. ST s a) -> a
newSTRef :: a -> ST s (STRef s a)
readSTRef :: STRef s a -> ST s a
writeSTRef :: STRef s a -> a -> ST s ()
import Data.STRef
import Control.Monad
import Control.Monad.ST
import Control.Monad.State.Strict

example1 :: Int
example1 = runST $ do
  x <- newSTRef 0

  forM_ [1..1000] $ \j -> do
    writeSTRef x j

  readSTRef x

example2 :: Int
example2 = runST $ do
  count <- newSTRef 0
  replicateM_ (10^6) $ modifySTRef' count (+1)
  readSTRef count

example3 :: Int
example3 = flip evalState 0 $ do
  replicateM_ (10^6) $ modify' (+1)
  get

modify' :: MonadState a m => (a -> a) -> m ()
modify' f = get >>= (\x -> put $! f x)

Using the ST monad we can create a new class of efficient purely functional data structures that use mutable references.

Free Monads

Pure :: a -> Free f a
Free :: f (Free f a) -> Free f a

liftF :: (Functor f, MonadFree f m) => f a -> m a
retract :: Monad f => Free f a -> f a

Free monads are monads which instead of having a join operation that combines computations, instead forms composite computations from application of a functor.

join :: Monad m => m (m a) -> m a
wrap :: MonadFree f m => f (m a) -> m a

One of the best examples is the Partiality monad which models computations which can diverge. Haskell allows unbounded recursion, but for example we can create a free monad from the Maybe functor which when can be used to fix the call-depth of, for example the Ackermann function.

import Control.Monad.Fix
import Control.Monad.Free

type Partiality a = Free Maybe a

-- Non-termination.
never :: Partiality a
never = fix (Free . Just)

fromMaybe :: Maybe a -> Partiality a
fromMaybe (Just x) = Pure x
fromMaybe Nothing = Free Nothing

runPartiality :: Int -> Partiality a -> Maybe a
runPartiality 0 _ = Nothing
runPartiality _ (Pure a) = Just a
runPartiality _ (Free Nothing) = Nothing
runPartiality n (Free (Just a)) = runPartiality (n-1) a

ack :: Int -> Int -> Partiality Int
ack 0 n = Pure $ n + 1
ack m 0 = Free $ Just $ ack (m-1) 1
ack m n = Free $ Just $ ack m (n-1) >>= ack (m-1)

main :: IO ()
main = do
  let diverge = never :: Partiality ()
  print $ runPartiality 1000 diverge
  print $ runPartiality 1000 (ack 3 4)
  print $ runPartiality 5500 (ack 3 4)

The other common use for free monads to build embedded domain languages to describe computations. We can model a subset of the IO monad by building up a pure description of the computation inside of the IOFree monad and then using the free monad to encode the translation to an effectful IO computation.

{-# LANGUAGE DeriveFunctor #-}

import System.Exit
import Control.Monad.Free

data Interaction x
  = Puts String x
  | Gets (Char -> x)
  | Exit
  deriving Functor

type IOFree a = Free Interaction a

puts :: String -> IOFree ()
puts s = liftF $ Puts s ()

get :: IOFree Char
get = liftF $ Gets id

exit :: IOFree r
exit = liftF Exit

gets :: IOFree String
gets = do
  c <- get
  if c == '\n'
    then return ""
    else gets >>= \line -> return (c : line)

-- Collapse our IOFree DSL into IO monad actions.
interp :: IOFree a -> IO a
interp (Pure r) = return r
interp (Free x) = case x of
  Puts s t -> putStrLn s >> interp t
  Gets f   -> getChar >>= interp . f
  Exit     -> exitSuccess

echo :: IOFree ()
echo = do
  puts "Enter your name:"
  str <- gets
  puts str
  if length str > 10
    then puts "You have a long name."
    else puts "You have a short name."
  exit

main :: IO ()
main = interp echo

An implementation such as the one found in free might look like the following:

{-# LANGUAGE MultiParamTypeClasses #-}

import Control.Applicative

data Free f a
  = Pure a
  | Free (f (Free f a))

instance Functor f => Monad (Free f) where
  return a     = Pure a
  Pure a >>= f = f a
  Free f >>= g = Free (fmap (>>= g) f)

class Monad m => MonadFree f m  where
  wrap :: f (m a) -> m a

liftF :: (Functor f, MonadFree f m) => f a -> m a
liftF = wrap . fmap return

iter :: Functor f => (f a -> a) -> Free f a -> a
iter _ (Pure a) = a
iter phi (Free m) = phi (iter phi <$> m)

retract :: Monad f => Free f a -> f a
retract (Pure a) = return a
retract (Free as) = as >>= retract

See:

Indexed Monads

Indexed monads are a generalisation of monads that adds an additional type parameter to the class that carries information about the computation or structure of the monadic implementation.

class IxMonad md where
  return :: a -> md i i a
  (>>=) :: md i m a -> (a -> md m o b) -> md i o b

The canonical use-case is a variant of the vanilla State which allows type-changing on the state for intermediate steps inside of the monad. This indeed turns out to very useful for handling a class of problems involving resource management since the extra index parameter gives us space to statically enforce the sequence of monadic actions by allowing and restriction certain state transitions on the index parameter at compile-time.

To make this more usable we'll use the somewhat esoteric -XRebindableSyntax allowing us to overload the do-notation and if-then-else syntax by providing alternative definitions local to the module.

{-# LANGUAGE RebindableSyntax #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

import Data.IORef
import Data.Char
import Prelude hiding (fmap, (>>=), (>>), return)
import Control.Applicative

newtype IState i o a = IState { runIState :: i -> (a, o) }

evalIState :: IState i o a -> i -> a
evalIState st i = fst $ runIState st i

execIState :: IState i o a -> i -> o
execIState st i = snd $ runIState st i

ifThenElse :: Bool -> a -> a -> a
ifThenElse b i j = case b of
  True -> i
  False -> j

return :: a -> IState s s a
return a = IState $ \s -> (a, s)

fmap :: (a -> b) -> IState i o a -> IState i o b
fmap f v = IState $ \i -> let (a, o) = runIState v i
                          in (f a, o)

join :: IState i m (IState m o a) -> IState i o a
join v = IState $ \i -> let (w, m) = runIState v i
                        in runIState w m

(>>=) :: IState i m a -> (a -> IState m o b) -> IState i o b
v >>= f = IState $ \i -> let (a, m) = runIState v i
                         in runIState (f a) m

(>>) :: IState i m a -> IState m o b -> IState i o b
v >> w = v >>= \_ -> w

get :: IState s s s
get = IState $ \s -> (s, s)

gets :: (a -> o) -> IState a o a
gets f = IState $ \s -> (s, f s)

put :: o -> IState i o ()
put o = IState $ \_ -> ((), o)

modify :: (i -> o) -> IState i o ()
modify f = IState $ \i -> ((), f i)



data Locked = Locked
data Unlocked = Unlocked

type Stateful a = IState a Unlocked a

acquire :: IState i Locked ()
acquire = put Locked

-- Can only release the lock if it's held, try release the lock
-- that's not held is a now a type error.
release :: IState Locked Unlocked ()
release = put Unlocked

-- Statically forbids improper handling of resources.
lockExample :: Stateful a
lockExample = do ptr <- get  :: IState a a a
                 acquire     :: IState a Locked ()
                 -- ...
                 release     :: IState Locked Unlocked ()
                 return ptr

-- Couldn't match type `Locked' with `Unlocked'
-- In a stmt of a 'do' block: return ptr
failure1 :: Stateful a
failure1 = do ptr <- get
              acquire
              return ptr -- didn't release

-- Couldn't match type `a' with `Locked'
-- In a stmt of a 'do' block: release
failure2 :: Stateful a
failure2 = do ptr <- get
              release -- didn't acquire
              return ptr

-- Evaluate the resulting state, statically ensuring that the
-- lock is released when finished.
evalReleased :: IState i Unlocked a -> i -> a
evalReleased f st = evalIState f st

example :: IO (IORef Integer)
example = evalReleased <$> pure lockExample <*> newIORef 0

See: Fun with Indexed monads

Quantification

Universal Quantification

Universal quanitfication the primary mechanism of encoding polymorphism in Haskell. The essence of universal quantification is that we can express functions which operate the same way for a set of types and whose function behavior is entirely determined only by the behavior of all types in this span.

{-# LANGUAGE ExplicitForAll #-}

-- ∀a. [a]
example1 :: forall a. [a]
example1 = []

-- ∀a. [a]
example2 :: forall a. [a]
example2 = [undefined]

-- ∀a. ∀b. (a → b) → [a] → [b]
map' :: forall a. forall b. (a -> b) -> [a] -> [b]
map' f = foldr ((:) . f) []

-- ∀a. [a] → [a]
reverse' :: forall a. [a] -> [a]
reverse' = foldl (flip (:)) []

Normally quantifiers are omitted in type signatures since in Haskell's vanilla surface language it is unambiguous to assume to that free type variables are universally quantified.

A universally quantified type-variable actually implies quite a few rather deep properties about the implementation of a function that can be deduced from it's type signature. For instance the identity function in Haskell is guarnateed to only have one implementation since the only information that the information that can present in the body

id :: forall. a -> a
id x = x

The same with the function fmap, the only implementation possible given a function (a -> b) and a functor f a is a implementation which applies (a -> b) over every a inside f a and that every b in f b uniquely maps to some input value. It is not possible to write an implementation which did not have this property, and this high-level property just falls out the interplay of quantifiers in the type signature!

fmap :: forall a b. (a -> b) -> f a -> f b

Type Systems

Hindley Milner Typesystem

The Hindley-Milner typesystem is historically import as one of the first typed lambda calculi that admitted both polymorphism and a variety of inference techniques that could always decide principle types.

e : x
  | λx:t.e            -- value abstraction
  | e1 e2             -- application
  | let x = e1 in e2  -- let

t : t -> t     -- function types
  | a          -- type variables

σ :  a . t    -- type scheme

In an implementation, the function generalize converts all type variables within the type that into polymorphic type variables yielding a type scheme. The function instantiate maps a scheme to a type, but with any polymorphic variables converted into unbound type variables.

Rank-N Types

System-F is the type system that underlies Haskell. System-F subsumes the HM type system in the sense that every type expressible in HM can be expressed within System-F.

t : t -> t     -- function types
  | a          -- type variables
  |  a . t    -- forall 

e : x          -- variables
  | λx:t.e     -- value abstraction
  | e1 e2      -- value application
  | Λa.e       -- type abstraction 
  | e t        -- type application
id :  t. t -> t
id = Λt. λx:t. x
id = (\ (@ t) (x :: t) -> x

tr ::  a.  b. a -> b -> a
tr = Λa. Λb. λx:a. λy:b. x

fl ::  a.  b. a -> b -> b
fl = Λa. Λb. λx:a. λy:b. y

nil ::  a. [a]
nil = Λa. Λb. -> λ (z :: b) . λ (f :: a -> b -> b). z

cons :: forall a. a -> [a] -> [a]
cons = Λ a -> λ(x :: a) -> λ(xs :: forall b. b -> (a -> b -> b) -> b)
    -> Λ b -> λ(z :: b) -> λ(f :: a -> b -> b) -> f x (xs @ b z f)

Normally when Haskell's typechecker infers a type signature it places all quantifiers of type variables at the outermost position such that that no quantifiers appear within the body of the type expression, called the prenex restriction This restrict an entire class of type signatures that are would otherwise expressible within System-F, but has the benefit of making inference much easier.

-XRankNTypes loosens the prenex restriction such that we may explicitly place quantifiers within the body of the type. The bad news is that the general problem of inference in this relaxed system is undecidable in general, so we're required to explicitly annotate functions which use RankNTypes or they are otherwise inferred as rank 1 and may not typecheck at all.

{-# LANGUAGE RankNTypes #-}

-- Can't unify ( Bool ~ Char )
rank1 :: forall a. (a -> a) -> (Bool, Char)
rank1 f = (f True, f 'a')

rank2 :: (forall a. a -> a) -> (Bool, Char)
rank2 f = (f True, f 'a')

auto :: (forall a. a -> a) -> (forall b. b -> b)
auto x = x

xauto :: forall a. (forall b. b -> b) -> a -> a
xauto f = f
Monomorphic Rank 0: t
Polymorphic Rank 1: forall a. a -> t
Polymorphic Rank 2: (forall a. a -> t) -> t
Polymorphic Rank 3: ((forall a. a -> t) -> t) -> t

For example the ST monad uses a second rank type to prevent the capture of references between ST monads with separate state threads.

Existential Quantification

The essence of universal quantification is that we can express functions which operate the same way for any type, while for existential quantification we can express functions that operate over an some unknown type. Using an existential we can group heterogeneous values together with a functions under the existential, that manipulate the data types but whose type signature hides this information.

{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE RankNTypes #-}

-- ∃ t. (t, t → t, t → String)
data Box = forall a. Box a (a -> a) (a -> String)

boxa :: Box
boxa = Box 1 negate show

boxb :: Box
boxb = Box "foo" reverse show

apply :: Box -> String
apply (Box x f p) = p (f x)

-- ∃ t. Show t => t
data SBox = forall a. Show a => SBox a

boxes :: [SBox]
boxes = [SBox (), SBox 2, SBox "foo"]

showBox :: SBox -> String
showBox (SBox a) = show a

main :: IO ()
main = mapM_ (putStrLn . showBox) boxes
-- ()
-- 2
-- "foo"

The existential over SBox gathers a collection of values defined purely in terms of their their Show interface, no other information is available about the values and they can't be accessed or unpacked in any other way.

{-# LANGUAGE RankNTypes #-}

-- The functor is a fixed implementation of the library internals.
type Exists a b = forall f. Functor f => (b -> f b) -> (a -> f a)

type Get a b = a -> b
type Set a b = a -> b -> a

example :: Get a b -> Set a b -> Exists a b
example f g l a = fmap (g a) (l (f a))

Use of existentials can be used to recreate certain concepts from the so-called "Object Oriented Paradigm", a school of thought popularized in the late 80s that attempted to decompose programming logic into anthropomorphic entities and actions instead of the modern equational treatment. Recreating this model in Haskell is widely considered to be an antipattern.

See: Haskell Antipattern: Existential Typeclass

Impredicative Types

Although extremely brittle, GHC also has limited support impredicative polymorphism which loosens the restriction that that quantifiers must precede arrow types and now may be placed inside of type-constructors.

-- Can't unify ( Int ~ Char )

revUni :: forall a. Maybe ([a] -> [a]) -> Maybe ([Int], [Char])
revUni (Just g) = Just (g [3], g "hello")
revUni Nothing  = Nothing
{-# LANGUAGE ImpredicativeTypes #-}

f :: (forall a. [a] -> a) -> (Int, Char)
f get = (get [1,2], get ['a', 'b', 'c'])

g :: Maybe (forall a. [a] -> a) -> (Int, Char)
g Nothing = (0, '0')
g (Just get) = (get [1,2], get ['a','b','c'])

Use of this extension is rare, although GHC is very liberal about telling us to enable it when one accidentally makes a typo in a type signature!

Scoped Type Variables

Normally the type variables used within the toplevel signature for a function are only scoped to the type-signature and not the body of the function and it's rigid signatures over terms and let/where clauses. Enabling -XScopedTypeVariables loosens this restriction allowing the type variables mentioned in the toplevel to be scoped within the body.

{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE ScopedTypeVariables #-}

poly :: forall a b c. a -> b -> c -> (a, a)
poly x y z = (f x y, f x z)
  where
    -- second argument is universally quantified from inference
    -- f :: forall t0 t1. t0 -> t1 -> t0
    f x' _ = x'

mono :: forall a b c. a -> b -> c -> (a, a)
mono x y z = (f x y, f x z)
  where
    -- b is not implictly universally quantified because it is in scope
    f :: a -> b -> a
    f x' _ = x'

example :: IO ()
example = do
  x :: [Int] <- readLn
  print x

GADTs

Void

The Void type is the type with no inhabitants. It unifies only with itself.

Using a newtype wrapper we can create a type where recursion makes it impossible to construct an inhabitant.

-- Void :: Void -> Void
newtype Void = Void Void

Or using -XEmptyDataDecls we can also construct the uninhabited type equivalently as a data declaration with no constructors.

data Void

The only inhabitant of both of these construction is a diverging bottom term like (undefined).

Phantom Types

Phantom types are paramaters that appear on the left hand side of a type declaration but which are not constrained by the values of the types inhabitants. They are effectively slots for us to encode additional information at the type-level.

import Data.Void

data Foo tag a = Foo a

combine :: Num a => Foo tag a -> Foo tag a -> Foo tag a
combine (Foo a) (Foo b) = Foo (a+b)

-- All identical at the value level, but differ at the type level.
a :: Foo () Int
a = Foo 1

b :: Foo t Int
b = Foo 1

c :: Foo Void Int
c = Foo 1

-- () ~ ()
example1 :: Foo () Int
example1 = combine a a

-- t ~ ()
example2 :: Foo () Int
example2 = combine a b

-- t0 ~ t1
example3 :: Foo t Int
example3 = combine b b

-- Couldn't match type `t' with `Void'
example4 :: Foo t Int
example4 = combine b c

Notice t type variable tag does not appear in the right hand side of the declaration. Using this allows us to express invariants at the type-level that need not manifest at the value-level. We're effectively programming by adding extra information at the type-level.

See: Fun with Phantom Types

GADTs

GADTs are an extension to algebraic datatypes that allow us to qualify the constructors to datatypes with type equality constraints, allowing a class of types that are not expressible using vanilla ADTs.

-XGADTs implicitly enables an alternative syntax for datatype declarations ( -XGADTSyntax ) such the following declaration are equivalent:

data List a
  = Empty
  | Cons a (List a)

data List a where
  Empty :: List a
  Cons :: a -> List a -> List a

For an example use consider the data type Term, we have a term in which we Succ which takes a Term parameterized by a which span all types. Problems arise between the clash whether (a ~ Bool) or (a ~ Int) when trying to write the evaluator.

data Term a
  = Lit a
  | Succ (Term a)
  | IsZero (Term a)

-- can't be well-typed :(
eval (Lit i)      = i
eval (Succ t)     = 1 + eval t
eval (IsZero i)   = eval i == 0

And we admit the construction of meaningless terms which forces more error handling cases.

-- This is a valid type.
failure = Succ ( Lit True )

Using a GADT we can express the type invariants for our language (i.e. only type-safe expressions are representable). Pattern matching on this GADTs then carries type equality constraints without the need for explicit tags.

{-# Language GADTs #-}

data Term a where
  Lit    :: a -> Term a
  Succ   :: Term Int -> Term Int
  IsZero :: Term Int -> Term Bool
  If     :: Term Bool -> Term a -> Term a -> Term a

eval :: Term a -> a
eval (Lit i)      = i                                   -- Term a
eval (Succ t)     = 1 + eval t                          -- Term (a ~ Int)
eval (IsZero i)   = eval i == 0                         -- Term (a ~ Int)
eval (If b e1 e2) = if eval b then eval e1 else eval e2 -- Term (a ~ Bool)

example :: Int
example = eval (Succ (Succ (Lit 3)))

This time around:

-- This is rejected at compile-time.
failure = Succ ( Lit True )

Explicit constraints (a ~ b) can be added to a function's context that the compiler should be able to deduce that two types are equal up to unification.

-- f :: a -> a -> (a,a)
-- f :: (a ~ b) => a -> b -> (a,b)
f x y = (x,y)

Kind Signatures

Recall that the kind in Haskell's type system the "type of the types" or kinds is the type system consisting the single kind * and an arrow kind ->.

κ : *
  | κ -> κ 
Int :: *
Maybe :: * -> *
Either :: * -> * -> *

On top of default GADT declaration we can also constrain the parameters of the GADT to specific kinds. For basic usage Haskell's kind inference can deduce this reasonably well, but combined with some other type system extensions that extend the kind system this becomes essential.

{-# Language GADTs #-}
{-# LANGUAGE KindSignatures #-}

data Term a :: * where
  Lit    :: a -> Term a
  Succ   :: Term Int -> Term Int
  IsZero :: Term Int -> Term Bool
  If     :: Term Bool -> Term a -> Term a -> Term a

data Vec :: * -> * -> * where
  Nil :: Vec n a
  Cons :: a -> Vec n a -> Vec n a

data Fix :: (* -> *) -> * where
  In :: f (Fix f) -> Fix f

Type Equality

With a richer language for datatypes we can express terms that witness the relationship between terms in the constructors, for example we can now express a term which expresses propositional equality between two types.

The type Eql a b is a proof that types a and b are equal, by pattern matching on the single Refl constructor we introduce the equality constraint into the body of the pattern match.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE ExplicitForAll #-}

-- a ≡ b
data Eql a b where
  Refl :: Eql a a

-- Congruence
-- (f : A → B) {x y} → x ≡ y → f x ≡ f y
cong :: Eql a b -> Eql (f a) (f b)
cong Refl = Refl

-- Symmetry
-- {a b : A} → a ≡ b → a ≡ b
sym :: Eql a b -> Eql b a
sym Refl = Refl

-- Transitivity
-- {a b c : A} → a ≡ b → b ≡ c → a ≡ c
trans :: Eql a b -> Eql b c -> Eql a c
trans Refl Refl = Refl

-- Coerce one type to another given a proof of their equality.
-- {a b : A} → a ≡ b → a → b
castWith :: Eql a b -> a -> b
castWith Refl = id

-- Trival cases
a :: forall n. Eql n n
a = Refl

b :: forall. Eql () ()
b = Refl

As of GHC 7.8 these constructors and functions are included in the Prelude in the Data.Type.Equality module.

Lambda Calculus

The lambda calculus forms the theoretical and pracitcal foundational for many languages. At the heart of every calculus is three components:

  • Var - A variable
  • Lam - A lambda abstraction
  • App - An application

There are many different ways of modeling these constructions and data structure representations, but they all more or less contain these three elements. For example, a lambda calculus that uses String names on lambda binders and variables might be written like the following:

type Name = String

data Exp 
  = Var Name
  | Lam Name Exp
  | App Exp Exp

A lambda expression in which all variables that appear in the body of the expression are referenced in an outer lambda binder is said to be closed while an expression with unbound free variables is open.

SK Combinators

A closed lambda expression is also sometimes called a combinator, it takes several arguments and manipulates and applies them in some pattern to yield a result. The most famous combinators are the SKI combinators which are interesting in context of several proofs concerning properties of the lambda calculus.

s :: (a -> b -> c) -> (a -> b) -> a -> c
s f g x =  f x (g x)

k :: a -> b -> a
k x y = x

i :: a -> a
i x = x

true = k
false = k i

In fact the I combinator can actually be derived ( in several ways ) in terms of the more basic SK combinators.

SKK
=((λxyz.xz(yz))(λxy.x)(λxy.x))
=((λyz.(λxy.x)z(yz))(λxy.x))
=λz.(λxy.x)z((λxy.x)z)
=λz.(λy.z)((λxy.x)z)
=λz.z
=I

In fact, in an untyped lambda calculus the Y combinator can also be written in terms of SK.

Y=SSK(S(K(SS(S(SSK))))K)

Really, all we need is S and K!

Church Encoding

In Church's original formulation of the lambda calculus there were no ground types ( integer, booleans, lists ), and remarkably we can actually build all of these constructions using nothing more than lambdas.

Data types like the natural numbers above can also be encoded as lambda expressions with constructors for the datatype modeled as indexed parameters to the lambda expressions. Using this method we can encode recursive definition of natural numbers, lists, and even the expression type for the untyped lambda calculus.

{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeSynonymInstances #-}

import Prelude hiding (not, succ, pred, fst, snd, tail, head)

type CBool = forall a. a -> a -> a

-- Booleans
true, false :: CBool
true x y  = x
false x y = y

-- Logic
not p      = p false true
and p q    = p q false
or p q     = p true q
cond p x y = p x y
xor p q    = p (not q) q

-- Tuples
fst p      = p true
snd p      = p false
pair a b f = f a b

-- Combinators
i x = x
k x y = x
s x y z = x z (y z)

b x y z = x (y z)
c x y z = x z y
w x y = x y y

-- Church Arithmetic
iszero n = n (\x -> false) true
succ n f x = f (n f x)

plus m n f x = n f (m f x)
sub m n = (n pred) m

mult m n f = m (n f)
pow m n = n m
pred n f x = n (\g h -> h (g f)) (\u -> x) i

leq m n = iszero (sub m n)
geq m n = not (leq m n)

-- Church Numbers
type CNat = forall a. (a -> a) -> a -> a

zero, one, two, three :: CNat
zero  f x = x
one   f x = f x
two   f x = f (f x)
three f x = f (f (f x))

-- Scott Lists (lists as nested tuples)
nil z      = z
cons x y   = pair false (pair x y)
null z     = z true
head z     = fst (snd z)
tail z     = snd (snd z)
index xs n = head (n tail xs)

-- data Nat = Z | S Nat
ezero   = \s z -> z
esucc n = \s z -> s (n s z)

-- data Expr = Lam Expr | App Expr Expr | Var Int
elam f   = \l a v -> l f
eapp t u = \l a v -> a t u
evar n   = \l a v -> v n

-- Convert between Ints and Church Numbers
unchurch :: CNat -> Integer
unchurch n = n (\i -> i + 1) 0

church :: Int -> CNat
church n =
  if n == 0
  then zero
  else \f x -> f (church (n-1) f x)

unbool :: (Bool -> Bool -> t) -> t
unbool n = n True False



ex1 :: Integer
ex1 = unchurch (pow three three)
-- 27

ex2 :: Bool
ex2 = unbool (iszero (pred one))
-- True

ex3 :: Integer
ex3 = snd (pair 1 2)
-- 2

ex4 :: Integer
ex4 = head (tail (cons 1 (cons 2 nil)))
-- 2

ex5 :: Bool
ex5 = unbool (true `xor` false)
-- True

Although theoretically interesting, Church numbers are not of much practical use in Haskell. Although one particular encoding of the list ( Church list ) type turns out to be very useful in practice.

example :: (a -> b -> b) -> b -> b
example cons nil = cons 1 (cons 2 (cons 3 nil))
{-# LANGUAGE RankNTypes #-}

newtype List a = List (forall b. (a -> b -> b) -> b -> b)

fromList :: [a] -> List a
fromList xs = List (\n c -> foldr n c xs)

toList :: List a -> [a]
toList xs = unList xs (:) []

unList :: List a
        -> (a -> b -> b) -- Cons
        -> b             -- Nil
        -> b
unList (List l) = l

nil :: List a
nil = List (\n c -> c)

cons :: a -> List a -> List a
cons x xs = List (\n c -> n x (unList xs n c))

append :: List a -> List a -> List a
append xs ys = List (\n c -> unList xs n (unList ys n c))

singleton :: a -> List a
singleton x = List (\n c -> n x c)

length :: List a -> Integer
length (List l) = l (\_ n -> n + 1) 0

test :: [Integer]
test = toList (fromList [1,2,3] `append` fromList [4,5,6])

See: Mogensen–Scott encoding

Substitution

The downside to using alphabetical terms to bound variable in a closure is that dealing with open lambda expressions. For instance if we perform the naive substitution s = [y / x] over the term:

λy.yx

We get the result:

λx.xx

Which fundamentally changes the meaning of the expression. We expect that substitution should preserve alpha equivalence.

To overcome this we ensure that our substitution function checks the free variables in each subterm before performing substitution and introduces new names where neccessary. Such a substitution is called a capture-avoiding substitution. There are several techniques to implement capture-avoiding substitutions in an efficient way.

de Bruijn Indices

Instead of using string names, an alternative representation of the lambda calculus uses integers to stand for names on binders.

Named de Bruijn
S λx y z. x z (y z) λ λ λ (3 1) (2 1)
K λ x y. x λ λ 2
I λ x. x λ 1

In this system the process of substitution becomes much more mechanical and simply involves shifting indices and can be made very efficient. Although in this form human intution about expressions breaks down and such it is better to convert to this kind of form as an interemdiate step after parsing into a named form.

import Control.Monad
import Text.PrettyPrint
import qualified Data.Map as Map

-- de Bruijn indices
data DExp
  = Var Integer
  | Lam DExp
  | App DExp DExp
  deriving (Eq)

subst :: DExp -> Integer -> DExp -> DExp
subst e n (Var n')
  | n == n' = e
  | otherwise = (Var n')
subst e n (Lam e') = Lam $ subst e (n+1) e'
subst e n (App e1 e2) = App (subst e n e1) (subst e n e2)

nf :: DExp -> DExp
nf e@(Var _) = e
nf (Lam e) = Lam (nf e)
nf (App f a) =
  case whnf f of
      Lam b -> nf (subst a 0 b)
      f' -> App (nf f') (nf a)

whnf :: DExp -> DExp
whnf e@(Var _) = e
whnf e@(Lam _) = e
whnf (App f a) =
  case whnf f of
      Lam b -> whnf (subst a 0 b)
      f' -> App f' a


-- Pretty printer

parensIf ::  Bool -> Doc -> Doc
parensIf True = parens
parensIf False = id

class Pretty p where
  ppr :: Int -> p -> Doc

instance Pretty DExp where
    ppr _ (Var v)   = integer (v+1)
    ppr p (Lam f)   = parensIf (p>0) $ text "λ " <> ppr p f
    ppr p (App f x) = ppr' f <+> ppr' x
      where
        ppr' (Var v) = integer (v+1)
        ppr' expr    = parens $ ppr p expr

ppexpr :: DExp -> String
ppexpr = render . ppr 0

-- Locally named
data NExp
  = EVar String
  | ELam String NExp
  | EApp NExp NExp
  deriving (Show)

type Ctx = Map.Map String Integer

letters :: [String]
letters = [1..] >>= flip replicateM ['a'..'z']

shift :: Ctx -> NExp -> DExp
shift c (EVar v) = Var (c Map.! v)
shift c (EApp a b) = App (shift c a) (shift c b)
shift c (ELam v body) = Lam (shift c' body)
  where c' = Map.insert v 0 (Map.map (+1) c)

toDeBruijn :: NExp -> DExp
toDeBruijn = shift Map.empty

fromDeBruijn :: DExp -> NExp
fromDeBruijn = from 0
  where from n (Var i)   = EVar (letters !! (n - (fromIntegral i) - 1))
        from n (Lam b)   = ELam (letters !! n) (from (succ n) b)
        from n (App f a) = EApp (from n f) (from n a)

i = ELam "a" (EVar "a")
k = ELam "a" (ELam "b" (EVar "a"))
s = ELam "a" (ELam "b" (ELam "c" (EApp (EApp (EVar "a") (EVar "c")) (EApp (EVar "b") (EVar "c")))))

ex1 = ppexpr $ toDeBruijn i
-- λ 1
ex2 = ppexpr $ toDeBruijn k
-- λ λ 2
ex3 = ppexpr $ toDeBruijn s
-- λ λ λ (3 1) (2 1)
ex4 = fromDeBruijn $ toDeBruijn s
-- ELam "a" (ELam "b" (ELam "c" (EApp (EApp (EVar "a") (EVar "c")) (EApp (EVar "b") (EVar "c")))))

HOAS

Higher Order Abstract Syntax (HOAS) is a technique for encoding the lambda calculus that exploits the function type of the host language ( i.e. Haskell ) to give us capture-avoiding substitution in our custom language by exploiting Haskell's implementation.

{-# LANGUAGE GADTs #-}

data Expr a where
  Con :: a -> Expr a
  Lam :: (Expr a -> Expr b) -> Expr (a -> b)
  App :: Expr (a -> b) -> Expr a -> Expr b

i :: Expr (a -> a)
i = Lam (\x -> x)

k :: Expr (a -> b -> a)
k = Lam (\x -> Lam (\y -> x))

s :: Expr ((a -> b -> c) -> (a -> b) -> (a -> c))
s = Lam (\x -> Lam (\y -> Lam (\z -> App (App x z) (App y z))))

eval :: Expr a -> a
eval (Con v) = v
eval (Lam f) = \x -> eval (f (Con x))
eval (App e1 e2) = (eval e1) (eval e2)


skk :: Expr (a -> a)
skk = App (App s k) k

example :: Integer
example = eval skk 1
-- 1

There is no however no safeguard preventing us from generating Haskell functions which do not encode meaningful lambda calculus expression. For example:

Lam (\x -> let x = x in x )

Pretty printing HOAS encoded terms can also be quite complicated since the body of the function is under a Haskell lambda binder.

PHOAS

A slightly different form of HOAS called PHOAS uses lambda datatype parameterized over the binder type. In this form evaluation requires unpacking into a seperate Value type to wrap the lambda expression.

{-# LANGUAGE RankNTypes #-}

data ExprP a
  = VarP a
  | AppP (ExprP a) (ExprP a)
  | LamP (a -> ExprP a)
  | LitP Integer

data Value
  = VLit Integer
  | VFun (Value -> Value)

fromVFun :: Value -> (Value -> Value)
fromVFun val = case val of
  VFun f -> f
  _      -> error "not a function"

fromVLit :: Value -> Integer
fromVLit val = case val of
  VLit n -> n
  _      -> error "not a integer"

newtype Expr = Expr { unExpr :: forall a . ExprP a }

eval :: Expr -> Value
eval e = ev (unExpr e) where
  ev (LamP f)      = VFun(ev . f)
  ev (VarP v)      = v
  ev (AppP e1 e2)  = fromVFun (ev e1) (ev e2)
  ev (LitP n)      = VLit n

i :: ExprP a
i = LamP (\a -> VarP a)

k :: ExprP a
k = LamP (\x -> LamP (\y -> VarP x))

s :: ExprP a
s = LamP (\x -> LamP (\y -> LamP (\z -> AppP (AppP (VarP x) (VarP z)) (AppP (VarP y) (VarP z)))))

skk :: ExprP a
skk = AppP (AppP s k) k

example :: Integer
example = fromVLit $ eval $ Expr (AppP skk (LitP 3))

See:

Interpreters

Expression Problem

Final Interpreters

Using typeclasses we can implement a final interpreter which models a set of extensible terms using functions bound to typeclasses rather than data constructors. Instances of the typeclass form interpreters over these terms.

For example we can write a small language that includes basic arithmetic, and then retroactively extend our expression language with a multiplication operator without changing the base. At the same time our interpeter interpreter logic remains invariant under extension with new expressions.

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

class Expr repr where
  lit :: Int -> repr
  neg :: repr -> repr
  add :: repr -> repr -> repr
  mul :: repr -> repr -> repr

instance Expr Int where
  lit n = n
  neg a = -a
  add a b = a + b
  mul a b = a * b

instance Expr String where
  lit n = show n
  neg a = "(-" ++ a ++ ")"
  add a b = "(" ++ a ++ " + " ++ b ++ ")"
  mul a b = "(" ++ a ++ " * " ++ b ++ ")"

class BoolExpr repr where
  eq :: repr -> repr -> repr
  tr :: repr
  fl :: repr

instance BoolExpr Int where
  eq a b = if a == b then tr else fl
  tr = 1
  fl = 0

instance BoolExpr String where
  eq a b = "(" ++ a ++ " == " ++ b ++ ")"
  tr = "true"
  fl = "false"

eval :: Int -> Int
eval = id

render :: String -> String
render = id

expr :: (BoolExpr repr, Expr repr) => repr
expr = eq (add (lit 1) (lit 2)) (lit 3)

result :: Int
result = eval expr
-- 1

string :: String
string = render expr
-- "((1 + 2) == 3)"

Finally Tagless

Writing an evaluator for the lambda calculus can likewise also be modeled with a final interpreter and a Identity functor.

import Prelude hiding (id)

class Expr rep where
  lam :: (rep a -> rep b) -> rep (a -> b)
  app :: rep (a -> b) -> (rep a -> rep b)
  lit :: a -> rep a

newtype Interpret a = R { reify :: a }

instance Expr Interpret where
  lam f   = R $ reify . f . R
  app f a = R $ reify f $ reify a
  lit     = R

eval :: Interpret a -> a
eval e = reify e

e1 :: Expr rep => rep Int
e1 = app (lam (\x -> x)) (lit 3)

e2 :: Expr rep => rep Int
e2 = app (lam (\x -> lit 4)) (lam $ \x -> lam $ \y -> y)

example1 :: Int
example1 = eval e1
-- 3

example2 :: Int
example2 = eval e2
-- 4

See: Typed Tagless Interpretations and Typed Compilation

Datatypes

The usual hand-wavy of describing algebraic datatypes is to indicate the how natural correspondence between sum types, product types, and polynomial expressions arises.

data Void                       -- 0
data Unit     = Unit            -- 1
data Sum a b  = Inl a | Inr b   -- a + b
data Prod a b = Prod a b        -- a * b
type (->) a b = a -> b          -- b ^ a

Intuitively it follows the notion that the cardinality of set of inhabitants of a type can always be given as a function of the number of it's holes. A product type admits a number of inhabitants as a function of the product (i.e. cardinality of the Cartesian product), a sum type as as the sum of it's holes and a function type as the exponential of the span of the domain and codomain.

-- 1 + A
data Maybe a = Nothing | Just a

Recursive types are correspond to infinite series of these terms.

-- pseudocode

-- μX. 1 + X
data Nat a = Z | S Nat
Nat a = μ a. 1 + a
      = 1 + (1 + (1 + ...))

-- μX. 1 + A * X
data List a = Nil | Cons a (List a)
List a = μ a. 1 + a * (List a) 
       = 1 + a + a^2 + a^3 + a^4 ...

-- μX. A + A*X*X
data Tree a f = Leaf a | Tree a f f
Tree a = μ a. 1 + a * (List a) 
       = 1 + a^2 + a^4 + a^6 + a^8 ...

See: Species and Functors and Types, Oh My!

F-Algebras

The initial algebra approach differs from the final interpreter approach in that we now represent our terms as algebraic datatypes and the interpreter implements recursion and evaluation occurs through pattern matching.

type Algebra f a = f a -> a
type Coalgebra f a = a -> f a
newtype Fix f = Fix { unFix :: f (Fix f) }

cata :: Functor f => Algebra f a -> Fix f -> a
ana  :: Functor f => Coalgebra f a -> a -> Fix f
hylo :: Functor f => Algebra f b -> Coalgebra f a -> a -> b

In Haskell a F-algebra in a functor f a together with function f a -> a. A colagebra reverses the function. For a functor f we can form it's recursive unrolling using the recursive Fix newtype wrapper.

newtype Fix f = Fix { unFix :: f (Fix f) }

Fix :: f (Fix f) -> Fix f
unFix :: Fix f -> f (Fix f)
Fix f = f (f (f (f (f (f ( ... ))))))

newtype T b a = T (a -> b)

Fix (T a)
Fix T -> a
(Fix T -> a) -> a
(Fix T -> a) -> a -> a
...

In this form we can write down a generalized fold/unfold function that are datatype generic and written purely in terms of the recursing under the functor.

cata :: Functor f => Algebra f a -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix

ana :: Functor f => Coalgebra f a -> a -> Fix f
ana coalg = Fix . fmap (ana coalg) . coalg

We call these functions catamorphisms and anamorphisms. Notice especially that the types of thees two functions simply reverse the direction of arrows. Interpreted in another way they transform an algebra/colaglebra which defines a flat structure-preserving mapping between Fix f f into a function which either rolls or unrolls the fixpoint. What is particularly nice about this approach is that the recursion is abstracted away inside the functor definition and we are free to just implement the flat transformation logic!

For example a construction of the natural numbers in this form:

{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}

type Algebra f a = f a -> a
type Coalgebra f a = a -> f a

newtype Fix f = Fix { unFix :: f (Fix f) }

-- catamorphism
cata :: Functor f => Algebra f a -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix

-- anamorphism
ana :: Functor f => Coalgebra f a -> a -> Fix f
ana coalg = Fix . fmap (ana coalg) . coalg

-- hylomorphism
hylo :: Functor f => Algebra f b -> Coalgebra f a -> a -> b
hylo f g = cata f . ana g

type Nat = Fix NatF
data NatF a = S a | Z deriving (Eq,Show)

instance Functor NatF where
  fmap f Z     = Z
  fmap f (S x) = S (f x)

plus :: Nat -> Nat -> Nat
plus n = cata phi where
  phi Z     = n
  phi (S m) = s m

times :: Nat -> Nat -> Nat
times n = cata phi where
  phi Z     = z
  phi (S m) = plus n m

int :: Nat -> Int
int = cata phi where
  phi  Z    = 0
  phi (S f) = 1 + f

nat :: Integer -> Nat
nat = ana (psi Z S) where
  psi f _ 0 = f
  psi _ f n = f (n-1)

z :: Nat
z = Fix Z

s :: Nat -> Nat
s = Fix . S


type Str = Fix StrF
data StrF x = Cons Char x | Nil

instance Functor StrF where
  fmap f (Cons a as) = Cons a (f as)
  fmap f Nil = Nil

nil :: Str
nil = Fix Nil

cons :: Char -> Str -> Str
cons x xs = Fix (Cons x xs)

str :: Str -> String
str = cata phi where
  phi Nil         = []
  phi (Cons x xs) = x : xs

str' :: String -> Str
str' = ana (psi Nil Cons) where
  psi f _ []     = f
  psi _ f (a:as) = f a as

map' :: (Char -> Char) -> Str -> Str
map' f = hylo g unFix
  where
    g Nil        = Fix Nil
    g (Cons a x) = Fix $ Cons (f a) x


type Tree a = Fix (TreeF a)
data TreeF a f = Leaf a | Tree a f f deriving (Show)

instance Functor (TreeF a) where
  fmap f (Leaf a) = Leaf a
  fmap f (Tree a b c) = Tree a (f b) (f c)

depth :: Tree a -> Int
depth = cata phi where
  phi (Leaf _)     = 0
  phi (Tree _ l r) = 1 + max l r


example1 :: Int
example1 = int (plus (nat 125) (nat 25))
-- 150

Or for example an interpreter for a small expression language that depends on a scoping dictionary.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}

import Control.Applicative
import qualified Data.Map as M

type Algebra f a = f a -> a
type Coalgebra f a = a -> f a

newtype Fix f = Fix { unFix :: f (Fix f) }

cata :: Functor f => Algebra f a -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix

ana :: Functor f => Coalgebra f a -> a -> Fix f
ana coalg = Fix . fmap (ana coalg) . coalg

hylo :: Functor f => Algebra f b -> Coalgebra f a -> a -> b
hylo f g = cata f . ana g

type Id = String
type Env = M.Map Id Int

type Expr = Fix ExprF
data ExprF a
  = Lit Int
  | Var Id
  | Add a a
  | Mul a a
  deriving (Show, Eq, Ord, Functor)

deriving instance Eq (f (Fix f)) => Eq (Fix f)
deriving instance Ord (f (Fix f)) => Ord (Fix f)
deriving instance Show (f (Fix f)) => Show (Fix f)

eval :: M.Map Id Int -> Fix ExprF -> Maybe Int
eval env = cata phi where
  phi ex = case ex of
    Lit c   -> pure c
    Var i   -> M.lookup i env
    Add x y -> liftA2 (+) x y
    Mul x y -> liftA2 (*) x y

expr :: Expr
expr = Fix (Mul n (Fix (Add x y)))
  where
    n = Fix (Lit 10)
    x = Fix (Var "x")
    y = Fix (Var "y")

env :: M.Map Id Int
env = M.fromList [("x", 1), ("y", 2)]

compose :: (f (Fix f) -> c) -> (a -> Fix f) -> a -> c
compose x y = x . unFix . y

example :: Maybe Int
example = eval env expr
-- Just 30

What's especially nice about this approach is how naturally catamorphisms compose into efficient composite transformations.

compose :: Functor f => (f (Fix f) -> c) -> (a -> Fix f) -> a -> c
compose f g = f . unFix . g

recursion-schemes

{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE DeriveFunctor #-}

import Data.Functor.Foldable

type Var = String

data Exp
  = Var Var
  | App Exp Exp
  | Lam [Var] Exp
  deriving Show

data ExpF a
  = VarF Var
  | AppF a a
  | LamF [Var] a
  deriving Functor

type instance Base Exp = ExpF

instance Foldable Exp where
  project (Var a)     = VarF a
  project (App a b)   = AppF a b
  project (Lam a b)   = LamF a b

instance Unfoldable Exp where
  embed (VarF a)      = Var a
  embed (AppF a b)    = App a b
  embed (LamF a b)    = Lam a b

fvs :: Exp -> [Var]
fvs = cata phi
  where phi (VarF a)    = [a]
        phi (AppF a b)  = a ++ b
        phi (LamF a b) = foldr (filter . (/=)) a b

See:

Testing

Contrary to a lot of misinformation, unit testing in Haskell is quite common and robust. Although generally speaking unit tests tend to be of less importance in Haskell since the type system makes an enormous amount of invalid programs complete inexpressible by construction. Unit tests tend to be written later in the development lifecycle and generally tend to be about the core logic of the program and not the intermediate plumbing.

A prominent school of thought on Haskell library design tends to favor constructing programs built around strong equation laws which guarantee strong invariants about program behavior under composition. Many of the testing tools are built around this style of design.

QuickCheck

Probably the most famous Haskell library, QuickCheck is a testing framework for generating large random tests for arbitrary functions automatically based on the types of their arguments.

quickCheck :: Testable prop => prop -> IO ()
(==>) :: Testable prop => Bool -> prop -> Property
forAll :: (Show a, Testable prop) => Gen a -> (a -> prop) -> Property
choose :: Random a => (a, a) -> Gen a
import Test.QuickCheck

qsort :: [Int] -> [Int]
qsort []     = []
qsort (x:xs) = qsort lhs ++ [x] ++ qsort rhs
    where lhs = filter  (< x) xs
          rhs = filter (>= x) xs

prop_maximum ::  [Int] -> Property
prop_maximum xs = not (null xs) ==>
                  last (qsort xs) == maximum xs

main :: IO ()
main = quickCheck prop_maximum
$ runhaskell qcheck.hs
*** Failed! Falsifiable (after 3 tests and 4 shrinks):    
[0]
[1]

$ runhaskell qcheck.hs
+++ OK, passed 1000 tests.

The test data generator can be extended with custom types and refined with predicates that restrict the domain of cases to test.

import Test.QuickCheck

data Color = Red | Green | Blue deriving Show

instance Arbitrary Color where
  arbitrary = do
    n <- choose (0,2) :: Gen Int
    return $ case n of
      0 -> Red
      1 -> Green
      2 -> Blue

example1 :: IO [Color]
example1 = sample' arbitrary
-- [Red,Green,Red,Blue,Red,Red,Red,Blue,Green,Red,Red]

See: QuickCheck: An Automatic Testing Tool for Haskell

SmallCheck

Like QuickCheck, SmallCheck is a property testing system but instead of producing random arbitrary test data it instead enumerates a deterministic series of test data to a fixed depth.

smallCheck :: Testable IO a => Depth -> a -> IO ()
list :: Depth -> Series Identity a -> [a]
sample' :: Gen a -> IO [a]
λ: list 3 series :: [Int]
[0,1,-1,2,-2,3,-3]

λ: list 3 series :: [Double]
[0.0,1.0,-1.0,2.0,0.5,-2.0,4.0,0.25,-0.5,-4.0,-0.25]

λ: list 3 series :: [(Int, String)]
[(0,""),(1,""),(0,"a"),(-1,""),(0,"b"),(1,"a"),(2,""),(1,"b"),(-1,"a"),(-2,""),(-1,"b"),(2,"a"),(-2,"a"),(2,"b"),(-2,"b")]

It is useful to generate test cases over all possible inputs of a program up to some depth.

import Test.SmallCheck

distrib :: Int -> Int -> Int -> Bool
distrib a b c = a * (b + c) == a * b + a * c

cauchy :: [Double] -> [Double] -> Bool
cauchy xs ys = (abs (dot xs ys))^2 <= (dot xs xs) * (dot ys ys)

failure :: [Double] -> [Double] -> Bool
failure xs ys = abs (dot xs ys) <= (dot xs xs) * (dot ys ys)

dot :: Num a => [a] -> [a] -> a
dot xs ys = sum (zipWith (*) xs ys)

main :: IO ()
main = do
  putStrLn "Testing distributivity..."
  smallCheck 25 distrib

  putStrLn "Testing Cauchy-Schwarz..."
  smallCheck 4 cauchy

  putStrLn "Testing invalid Cauchy-Schwarz..."
  smallCheck 4 failure
$ runhaskell smallcheck.hs
Testing distributivity...
Completed 132651 tests without failure.

Testing Cauchy-Schwarz...
Completed 27556 tests without failure.

Testing invalid Cauchy-Schwarz...
Failed test no. 349.
there exist [1.0] [0.5] such that
  condition is false

Just like for QuickCheck we can implement series instances for our custom datatypes. For example there is no default instance for Vector, so let's implement one:

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}

import Test.SmallCheck
import Test.SmallCheck.Series
import Control.Applicative

import qualified Data.Vector as V

dot :: Num a => V.Vector a -> V.Vector a -> a
dot xs ys = V.sum (V.zipWith (*) xs ys)

cauchy :: V.Vector Double -> V.Vector Double -> Bool
cauchy xs ys = (abs (dot xs ys))^2 <= (dot xs xs) * (dot ys ys)

instance (Serial m a, Monad m) => Serial m (V.Vector a) where
  series = V.fromList <$> series

main :: IO ()
main = smallCheck 4 cauchy

SmallCheck can also use Generics to derive Serial instances, for example to enumerate all trees of a certain depth we might use:

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics
import Test.SmallCheck.Series

data Tree a = Null | Fork (Tree a) a (Tree a)
    deriving (Show, Generic)

instance Serial m a => Serial m (Tree a)

example :: [Tree ()]
example = list 3 series

main = print example

QuickSpec

Using the QuickCheck arbitrary machinery we can also rather remarkably enumerate a large number of combinations of functions to try and deduce algebraic laws from trying out inputs for small cases.

Of course the fundamental limitation of this approach is that a function may not exhibit any interesting properties for small cases or for simple function compositions. So in general case this approach won't work, but practically it still quite useful.

{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.List
import Data.Typeable

import Test.QuickSpec hiding (lists, bools, arith)
import Test.QuickCheck

type Var k a = (Typeable a, Arbitrary a, CoArbitrary a, k a)

listCons :: forall a. Var Ord a => a -> Sig
listCons a = background
  [
    "[]"      `fun0` ([]      :: [a]),
    ":"       `fun2` ((:)     :: a -> [a] -> [a])
  ]

lists :: forall a. Var Ord a => a -> [Sig]
lists a =
  [
    -- Names to print arbitrary variables
    funs',
    funvars',
    vars',

    -- Ambient definitions
    listCons a,

    -- Expressions to deduce properties of
    "sort"     `fun1` (sort    :: [a] -> [a]),
    "map"      `fun2` (map     :: (a -> a) -> [a] -> [a]),
    "id"       `fun1` (id      :: [a] -> [a]),
    "reverse"  `fun1` (reverse :: [a] -> [a]),
    "minimum"  `fun1` (minimum :: [a] -> a),
    "length"   `fun1` (length  :: [a] -> Int),
    "++"       `fun2` ((++)    :: [a] -> [a] -> [a])
  ]

  where
    funs'    = funs (undefined :: a)
    funvars' = vars ["f", "g", "h"] (undefined :: a -> a)
    vars'    = ["xs", "ys", "zs"] `vars` (undefined :: [a])


tvar :: A
tvar = undefined

main :: IO ()
main = quickSpec (lists tvar)

Running this we rather see it is able to deduce most of the laws for list functions.

$ runhaskell src/quickspec.hs                                                                     
== API ==
-- functions --
map :: (A -> A) -> [A] -> [A]
minimum :: [A] -> A
(++) :: [A] -> [A] -> [A]
length :: [A] -> Int
sort, id, reverse :: [A] -> [A]

-- background functions --
id :: A -> A
(:) :: A -> [A] -> [A]
(.) :: (A -> A) -> (A -> A) -> A -> A
[] :: [A]

-- variables --
f, g, h :: A -> A
xs, ys, zs :: [A]

-- the following types are using non-standard equality --
A -> A

-- WARNING: there are no variables of the following types; consider adding some --
A

== Testing ==
Depth 1: 12 terms, 4 tests, 24 evaluations, 12 classes, 0 raw equations.
Depth 2: 80 terms, 500 tests, 18673 evaluations, 52 classes, 28 raw equations.
Depth 3: 1553 terms, 500 tests, 255056 evaluations, 1234 classes, 319 raw equations.
319 raw equations; 1234 terms in universe.

== Equations about map ==
  1: map f [] == []
  2: map id xs == xs
  3: map (f.g) xs == map f (map g xs)

== Equations about minimum ==
  4: minimum [] == undefined

== Equations about (++) ==
  5: xs++[] == xs
  6: []++xs == xs
  7: (xs++ys)++zs == xs++(ys++zs)

== Equations about sort ==
  8: sort [] == []
  9: sort (sort xs) == sort xs

== Equations about id ==
 10: id xs == xs

== Equations about reverse ==
 11: reverse [] == []
 12: reverse (reverse xs) == xs

== Equations about several functions ==
 13: minimum (xs++ys) == minimum (ys++xs)
 14: length (map f xs) == length xs
 15: length (xs++ys) == length (ys++xs)
 16: sort (xs++ys) == sort (ys++xs)
 17: map f (reverse xs) == reverse (map f xs)
 18: minimum (sort xs) == minimum xs
 19: minimum (reverse xs) == minimum xs
 20: minimum (xs++xs) == minimum xs
 21: length (sort xs) == length xs
 22: length (reverse xs) == length xs
 23: sort (reverse xs) == sort xs
 24: map f xs++map f ys == map f (xs++ys)
 25: reverse xs++reverse ys == reverse (ys++xs)

Not bad for mechcanical search!

Criterion

Criterion is a statistically aware benchmarking tool.

whnf :: (a -> b) -> a -> Pure
nf :: NFData b => (a -> b) -> a -> Pure
nfIO :: NFData a => IO a -> IO ()
bench :: Benchmarkable b => String -> b -> Benchmark
import Criterion.Main
import Criterion.Config

-- Naive recursion for fibonacci numbers.
fib1 :: Int -> Int
fib1 0 = 0
fib1 1 = 1
fib1 n = fib1 (n-1) + fib1 (n-2)

-- Use the De Moivre closed form for fibonacci numbers.
fib2 :: Int -> Int
fib2 x = truncate $ ( 1 / sqrt 5 ) * ( phi ^ x - psi ^ x )
  where
      phi = ( 1 + sqrt 5 ) / 2
      psi = ( 1 - sqrt 5 ) / 2

suite :: [Benchmark]
suite = [
    bgroup "naive" [
      bench "fib 10" $ whnf fib1 5
    , bench "fib 20" $ whnf fib1 10
    ],
    bgroup "de moivre" [
      bench "fib 10" $ whnf fib2 5
    , bench "fib 20" $ whnf fib2 10
    ]
  ]

main :: IO ()
main = defaultMain suite
$ runhaskell criterion.hs
warming up
estimating clock resolution...
mean is 2.349801 us (320001 iterations)
found 1788 outliers among 319999 samples (0.6%)
  1373 (0.4%) high severe
estimating cost of a clock call...
mean is 65.52118 ns (23 iterations)
found 1 outliers among 23 samples (4.3%)
  1 (4.3%) high severe

benchmarking naive/fib 10
mean: 9.903067 us, lb 9.885143 us, ub 9.924404 us, ci 0.950
std dev: 100.4508 ns, lb 85.04638 ns, ub 123.1707 ns, ci 0.950

benchmarking naive/fib 20
mean: 120.7269 us, lb 120.5470 us, ub 120.9459 us, ci 0.950
std dev: 1.014556 us, lb 858.6037 ns, ub 1.296920 us, ci 0.950

benchmarking de moivre/fib 10
mean: 7.699219 us, lb 7.671107 us, ub 7.802116 us, ci 0.950
std dev: 247.3021 ns, lb 61.66586 ns, ub 572.1260 ns, ci 0.950
found 4 outliers among 100 samples (4.0%)
  2 (2.0%) high mild
  2 (2.0%) high severe
variance introduced by outliers: 27.726%
variance is moderately inflated by outliers

benchmarking de moivre/fib 20
mean: 8.082639 us, lb 8.018560 us, ub 8.350159 us, ci 0.950
std dev: 595.2161 ns, lb 77.46251 ns, ub 1.408784 us, ci 0.950
found 8 outliers among 100 samples (8.0%)
  4 (4.0%) high mild
  4 (4.0%) high severe
variance introduced by outliers: 67.628%
variance is severely inflated by outliers

Tasty

Tasty combines all of the testing frameworks into a common API for forming runnable batches of tests and collecting the results.

import Test.Tasty
import Test.Tasty.HUnit
import Test.Tasty.QuickCheck
import qualified Test.Tasty.SmallCheck as SC

arith :: Integer -> Integer -> Property
arith x y = (x > 0) && (y > 0) ==> (x+y)^2 > x^2 + y^2

negation :: Integer -> Bool
negation x = abs (x^2) >= x

suite :: TestTree
suite = testGroup "Test Suite" [
    testGroup "Units"
      [ testCase "Equality" $ True @=? True
      , testCase "Assertion" $ assert $ (length [1,2,3]) == 3
      ],

    testGroup "QuickCheck tests"
      [ testProperty "Quickcheck test" arith
      ],

    testGroup "SmallCheck tests"
      [ SC.testProperty "Negation" negation
      ]
  ]

main :: IO ()
main = defaultMain suite
$ runhaskell TestSuite.hs
Unit tests
  Units
    Equality:        OK
    Assertion:       OK
  QuickCheck tests
    Quickcheck test: OK
      +++ OK, passed 100 tests.
  SmallCheck tests
    Negation:        OK
      11 tests completed

Type Families

MultiParam Typeclasses

Resolution of vanilla Haskell 98 typeclasses proceeds via very simple context reduction that minimizes interdependency between predicates, resolves superclasses, and reduces the types to head normal form. For example:

(Eq [a], Ord [a]) => [a]
==> Ord a => [a]

If a single parameter typeclass expresses a property of a type ( i.e. it's in a class or not in class ) then a multiparamater typeclass expresses relationships between types. For example whether if we wanted to express the relation a type can be converted to another type we might use a class like:

{-# LANGUAGE MultiParamTypeClasses #-}

import Data.Char

class Convertible a b where
  convert :: a -> b

instance Convertible Int Integer where
  convert = toInteger

instance Convertible Int Char where
  convert = chr

instance Convertible Char Int where
  convert = ord

Of course now our instances for Convertible Int are not unique anymore, so there no longer exists a nice procedure for determining the inferred type of b from just a. To remedy this let's add a functional dependency a -> b, which says tells GHC that an instance a uniquely determines the instance that b can be. So we'll see that our two instances relating Int to both Integer and Char conflict.

{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}


import Data.Char

class Convertible a b | a -> b where
  convert :: a -> b

instance Convertible Int Char where
  convert = chr

instance Convertible Char Int where
  convert = ord
Functional dependencies conflict between instance declarations:
  instance Convertible Int Integer
  instance Convertible Int Char

Now there's a simpler procedure for determining instances uniquely and multiparameter typeclasses become more usable and inferable again.

λ: convert (42 :: Int)
'42'
λ: convert '*'
42

Now let's make things not so simple. Turning on UndecidableInstances loosens the constraint on context reduction can only allow constraints of the class to become structural smaller than it's head. As a result implicit computation can now occur within in the type class instance search. Combined with a type-level representation of Peano numbers we find that we can encode basic arithmetic at the type-level.

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE UndecidableInstances #-}

data Z
data S n

type Zero  = Z
type One   = S Zero
type Two   = S One
type Three = S Two
type Four  = S Three

zero :: Zero
zero = undefined

one :: One
one = undefined

two :: Two
two = undefined

three :: Three
three = undefined

four :: Four
four = undefined

class Eval a where
  eval :: a -> Int

instance Eval Zero where
  eval _ = 0

instance Eval n => Eval (S n) where
  eval m = 1 + eval (prev m)

class Pred a b | a -> b where
  prev :: a -> b

instance Pred Zero Zero where
  prev = undefined

instance Pred (S n) n where
  prev = undefined

class Add a b c | a b -> c where
  add :: a -> b -> c

instance Add Zero a a where
  add = undefined

instance Add a b c => Add (S a) b (S c) where
  add = undefined

f :: Three
f = add one two

g :: S (S (S (S Z)))
g = add two two

h :: Int
h = eval (add three four)

If the typeclass contexts look similar to Prolog you're not wrong, if one reads the contexts qualifier (=>) backwards as backwards turnstiles :- then it's precisely the same equations.

add(0, A, A).
add(s(A), B, s(C)) :- add(A, B, C).

pred(0, 0).
pred(S(A), A).

This is kind of abusing typeclasses and if used carelessly it can fail to terminate or overflow at compile-time. UndecidableInstances shouldn't be turned on without careful forethought about what it implies.

<interactive>:1:1:
    Context reduction stack overflow; size = 201

Type Families

Type families allows us to write functions in the type domain which take types as arguments which can yield either types or values indexed on their arguments which are evaluated at compile-time in during typechecking. Type families come in two varieties: data families and type synonym families.

  • type familes are named function on types
  • data familes are type-indexed data types

First let's look at type synonym families, there are two equivalent syntactic ways of constructing them. Either as associated type families declared within a typeclass or as standalone declarations at the toplevel. The following forms are semantically equivalent, although the unassociated form is strictly more general:

-- (1) Unassociated form
type family Rep a
type instance Rep Int = Char
type instance Rep Char = Int

class Convertible a where
  convert :: a -> Rep a

instance Convertible Int where
  convert = chr

instance Convertible Char where
  convert = ord



-- (2) Associated form
class Convertible a where
  type Rep a
  convert :: a -> Rep a

instance Convertible Int where
  type Rep Int = Char
  convert = chr

instance Convertible Char where
  type Rep Char = Int
  convert = ord

Using the same example we used for multiparamater + functional dependencies illustration we see that there is a direct translation between the type family approach and functional dependencies. These two approaches have the same expressive power.

An associated type family can be queried using the :kind! command in GHCi.

λ: :kind! Rep Int
Rep Int :: *
= Char
λ: :kind! Rep Char
Rep Char :: *
= Int

Data families on the other hand allow us to create new type parameterized data constructors. Normally we can only define typeclasses functions whose behavior results in a uniform result which is purely a result of the typeclasses arguments. With data families we can allow specialized behavior indexed on the type.

For example if we wanted to create more complicated vector structures ( bit-masked vectors, vectors of tuples, ... ) that exposed a uniform API but internally handled the differences in their data layout we can use data families to accomplish this:

{-# LANGUAGE TypeFamilies #-}

import qualified Data.Vector.Unboxed as V

data family Array a
data instance Array Int       = IArray (V.Vector Int)
data instance Array Bool      = BArray (V.Vector Bool)
data instance Array (a,b)     = PArray (Array a) (Array b)
data instance Array (Maybe a) = MArray (V.Vector Bool) (Array a)

class IArray a where
  index :: Array a -> Int -> a

instance IArray Int where
  index (IArray xs) i = xs V.! i

instance IArray Bool where
  index (BArray xs) i = xs V.! i

-- Vector of pairs
instance (IArray a, IArray b) => IArray (a, b) where
  index (PArray xs ys) i = (index xs i, index ys i)

-- Vector of missing values
instance (IArray a) => IArray (Maybe a) where
  index (MArray bm xs) i =
    case bm V.! i of
      True  -> Nothing
      False -> Just $ index xs i

Injectivity

The type level functions defined by type-families are not neccessarily injective, the function may map two disctinct input types to the same uutput type. This differs from the behavior of type constructors ( which are also type-level functions ) which are injective.

For example for the constructor Maybe, Maybe t1 = Maybe t2 implies that t1 = t2.

data Maybe a = Nothing | Just a
-- Maybe a ~ Maybe b  implies  a ~ b

type instance F Int = Bool
type instance F Char = Bool

-- F a ~ F b does not imply  a ~ b, in general

Monotraversable

Using type families, mono-traversable generalizes the notion of Functor, Foldable, and Traversable to include both monomorphic and polymorphic types.

omap :: MonoFunctor mono => (Element mono -> Element mono) -> mono -> mono

otraverse :: (Applicative f, MonoTraversable mono)
          => (Element mono -> f (Element mono)) -> mono -> f mono

ofoldMap :: (Monoid m, MonoFoldable mono)
         => (Element mono -> m) -> mono -> m
ofoldl' :: MonoFoldable mono
        => (a -> Element mono -> a) -> a -> mono -> a
ofoldr :: MonoFoldable mono
        => (Element mono -> b -> b) -> b -> mono -> b

For example the text type normally does not admit either any of these type-classes since, but now we can write down the instances that model the interface of Foldable and Traversable.

{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Text
import Data.Char
import Data.Monoid
import Data.MonoTraversable
import Control.Applicative

bs :: Text
bs = "Hello Haskell."

shift :: Text
shift = omap (chr . (+1) . ord) bs
-- "Ifmmp!Ibtlfmm/"

backwards :: [Char]
backwards = ofoldl' (flip (:)) "" bs
-- ".lleksaH olleH"


data MyMonoType = MNil | MCons Int MyMonoType deriving Show

type instance Element MyMonoType = Int

instance MonoFunctor MyMonoType where
  omap f MNil = MNil
  omap f (MCons x xs) = f x `MCons` omap f xs

instance MonoFoldable MyMonoType where
  ofoldMap f = ofoldr (mappend . f) mempty
  ofoldr       = mfoldr
  ofoldl'      = mfoldl'
  ofoldr1Ex f  = ofoldr1Ex f . mtoList
  ofoldl1Ex' f = ofoldl1Ex' f . mtoList

instance MonoTraversable MyMonoType where
  omapM f xs = mapM f (mtoList xs) >>= return . mfromList
  otraverse f = ofoldr acons (pure MNil)
    where acons x ys = MCons <$> f x <*> ys

mtoList :: MyMonoType -> [Int]
mtoList (MNil) = []
mtoList (MCons x xs) = x : (mtoList xs)

mfromList :: [Int] -> MyMonoType
mfromList [] = MNil
mfromList (x:xs) = MCons x (mfromList xs)

mfoldr :: (Int -> a -> a) -> a -> MyMonoType -> a
mfoldr f z MNil =  z
mfoldr f z (MCons x xs) =  f x (mfoldr f z xs)

mfoldl' :: (a -> Int -> a) -> a -> MyMonoType -> a
mfoldl' f z MNil = z
mfoldl' f z (MCons x xs) = let z' = z `f` x
                           in seq z' $ mfoldl' f z' xs

ex1 :: Int
ex1 = mfoldl' (+) 0 (mfromList [1..25])

ex2 :: MyMonoType
ex2 = omap (+1) (mfromList [1..25])

See: From Semigroups to Monads

NonEmpty

Rather than having degenerate (and often partial) cases of many of the Prelude functions to accommodate the null case of lists, it is sometimes preferable to statically enforce empty lists from even being constructed as an inhabitant of a type.

infixr 5 :|, <|
data NonEmpty a = a :| [a]

head :: NonEmpty a -> a
toList :: NonEmpty a -> [a]
fromList :: [a] -> NonEmpty a
head :: NonEmpty a -> a
head ~(a :| _) = a
import Data.List.NonEmpty
import Prelude hiding (head, tail, foldl1)
import Data.Foldable (foldl1)

a :: NonEmpty Integer
a = fromList [1,2,3]
-- 1 :| [2,3]

b :: NonEmpty Integer
b = 1 :| [2,3]
-- 1 :| [2,3]

c :: NonEmpty Integer
c = fromList []
-- *** Exception: NonEmpty.fromList: empty list

d :: Integer
d = foldl1 (+) $ fromList [1..100]
-- 5050

In GHC 7.8 -XOverloadedLists can be used to avoid the extraneous fromList and toList conversions.

Manual Proofs

One of most deep results in computer science, the Curry–Howard correspondence, is the relation that logical propositions can be modeled by types and instantiating those types constitute proofs of these propositions. Programs are proofs and proofs are programs.

Types Logic
A proposition
a : A proof
B(x) predicate
Void
Unit
A + B A ∧ B
A × B A ∨ B
A -> B A ⇒ B

In dependently typed languages we can exploit this result to it's full extent, in Haskell we don't have the strength that dependent types provide but can still prove trivial results. For example, now we can model a type level function for addition and provide a small proof that zero is an additive identity.

P 0                   [ base step ]
n. P n   P (1+n)    [ inductive step ]
-------------------
n. P(n)
Axiom 1: a + 0 = a
Axiom 2: a + suc b = suc (a + b)

  0 + suc a
= suc (0 + a)  [by Axiom 2]
= suc a        [Induction hypothesis]
∎

Translated into Haskell our axioms are simply are type definitions and recursing over the inductive datatype constitutes the inductive step of our our proof.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE TypeOperators #-}

data Z
data S n

data SNat n where
  Zero :: SNat Z
  Succ :: SNat n -> SNat (S n)

data Eql a b where
  Refl :: Eql a a

type family Add m n
type instance Add Z n = n
type instance Add (S m) n = S (Add m n)

add :: SNat n -> SNat m -> SNat (Add n m)
add Zero     m = m
add (Succ n) m = Succ (add n m)

cong :: Eql a b -> Eql (f a) (f b)
cong Refl = Refl

-- ∀n. 0 + suc n = suc n
plus_suc :: forall n.  SNat n
         -> Eql (Add Z (S n)) (S n)
plus_suc Zero = Refl
plus_suc (Succ n) = cong (plus_suc n)

-- ∀n. 0 + n = n
plus_zero :: forall n. SNat n
         -> Eql (Add Z n) n
plus_zero Zero = Refl
plus_zero (Succ n) = cong (plus_zero n)

Using the TypeOperators extension we can also use infix notation at the type-level.

data a :=: b where
  Refl :: a :=: a

cong :: a :=: b -> (f a) :=: (f b)
cong Refl = Refl

type family (n :: Nat) :+ (m :: Nat) :: Nat
type instance Zero     :+ m = m
type instance (Succ n) :+ m = Succ (n :+ m)

plus_suc :: forall n m. SNat n -> SNat m -> (n :+ (S m)) :=: (S (n :+ m))
plus_suc Zero m = Refl
plus_suc (Succ n) m = cong (plus_suc n m)

Constraint Kinds

GHC's implementation also exposes the predicates that bound quantifiers in Haskell as types themselves, with the -XConstraintKinds extension enabled. Using this extension we work with constraints as first class types.

Num :: * -> Constraint
Odd :: * -> Constraint
type T1 a = (Num a, Ord a)

The empty constraint set is indicated by () :: Constraint.

For a contrived example if we wanted to create a generic Sized class that carried with it constraints on the elements of the container in question we could achieve this quite simply using type families.

{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE ConstraintKinds #-}

import GHC.Exts (Constraint)
import Data.Hashable
import Data.HashSet

type family Con a :: Constraint
type instance Con [a] = (Ord a, Eq a)
type instance Con (HashSet a) = (Hashable a)

class Sized a where
  gsize :: Con a => a -> Int

instance Sized [a] where
  gsize = length

instance Sized (HashSet a) where
  gsize = size

One use-case of this is to capture the typeclass dictionary constrained by a function and reify it as a value.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE KindSignatures #-}

import GHC.Exts (Constraint)

data Dict :: Constraint -> * where
  Dict :: (c) => Dict c

dShow :: Dict (Show a) -> a -> String
dShow Dict x = show x

dEqNum :: Dict (Eq a, Num a) -> a -> Bool
dEqNum Dict x = x == 0


fShow :: String
fShow = dShow Dict 10

fEqual :: Bool
fEqual = dEqNum Dict 0

Promotion

Kind Polymorphism

The regular value level function which takes a function and applies it to an argument is universally generalized over in the usual Hindley-Milner way.

app :: forall a b. (a -> b) -> a -> b
app f a = f a

But when we do the same thing at the type-level we see we loose information about the polymorphism of the constructor applied.

-- TApp :: (* -> *) -> * -> *
data TApp f a = MkTApp (f a)

Turning on -XPolyKinds allows polymorphic variables at the kind level as well.

-- Default:   (* -> *) -> * -> *
-- PolyKinds: (k -> *) -> k -> *
data TApp f a = MkTApp (f a)

-- Default:   ((* -> *) -> (* -> *)) -> (* -> *)
-- PolyKinds: ((k -> *) -> (k -> *)) -> (k -> *)
data Mu f a = Roll (f (Mu f) a)

-- Default:   * -> *
-- PolyKinds: k -> *
data Proxy a = Proxy

Using the polykinded Proxy type allows us to write down type class functions which over constructors of arbitrary kind arity.

{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

data Proxy a = Proxy
data Rep = Rep

class PolyClass a where
  foo :: Proxy a -> Rep
  foo = const Rep

-- () :: *
-- [] :: * -> *
-- Either :: * -> * -> *

instance PolyClass ()
instance PolyClass []
instance PolyClass Either

AnyK

λ: import GHC.Prim
λ: :kind AnyK
AnyK :: BOX
λ: :kind Constraint
Constraint :: BOX

Data Kinds

The -XDataKinds extension allows us to use refer to constructors at the value level and the type level. Consider a simple sum type:

data S a b = L a | R b

-- S :: * -> * -> *
-- L :: a -> S a b
-- R :: b -> S a b

With the extension enabled we see that we our type constructors are now automatically promoted so that L or R can be viewed as both a data constructor of the type S or as the type L with kind S.

{-# LANGUAGE DataKinds #-}

data S a b = L a | R b

-- S :: * -> * -> *
-- L :: * -> S * *
-- R :: * -> S * *

Promoted data constructors can referred to in type signatures by prefixing them with a single quote. Also of importance is that these promoted constructors are not exported with a module by default, but type synonym instances can be created using this notation.

data Foo = Bar | Baz
type Bar = 'Bar
type Baz = 'Baz

Combining this with type families we see we can not write meaningful, meaningful type-level functions by lifting types to the kind level.

{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE DataKinds #-}

import Prelude hiding (Bool(..))

data Bool = True | False

type family Not (a :: Bool) :: Bool

type instance Not True = False
type instance Not False = True

false :: Not True ~ False => a
false = undefined

true :: Not False ~ True => a
true = undefined

-- Fails at compile time.
-- Couldn't match type 'False with 'True
invalid :: Not True ~ True => a
invalid = undefined

Vectors

Using this new structure we can create a Vec type which is parameterized by it's length as well as it's element type now that we have a kind language rich enough to encode the successor type in the kind signature of the generalized algebraic datatype.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}

data Nat = Z | S Nat deriving (Eq, Show)

type Zero  = Z
type One   = S Zero
type Two   = S One
type Three = S Two
type Four  = S Three
type Five  = S Four

data Vec :: Nat -> * -> * where
  Nil :: Vec Z a
  Cons :: a -> Vec n a -> Vec (S n) a

instance Show a => Show (Vec n a) where
  show Nil         = "Nil"
  show (Cons x xs) = "Cons " ++ show x ++ " (" ++ show xs ++ ")"

class FromList n where
  fromList :: [a] -> Vec n a

instance FromList Z where
  fromList [] = Nil

instance FromList n => FromList (S n) where
  fromList (x:xs) = Cons x $ fromList xs


lengthVec :: Vec n a -> Nat
lengthVec Nil = Z
lengthVec (Cons x xs) = S (lengthVec xs)

zipVec :: Vec n a -> Vec n b -> Vec n (a,b)
zipVec Nil Nil = Nil
zipVec (Cons x xs) (Cons y ys) = Cons (x,y) (zipVec xs ys)

vec4 :: Vec Four Int
vec4 = fromList [0, 1, 2, 3]

vec5 :: Vec Five Int
vec5 = fromList [0, 1, 2, 3, 4]


example1 :: Nat
example1 = lengthVec vec4
-- S (S (S (S Z)))

example2 :: Vec Four (Int, Int)
example2 = zipVec vec4 vec4
-- Cons (0,0) (Cons (1,1) (Cons (2,2) (Cons (3,3) (Nil))))

So now if we try to zip two Vec types with the wrong shape then we get a error at compile-time about the off-by-one error.

example2 = zipVec vec4 vec5
-- Couldn't match type 'S 'Z with 'Z
-- Expected type: Vec Four Int
--   Actual type: Vec Five Int

The same technique we can use to create a container which is statically indexed by a empty or non-empty flag, such that if we try to take the head of a empty list we'll get a compile-time error, or stated equivalently we have an obligation to prove to the compiler that the argument we hand to the head function is non-empty.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}

data Size = Empty | NonEmpty

data List a b where
  Nil  :: List Empty a
  Cons :: a -> List b a -> List NonEmpty a

head' :: List NonEmpty a -> a
head' (Cons x _) = x

example1 :: Int
example1 = head' (1 `Cons` (2 `Cons` Nil))

-- Cannot match type Empty with NonEmpty
example2 :: Int
example2 = head' Nil
Couldn't match type None with Many
Expected type: List Many Int
  Actual type: List None Int

See:

Typelevel Numbers

GHC's type literals can also be used in place of explicit Peano arithmetic,

GHC 7.6 is very conservative about performing reduction, GHC 7.8 is much less so and will can solve many typelevel constraints involving natural numbers but sometimes still needs a little coaxing.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}

import GHC.TypeLits

data Vec :: Nat -> * -> * where
  Nil :: Vec 0 a
  Cons :: a -> Vec n a -> Vec (1 + n) a

-- GHC 7.6 will not reduce
-- vec3 :: Vec (1 + (1 + (1 + 0))) Int

vec3 :: Vec 3 Int
vec3 = 0 `Cons` (1 `Cons` (2 `Cons` Nil))
{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE FlexibleContexts #-}

import GHC.TypeLits
import Data.Type.Equality

data Foo :: Nat -> * where
  Small    :: (n <= 2)  => Foo n
  Big      :: (3 <= n) => Foo n

  Empty    :: ((n == 0) ~ True) => Foo n
  NonEmpty :: ((n == 0) ~ False) => Foo n

big :: Foo 10
big = Big

small :: Foo 2
small = Small

empty :: Foo 0
empty = Empty

nonempty :: Foo 3
nonempty = NonEmpty

See: Type-Level Literals

Type Equality

Continuing with the theme of building more elaborate proofs in Haskell, GHC 7.8 recently shipped with the Data.Type.Equality module which provides us with an extended set of type-level operations for expressing the equality of types as values, constraints, and promoted booleans.

(~)   :: k -> k -> Constraint
(==)  :: k -> k -> Bool
(<=)  :: Nat -> Nat -> Constraint
(<=?) :: Nat -> Nat -> Bool
(+)   :: Nat -> Nat -> Nat
(-)   :: Nat -> Nat -> Nat
(*)   :: Nat -> Nat -> Nat
(^)   :: Nat -> Nat -> Nat
(:~:)     :: k -> k -> *
Refl      :: a1 :~: a1
sym       :: (a :~: b) -> b :~: a
trans     :: (a :~: b) -> (b :~: c) -> a :~: c
castWith  :: (a :~: b) -> a -> b
gcastWith :: (a :~: b) -> (a ~ b => r) -> r

With this we have a much stronger language for writing restrictions that can be checked at a compile-time, and a mechanism that will later allow us to write more advanced proofs.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE ConstraintKinds #-}

import GHC.TypeLits
import Data.Type.Equality

type Not a b = ((b == a) ~ False)

restrictUnit :: Not () a => a -> a
restrictUnit = id

restrictChar :: Not Char a => a -> a
restrictChar = id

Proxy

Using kind polymorphism with phantom types allows us to express the Proxy type which is inhabited by a single constructor with no arguments but with a polykinded phantom type variable which carries an arbirary type as the value is passed around.

{-# LANGUAGE PolyKinds #-}

-- | A concrete, poly-kinded proxy type
data Proxy t = Proxy
import Data.Proxy

a :: Proxy ()
a = Proxy

b :: Proxy 3
b = Proxy

c :: Proxy "symbol"
c = Proxy

d :: Proxy Maybe
d = Proxy

e :: Proxy (Maybe ())
e = Proxy

This is provided by the tagged package in 7.6 and provided by the Prelude in 7.8.

We've seen constructors promoted using DataKinds, but just like at the value-level GHC also allows us some syntatic sugar for list and tuples instead of explicit cons'ing and pair'ing. This is enabled with the -XTypeOperators extension, which introduces list syntax and tuples of arbitrary arity at the type-level.

data HList :: [*] -> * where
  HNil  :: HList '[]
  HCons :: a -> HList t -> HList (a ': t)

data Tuple :: (*,*) -> * where
  Tuple :: a -> b -> Tuple '(a,b)

Using this we can construct all variety of composite type-level objects.

λ: :kind 1
1 :: Nat

λ: :kind "foo"
"foo" :: Symbol

λ: :kind [1,2,3]
[1,2,3] :: [Nat]

λ: :kind [Int, Bool, Char]
[Int, Bool, Char] :: [*]

λ: :kind Just [Int, Bool, Char]
Just [Int, Bool, Char] :: Maybe [*]

λ: :kind '("a", Int)
(,) Symbol *

λ: :kind [ '("a", Int), '("b", Bool) ]
[ '("a", Int), '("b", Bool) ] :: [(,) Symbol *]

Singleton Types

A singleton type is a type a single value inhabitant. Singleton types can be constructed in a variety of ways using GADTs or with data families.

data instance Sing (a :: Nat) where
  SZ :: Sing 'Z
  SS :: Sing n -> Sing ('S n)

data instance Sing (a :: Maybe k) where
  SNothing :: Sing 'Nothing
  SJust :: Sing x -> Sing ('Just x)

data instance Sing (a :: Bool) where
  STrue :: Sing True
  SFalse :: Sing False

Promoted Naturals

Value-level  Type-level         Models
-----------  ------------       -------
SZ           Sing 'Z            0
SS SZ        Sing ('S 'Z)       1
SS (SS SZ)   Sing ('S ('S 'Z))  2

Promoted Booleans

Value-level  Type-level         Models
-----------  ---------------    -------
STrue        Sing 'False        False
SFalse       Sing 'True         True

Promoted Maybe

Value-level  Type-level         Models
-----------  ---------------    -------
SJust a      Sing (SJust 'a)    Just a
SNothing     Sing Nothing       Nothing

Singleton types are an integral part of the small cottage industry of faking dependent types in Haskell, i.e. constructing types with terms impredicated upon values. Singleton types are a way of "cheating" by modeling the map between types and values as a structural property of the type.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.Proxy
import GHC.Exts (Any)
import Prelude hiding (succ)

data Nat = Z | S Nat

-- kind-indexed data family
data family Sing (a :: k)

data instance Sing (a :: Nat) where
  SZ :: Sing 'Z
  SS :: Sing n -> Sing ('S n)

data instance Sing (a :: Maybe k) where
  SNothing :: Sing 'Nothing
  SJust :: Sing x -> Sing ('Just x)

data instance Sing (a :: Bool) where
  STrue :: Sing True
  SFalse :: Sing False

data Fin (n :: Nat) where
  FZ :: Fin (S n)
  FS :: Fin n -> Fin (S n)

data Vec a n where
  Nil  :: Vec a Z
  Cons :: a -> Vec a n -> Vec a (S n)

class SingI (a :: k) where
  sing :: Sing a

instance SingI Z where
  sing = SZ

instance SingI n => SingI (S n) where
  sing = SS sing

deriving instance Show Nat
deriving instance Show (SNat a)
deriving instance Show (SBool a)
deriving instance Show (Fin a)
deriving instance Show a => Show (Vec a n)

type family (m :: Nat) :+ (n :: Nat) :: Nat where
  Z :+ n = n
  S m :+ n = S (m :+ n)

type SNat (k :: Nat) = Sing k
type SBool (k :: Bool) = Sing k
type SMaybe (b :: a) (k :: Maybe a) = Sing k

size :: Vec a n -> SNat n
size Nil         = SZ
size (Cons x xs) = SS (size xs)

forget :: SNat n -> Nat
forget SZ = Z
forget (SS n) = S (forget n)

natToInt :: Integral n => Nat -> n
natToInt Z     = 0
natToInt (S n) = natToInt n + 1

intToNat :: (Integral a, Ord a) => a -> Nat
intToNat 0 = Z
intToNat n = S $ intToNat (n - 1)

sNatToInt :: Num n => SNat x -> n
sNatToInt SZ     = 0
sNatToInt (SS n) = sNatToInt n + 1

index :: Fin n -> Vec a n -> a
index FZ (Cons x _)      = x
index (FS n) (Cons _ xs) = index n xs


test1 :: Fin (S (S (S Z)))
test1 = FS (FS FZ)

test2 :: Int
test2 = index FZ (1 `Cons` (2 `Cons` Nil))

test3 :: Sing ('Just ('S ('S Z)))
test3 = SJust (SS (SS SZ))

test4 :: Sing ('S ('S Z))
test4 = SS (SS SZ)

-- polymorphic constructor SingI
test5 :: Sing ('S ('S Z))
test5 = sing

The builtin singleton types provided in GHC.TypeLits have the useful implementation that type-level values can be reflected to the value-level and back up to the type-level, albeit under an existential.

someNatVal :: Integer -> Maybe SomeNat
someSymbolVal :: String -> SomeSymbol

natVal :: KnownNat n => proxy n -> Integer
symbolVal :: KnownSymbol n => proxy n -> String
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeOperators #-}

import Data.Proxy
import GHC.TypeLits

a :: Integer
a = natVal (Proxy :: Proxy 1)
-- 1

b :: String
b = symbolVal (Proxy :: Proxy "foo")
-- "foo"

c :: Integer
c = natVal (Proxy :: Proxy (2 + 3))
-- 5

Closed Type Families

In the type families we've used so far (called open type families) there is no notion of ordering of the equations involved in the type-level function. The type family can be extended at any point in the code resolution simply proceeds sequentially through the available definitions. Closed type-families allow an alternative declaration that allows for a base case for the resolution allowing us to actually write recursive functions over types.

For example consider if we wanted to write a function which counts the arguments in the type of a function and reifies at the value-level.

{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.Proxy
import GHC.TypeLits

type family Count (f :: *) :: Nat where
  Count (a -> b) = 1 + (Count b)
  Count x = 1

type Fn1 = Int -> Int
type Fn2 = Int -> Int -> Int -> Int

fn1 :: Integer
fn1 = natVal (Proxy :: Proxy (Count Fn1))
-- 2

fn2 :: Integer
fn2 = natVal (Proxy :: Proxy (Count Fn2))
-- 4

The variety of functions we can now write down are rather remarkable, allowing us to write meaningful logic at the type level.

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE UndecidableInstances #-}

import GHC.TypeLits
import Data.Proxy
import Data.Type.Equality

-- Type-level functions over type-level lists.

type family Reverse (xs :: [k]) :: [k] where
  Reverse '[] = '[]
  Reverse xs = Rev xs '[]

type family Rev (xs :: [k]) (ys :: [k]) :: [k] where
  Rev '[] i = i
  Rev (x ': xs) i = Rev xs (x ': i)

type family Length (as :: [k]) :: Nat where
  Length '[] = 0
  Length (x ': xs) = 1 + Length xs

type family If (p :: Bool) (a :: k) (b :: k) :: k where
  If True a b = a
  If False a b = b

type family Concat (as :: [k]) (bs :: [k]) :: [k] where
  Concat a '[] = a
  Concat '[] b = b
  Concat (a ': as) bs = a ': Concat as bs

type family Map (f :: a -> b) (as :: [a]) :: [b] where
  Map f '[] = '[]
  Map f (x ': xs) = f x ': Map f xs

type family Sum (xs :: [Nat]) :: Nat where
  Sum '[] = 0
  Sum (x ': xs) = x + Sum xs

ex1 :: Reverse [1,2,3] ~ [3,2,1] => Proxy a
ex1 = Proxy

ex2 :: Length [1,2,3] ~ 3 => Proxy a
ex2 = Proxy

ex3 :: (Length [1,2,3]) ~ (Length (Reverse [1,2,3])) => Proxy a
ex3 = Proxy

-- Reflecting type level computations back to the value level.
ex4 :: Integer
ex4 = natVal (Proxy :: Proxy (Length (Concat [1,2,3] [4,5,6])))
-- 6

ex5 :: Integer
ex5 = natVal (Proxy :: Proxy (Sum [1,2,3]))
-- 6

-- Couldn't match type ‘2’ with ‘1’
ex6 :: Reverse [1,2,3] ~ [3,1,2] => Proxy a
ex6 = Proxy

The results of type family functions need not necessarily be kinded as (*) either. For example using Nat or Constraint is permitted.

type family Elem (a :: k) (bs :: [k]) :: Constraint where
  Elem a (a ': bs) = (() :: Constraint)
  Elem a (b ': bs) = a `Elem` bs

type family Sum (ns :: [Nat]) :: Nat where
  Sum '[] = 0
  Sum (n ': ns) = n + Sum ns

Kind Indexed Type Families

Just as typeclasses are normally indexed on types, classes can also be indexed on kinds with the kinds given as explicit kind signatures on type variables.

type family (a :: k) == (b :: k) :: Bool
type instance a == b = EqStar a b
type instance a == b = EqArrow a b
type instance a == b = EqBool a b

type family EqStar (a :: *) (b :: *) where
  EqStar a a = True
  EqStar a b = False

type family EqArrow (a :: k1 -> k2) (b :: k1 -> k2) where
  EqArrow a a = True
  EqArrow a b = False

type family EqBool a b where
  EqBool True  True  = True
  EqBool False False = True
  EqBool a     b     = False

type family EqList a b where
  EqList '[]        '[]        = True
  EqList (h1 ': t1) (h2 ': t2) = (h1 == h2) && (t1 == t2)
  EqList a          b          = False
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE ConstraintKinds #-}


import GHC.TypeLits
import Data.Type.Equality

data Label (l :: Symbol) = Get

class Has a l b | a l -> b where
  from :: a -> Label l -> b

data Point2D = Point2 Double Double deriving Show
data Point3D = Point3 Double Double Double deriving Show

instance Has Point2D "x" Double where
  from (Point2 x _) _ = x

instance Has Point2D "y" Double where
  from (Point2 _ y) _ = y


instance Has Point3D "x" Double where
  from (Point3 x _ _) _ = x

instance Has Point3D "y" Double where
  from (Point3 _ y _) _ = y

instance Has Point3D "z" Double where
  from (Point3 _ _ z) _ = z


infixl 6 #

(#) :: a -> (a -> b) -> b
(#) = flip ($)

_x :: Has a "x" b => a -> b
_x pnt = from pnt (Get :: Label "x")

_y :: Has a "y" b => a -> b
_y pnt = from pnt (Get :: Label "y")

_z :: Has a "z" b => a -> b
_z pnt = from pnt (Get :: Label "z")

type Point a r = (Has a "x" r, Has a "y" r)

distance :: (Point a r, Point b r, Floating r) => a -> b -> r
distance p1 p2 = sqrt (d1^2 + d2^2)
  where
    d1 = (p1 # _x) + (p1 # _y)
    d2 = (p2 # _x) + (p2 # _y)

main :: IO ()
main = do
  print $ (Point2 10 20) # _x

  -- Fails with: No instance for (Has Point2D "z" a0)
  -- print $ (Point2 10 20) # _z

  print $ (Point3 10 20 30) # _x
  print $ (Point3 10 20 30) # _z

  print $ distance (Point2 1 3) (Point2 2 7)
  print $ distance (Point2 1 3) (Point3 2 7 4)
  print $ distance (Point3 1 3 5) (Point3 2 7 3)

Since record is fundamentally no different from the tuple we can also do the same kind of construction over record field names.

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE ConstraintKinds #-}


import GHC.TypeLits

newtype Field (n :: Symbol) v = Field { unField :: v }
  deriving Show

data Person1 = Person1
  { _age      :: Field "age" Int
  , _name     :: Field "name" String
  }

data Person2 = Person2
  { _age'  :: Field "age" Int
  , _name' :: Field "name" String
  , _lib'  :: Field "lib" String
  }

deriving instance Show Person1
deriving instance Show Person2

data Label (l :: Symbol) = Get

class Has a l b | a l -> b where
  from :: a -> Label l -> b

instance Has Person1 "age" Int where
  from (Person1 a _) _ = unField a

instance Has Person1 "name" String where
  from (Person1 _ a) _ = unField a

instance Has Person2 "age" Int where
  from (Person2 a _ _) _ = unField a

instance Has Person2 "name" String where
  from (Person2 _ a _) _ = unField a

age :: Has a "age" b => a -> b
age pnt = from pnt (Get :: Label "age")

name :: Has a "name" b => a -> b
name pnt = from pnt (Get :: Label "name")

-- Parameterized constraint kind for "Simon-ness" of a record.
type Simon a = (Has a "name" String, Has a "age" Int)

spj :: Person1
spj = Person1 (Field 56) (Field "Simon Peyton Jones")

smarlow :: Person2
smarlow = Person2 (Field 38) (Field "Simon Marlow") (Field "rts")


catNames :: (Simon a, Simon b) => a -> b -> String
catNames a b = name a ++ name b

addAges :: (Simon a, Simon b) => a -> b -> Int
addAges a b = age a + age b


names :: String
names = name smarlow ++ "," ++ name spj
-- "Simon Marlow,Simon Peyton Jones"

ages :: Int
ages = age spj + age smarlow
-- 94

Notably this approach is mostly just all boilerplate class instantiation which could be abstracted away using TemplateHaskell or a Generic deriving.

HLists

A heterogeneous list is a cons list whose type statically encodes the ordered types of of it's values.

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE KindSignatures #-}

infixr 5 :::

data HList (ts :: [ * ]) where
  Nil :: HList '[]
  (:::) :: t -> HList ts -> HList (t ': ts)

-- Take the head of a non-empty list with the first value as Bool type.
headBool :: HList (Bool ': xs) -> Bool
headBool hlist = case hlist of
  (a ::: _) -> a

hlength :: HList x -> Int
hlength Nil = 0
hlength (_ ::: b) = 1 + (hlength b)


tuple :: (Bool, (String, (Double, ())))
tuple = (True, ("foo", (3.14, ())))

hlist :: HList '[Bool, String , Double , ()]
hlist = True ::: "foo" ::: 3.14 ::: () ::: Nil

Of course this immediately begs the question of how to print such a list out to a string in the presence of type-heterogeneity. In this case we can use type-families combined with constraint kinds to apply the Show over the HLists parameters to generate the aggregate constraint that all types in the HList are Showable, and then derive the Show instance.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE UndecidableInstances #-}

import GHC.Exts (Constraint)

infixr 5 :::

data HList (ts :: [ * ]) where
  Nil :: HList '[]
  (:::) :: t -> HList ts -> HList (t ': ts)

type family Map (f :: a -> b) (xs :: [a]) :: [b]
type instance Map f '[] = '[]
type instance Map f (x ': xs) = f x ': Map f xs

type family Constraints (cs :: [Constraint]) :: Constraint
type instance Constraints '[] = ()
type instance Constraints (c ': cs) = (c, Constraints cs)

type AllHave (c :: k -> Constraint) (xs :: [k]) = Constraints (Map c xs)

showHList :: AllHave Show xs => HList xs -> [String]
showHList Nil = []
showHList (x ::: xs) = (show x) : showHList xs

instance AllHave Show xs => Show (HList xs) where
  show = show . showHList

example1 :: HList '[Bool, String , Double , ()]
example1 = True ::: "foo" ::: 3.14 ::: () ::: Nil
-- ["True","\"foo\"","3.14","()"]

Type Map

Much of this discussion of promotion begs the question whether we can create data structures at the type-level to store information at compile-time. For example a type-level association list can be used to model a map between type-level symbols and any other promotable types. Together with type-families we can write down type-level traversal and lookup functions.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE UndecidableInstances #-}

import GHC.TypeLits
import Data.Proxy
import Data.Type.Equality

type family If (p :: Bool) (a :: k) (b :: k) :: k where
  If True a b = a
  If False a b = b

type family Lookup (k :: a) (ls :: [(a, b)]) :: Maybe b where
  Lookup k '[] = 'Nothing
  Lookup k ('(a, b) ': xs) = If (a == k) ('Just b) (Lookup k xs)

type M = [
    '("a", 1)
  , '("b", 2)
  , '("c", 3)
  , '("d", 4)
  ]

type K = "a"
type (!!) m (k :: Symbol) a = (Lookup k m) ~ Just a

value :: Integer
value = natVal ( Proxy :: (M !! "a") a => Proxy a )

If we ask GHC to expand out the type signature we can view the explicit implementation of the type-level map lookup function.

(!!)
  :: If
       (GHC.TypeLits.EqSymbol "a" k)
       ('Just 1)
       (If
          (GHC.TypeLits.EqSymbol "b" k)
          ('Just 2)
          (If
             (GHC.TypeLits.EqSymbol "c" k)
             ('Just 3)
             (If (GHC.TypeLits.EqSymbol "d" k) ('Just 4) 'Nothing)))
     ~ 'Just v =>
     Proxy k -> Proxy v

Advanced Proofs

Now that we have the this length-indexed vector let's go write the reverse function, how hard could it be?

So we go and write down something like this:

reverseNaive :: forall n a. Vec a n -> Vec a n
reverseNaive xs = go Nil xs -- Error: n + 0 != n
  where
    go :: Vec a m -> Vec a n -> Vec a (n :+ m)
    go acc Nil = acc
    go acc (Cons x xs) = go (Cons x acc) xs -- Error: n + succ m != succ (n + m)

Running this we find that GHC is unhappy about two lines in the code:

Couldn't match type ‘n’ with ‘n :+ 'Z’
    Expected type: Vec a n
      Actual type: Vec a (n :+ 'Z)

Could not deduce ((n1 :+ 'S m) ~ 'S (n1 :+ m))
    Expected type: Vec a1 (k :+ m)
      Actual type: Vec a1 (n1 :+ 'S m)

As we unfold elements out of the vector we'll end up a doing a lot of type-level arithmetic over indices as we combine the subparts of the vector backwards, but as a consequence we find that GHC will run into some unification errors because it doesn't know about basic arithmetic properties of the natural numbers. Namely that forall n. n + 0 = 0 and forall n m. n + (1 + m) = 1 + (n + m). Which of course it really shouldn't given that we've constructed a system at the type-level which intuitively models arithmetic but GHC is just a dumb compiler, it can't automatically deduce the isomorphism between natural numbers and Peano numbers.

So at each of these call sites we now have a proof obligation to construct proof terms which rearrange the type signatures of the terms in question such that actual types in the error messages GHC gave us align with the expected values to complete the program.

Recall from our discussion of propositional equality from GADTs that we actually have such machinery to do this!

{-# LANGUAGE GADTs #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ExplicitForAll #-}

import Data.Type.Equality

data Nat = Z | S Nat

data SNat n where
  Zero :: SNat Z
  Succ :: SNat n -> SNat (S n)

data Vec :: * -> Nat -> * where
  Nil :: Vec a Z
  Cons :: a -> Vec a n -> Vec a (S n)

instance Show a => Show (Vec a n) where
  show Nil         = "Nil"
  show (Cons x xs) = "Cons " ++ show x ++ " (" ++ show xs ++ ")"

type family (m :: Nat) :+ (n :: Nat) :: Nat where
  Z :+ n = n
  S m :+ n = S (m :+ n)

-- (a ~ b) implies (f a ~ f b)
cong :: a :~: b -> f a :~: f b
cong Refl = Refl

-- (a ~ b) implies (f a) implies (f b)
subst :: a :~: b -> f a -> f b
subst Refl = id

plus_zero :: forall n. SNat n -> (n :+ Z) :~: n
plus_zero Zero = Refl
plus_zero (Succ n) = cong (plus_zero n)

plus_suc :: forall n m. SNat n -> SNat m -> (n :+ (S m)) :~: (S (n :+ m))
plus_suc Zero m = Refl
plus_suc (Succ n) m = cong (plus_suc n m)

size :: Vec a n -> SNat n
size Nil         = Zero
size (Cons _ xs) = Succ $ size xs

reverse :: forall n a. Vec a n -> Vec a n
reverse xs = subst (plus_zero (size xs)) $ go Nil xs
  where
    go :: Vec a m -> Vec a k -> Vec a (k :+ m)
    go acc Nil = acc
    go acc (Cons x xs) = subst (plus_suc (size xs) (size acc)) $ go (Cons x acc) xs

append :: Vec a n -> Vec a m -> Vec a (n :+ m)
append (Cons x xs) ys = Cons x (append xs ys)
append Nil         ys = ys

vec :: Vec Int (S (S (S Z)))
vec = 1 `Cons` (2 `Cons` (3 `Cons` Nil))

test :: Vec Int (S (S (S Z)))
test = Main.reverse vec

One might consider whether we could avoid using the singleton trick and just use type-level natural numbers, and technically this approach should be feasible although it seems that the natural number solver in GHC 7.8 can decide some properties but not the ones needed to complete the natural number proofs for the reverse functions.

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE UndecidableInstances #-}

import Prelude hiding (Eq)
import GHC.TypeLits
import Data.Type.Equality

type Z = 0

type family S (n :: Nat) :: Nat where
  S n = n + 1

-- Yes!
eq_zero :: Z :~: Z
eq_zero = Refl

-- Yes!
zero_plus_one :: (Z + 1) :~: (1 + Z)
zero_plus_one = Refl

-- Yes!
plus_zero :: forall n. (n + Z) :~: n
plus_zero = Refl

-- Yes!
plus_one :: forall n. (n + S Z) :~: S n
plus_one = Refl

-- No.
plus_suc :: forall n m. (n + (S m)) :~: (S (n + m))
plus_suc = Refl

Caveat should be that there might be a way to do this in GHC 7.6 that I'm not aware of. In GHC 7.10 there are some planned changes to solver that should be able to resolve these issues.

As an aside this is a direct transliteration of the equivalent proof in Agda, which is accomplished via the same method but without the song and dance to get around the lack of dependent types.

module Vector where

infixr 10 __

data: Set where
  zero : ℕ
  suc  :{-# BUILTIN NATURAL ℕ    #-}
{-# BUILTIN ZERO    zero #-}
{-# BUILTIN SUC     suc  #-}

infixl 6 _+_

_+_ :0 + n = n
suc m + n = suc (m + n)

data Vec (A : Set) : Set where
  []  : Vec A 0
  __ :  {n}  A  Vec A n  Vec A (suc n)

_++_ :  {A n m}  Vec A n  Vec A m  Vec A (n + m)
[] ++ ys = ys
(x  xs) ++ ys = x  (xs ++ ys)

infix 4 _≡_

data _≡_ {A : Set} (x : A) : A  Set where
  refl : x ≡ x

subst : {A : Set}  (P : A  Set)  {x y}  x ≡ y  P x  P y
subst P refl p = p

cong : {A B : Set} (f : A  B)  {x y : A}  x ≡ y  f x ≡ f y
cong f refl = refl

vec :  {A} (k : ℕ)  Set
vec {A} k = Vec A k

plus_zero : {n : ℕ}  n + 0 ≡ n 
plus_zero {zero}  = refl
plus_zero {suc n} = cong suc plus_zero

plus_suc : {n : ℕ}  n + (suc 0) ≡ suc n 
plus_suc {zero}  = refl
plus_suc {suc n} = cong suc (plus_suc {n})

reverse :  {A n}  Vec A n  Vec A n
reverse []       = []
reverse {A} {suc n} (x  xs) = subst vec (plus_suc {n}) (reverse xs ++ (x   []))

Generics

Haskell has several techniques for automatic generation of type classes for a variety of tasks that consist largely of boilerplate code generation such as:

  • Pretty Printing
  • Equality
  • Serialization
  • Ordering
  • Traversal

Typeable

The Typeable class be used to create runtime type information for arbitrary types.

typeOf :: Typeable a => a -> TypeRep
{-# LANGUAGE DeriveDataTypeable #-}

import Data.Typeable

data Animal = Cat | Dog deriving Typeable
data Zoo a = Zoo [a] deriving Typeable

equal :: (Typeable a, Typeable b) => a -> b -> Bool
equal a b = typeOf a == typeOf b

example1 :: TypeRep
example1 = typeOf Cat
-- Animal

example2 :: TypeRep
example2 = typeOf (Zoo [Cat, Dog])
-- Zoo Animal

example3 :: TypeRep
example3 = typeOf ((1, 6.636e-34, "foo") :: (Int, Double, String))
-- (Int,Double,[Char])

example4 :: Bool
example4 = equal False ()
-- False

Using the Typeable instance allows us to write down a type safe cast function which can safely use unsafeCast and provide a proof that the resulting type matches the input.

cast :: (Typeable a, Typeable b) => a -> Maybe b
cast x
  | typeOf x == typeOf ret = Just ret
  | otherwise = Nothing
  where
    ret = unsafeCast x

Of historical note is that writing our own Typeable classes is currently possible of GHC 7.6 but allows us to introduce dangerous behavior that can cause crashes, and shouldn't be done except by GHC itself. As of 7.8 GHC forbids hand-written Typeable instances.

See: Typeable and Data in Haskell

Dynamic

Since we have a way of querying runtime type information we can use this machinery to implement a Dynamic type. This allows us to box up any monotype into a uniform type that can be passed to any function taking a Dynamic type which can then unpack the underlying value in a type-safe way.

toDyn :: Typeable a => a -> Dynamic
fromDyn :: Typeable a => Dynamic -> a -> a
fromDynamic :: Typeable a => Dynamic -> Maybe a
cast :: (Typeable a, Typeable b) => a -> Maybe b
import Data.Dynamic
import Data.Maybe

dynamicBox :: Dynamic
dynamicBox = toDyn (6.62 :: Double)

example1 :: Maybe Int
example1 = fromDynamic dynamicBox
-- Nothing

example2 :: Maybe Double
example2 = fromDynamic dynamicBox
-- Just 6.62

example3 :: Int
example3 = fromDyn dynamicBox 0
-- 0

example4 :: Double
example4 = fromDyn dynamicBox 0.0
-- 6.62

Data

Just as Typeable let's create runtime type information where needed, the Data class allows us to reflect information about the structure of datatypes to runtime as needed.

class Typeable a => Data a where
  gfoldl  :: (forall d b. Data d => c (d -> b) -> d -> c b)
          -> (forall g. g -> c g)
          -> a
          -> c a

  gunfold :: (forall b r. Data b => c (b -> r) -> c r)
          -> (forall r. r -> c r)
          -> Constr
          -> c a

  toConstr :: a -> Constr
  dataTypeOf :: a -> DataType
  gmapQl :: (r -> r' -> r) -> r -> (forall d. Data d => d -> r') -> a -> r

The types for gfoldl and gunfold are a little intimidating ( and depend on Rank2Types ), the best way to understand is to look at some examples. First the most trivial case a simple sum type Animal would produce the follow the following code:

data Animal = Cat | Dog deriving Typeable
instance Data Animal where
  gfoldl k z Cat = z Cat
  gfoldl k z Dog = z Dog

  gunfold k z c
    = case constrIndex c of
        1 -> z Cat
        2 -> z Dog 

  toConstr Cat = cCat
  toConstr Dog = cDog

  dataTypeOf _ = tAnimal

tAnimal :: DataType
tAnimal = mkDataType "Main.Animal" [cCat, cDog]

cCat :: Constr
cCat = mkConstr tAnimal "Cat" [] Prefix

cDog :: Constr
cDog = mkConstr tAnimal "Dog" [] Prefix

For a type with non-empty containers we get something a little more interesting. Consider the list type:

instance Data a => Data [a] where
  gfoldl _ z []     = z []
  gfoldl k z (x:xs) = z (:) `k` x `k` xs

  toConstr []    = nilConstr
  toConstr (_:_) = consConstr

  gunfold k z c 
    = case constrIndex c of
        1 -> z []
        2 -> k (k (z (:)))

  dataTypeOf _ = listDataType

nilConstr :: Constr
nilConstr = mkConstr listDataType "[]" [] Prefix

consConstr :: Constr
consConstr = mkConstr listDataType "(:)" [] Infix

listDataType :: DataType
listDataType = mkDataType "Prelude.[]" [nilConstr,consConstr]

Looking at gfoldl we see the Data has an implementation of a function for us to walk an applicative over the elements of the constructor by applying a function k over each element and applying z at the spine. For example look at the instance for a 2-tuple as well:

instance (Data a, Data b) => Data (a,b) where
  gfoldl k z (a,b) = z (,) `k` a `k` b

  toConstr (_,_) = tuple2Constr

  gunfold k z c  
    = case constrIndex c of
      1 -> k (k (z (,)))

  dataTypeOf _  = tuple2DataType

tuple2Constr :: Constr
tuple2Constr = mkConstr tuple2DataType "(,)" [] Infix

tuple2DataType :: DataType
tuple2DataType = mkDataType "Prelude.(,)" [tuple2Constr]

This is pretty cool, now within the same typeclass we have a generic way to introspect any Data instance and writing logic that depends on the structure and types of its subterms. We can now write a function which allow us to traverse an arbitrary instance Data and twiddle values based on pattern matching on the runtime types. So let's write down a function over which increments a Value type for both for n-tuples and lists.

{-# LANGUAGE DeriveDataTypeable #-}

import Data.Data
import Control.Monad.Identity
import Control.Applicative

data Animal = Cat | Dog deriving (Data, Typeable)

newtype Val = Val Int deriving (Show, Data, Typeable)

incr :: Typeable a => a -> a
incr = maybe id id (cast f)
  where f (Val x) = Val (x * 100)

over :: Data a => a -> a
over x = runIdentity $ gfoldl cont base (incr x)
  where
    cont k d = k <*> (pure $ over d)
    base = pure


example1 :: Constr
example1 = toConstr Dog
-- Dog

example2 :: DataType
example2 = dataTypeOf Cat
-- DataType {tycon = "Main.Animal", datarep = AlgRep [Cat,Dog]}

example3 :: [Val]
example3 = over [Val 1, Val 2, Val 3]
-- [Val 100,Val 200,Val 300]

example4 :: (Val, Val, Val)
example4 = over (Val 1, Val 2, Val 3)
-- (Val 100,Val 200,Val 300)

We can also write generic operations to for instance count the number of parameters in a data type.

numHoles :: Data a => a -> Int
numHoles = gmapQl (+) 0 (const 1)

example1 :: Int
example1 = numHoles (1,2,3,4,5,6,7)
-- 7

example2 :: Int
example2 = numHoles (Just 3)
-- 1

This method adapts itself well to generic traversals but the types quickly become rather hairy when dealing anymore more complicated involving folds and unsafe coercions.

Generic

The most modern method of doing generic programming uses type families to achieve a better of deriving the structural properties of arbitrary type classes. Generic implements a typeclass with an associated type Rep ( Representation ) together with a pair of functions that form a 2-sided inverse ( isomorphism ) for converting to and from the associated type and the derived type in question.

class Generic a where
  type Rep a
  from :: a -> Rep a
  to :: Rep a -> a

class Datatype d where
  datatypeName :: t d f a -> String
  moduleName :: t d f a -> String

class Constructor c where
  conName :: t c f a -> String

GHC.Generics defines a set of named types for modeling the various structural properties of types in available in Haskell.

-- | Sums: encode choice between constructors
infixr 5 :+:
data (:+:) f g p = L1 (f p) | R1 (g p)

-- | Products: encode multiple arguments to constructors
infixr 6 :*:
data (:*:) f g p = f p :*: g p

-- | Tag for M1: datatype
data D
-- | Tag for M1: constructor
data C

-- | Constants, additional parameters and recursion of kind *
newtype K1 i c p = K1 { unK1 :: c }

-- | Meta-information (constructor names, etc.)
newtype M1 i c f p = M1 { unM1 :: f p }

-- | Type synonym for encoding meta-information for datatypes
type D1 = M1 D

-- | Type synonym for encoding meta-information for constructors
type C1 = M1 C

Using the deriving mechanics GHC can generate this Generic instance for us mechanically, if we were to write it by hand for a simple type it might look like this:

{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE TypeFamilies #-}

import GHC.Generics

data Animal
  = Dog
  | Cat

instance Generic Animal where
  type Rep Animal = D1 T_Animal ((C1 C_Dog U1) :+: (C1 C_Cat U1))

  from Dog = M1 (L1 (M1 U1))
  from Cat = M1 (R1 (M1 U1))

  to (M1 (L1 (M1 U1))) = Dog
  to (M1 (R1 (M1 U1))) = Cat

data T_Animal
data C_Dog
data C_Cat

instance Datatype T_Animal where
  datatypeName _ = "Animal"
  moduleName _ = "Main"

instance Constructor C_Dog where
  conName _ = "Dog"

instance Constructor C_Cat where
  conName _ = "Cat"

Use kind! in GHCi we can look at the type family Rep associated with a Generic instance.

λ: :kind! Rep Animal
Rep Animal :: * -> *
= M1 D T_Animal (M1 C C_Dog U1 :+: M1 C C_Cat U1)

λ: :kind! Rep ()
Rep () :: * -> *
= M1 D GHC.Generics.D1() (M1 C GHC.Generics.C1_0() U1)

λ: :kind! Rep [()]
Rep [()] :: * -> *
= M1
    D
    GHC.Generics.D1[]
    (M1 C GHC.Generics.C1_0[] U1
     :+: M1
           C
           GHC.Generics.C1_1[]
           (M1 S NoSelector (K1 R ()) :*: M1 S NoSelector (K1 R [()])))

Now the clever bit, instead writing our generic function over the datatype we instead write it over the Rep and then reify the result using from. Some for an equivalent version of Haskell's default Eq that instead uses generic deriving we could write:

class GEq' f where
  geq' :: f a -> f a -> Bool

instance GEq' U1 where
  geq' _ _ = True

instance (GEq c) => GEq' (K1 i c) where
  geq' (K1 a) (K1 b) = geq a b

instance (GEq' a) => GEq' (M1 i c a) where
  geq' (M1 a) (M1 b) = geq' a b

-- Equality for sums.
instance (GEq' a, GEq' b) => GEq' (a :+: b) where
  geq' (L1 a) (L1 b) = geq' a b
  geq' (R1 a) (R1 b) = geq' a b
  geq' _      _      = False

-- Equality for products.
instance (GEq' a, GEq' b) => GEq' (a :*: b) where
  geq' (a1 :*: b1) (a2 :*: b2) = geq' a1 a2 && geq' b1 b2

Now to to accommodate the two methods of writing classes (generic-deriving or custom implementations) we can use DefaultSignatures extension to allow the user to leave typeclass functions blank and defer to the Generic or to define their own.

{-# LANGUAGE DefaultSignatures #-}

class GEq a where 
  geq :: a -> a -> Bool

  default geq :: (Generic a, GEq' (Rep a)) => a -> a -> Bool
  geq x y = geq' (from x) (from y)

Now anyone using our library need only derive Generic and create an empty instance of our typeclass instance without writing any boilerplate for GEq.

See:

Generic Deriving

GHC.Generics, we can use GHC to do lots of non-trivial code generation which works spectacularly well.

The hashable library allows us to derive hashing functions.

{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics (Generic)
import Data.Hashable

data Color = Red | Green | Blue deriving (Generic, Show)

instance Hashable Color where

example1 :: Int
example1 = hash Red
-- 839657738087498284

example2 :: Int
example2 = hashWithSalt 0xDEADBEEF Red
-- 62679985974121021

The cereal library allows us to automatically derive a binary representation.

{-# LANGUAGE DeriveGeneric #-}

import Data.Word
import Data.ByteString
import Data.Serialize

import GHC.Generics

data Val = A [Val] | B [(Val, Val)] | C
  deriving (Generic, Show)

instance Serialize Val where

encoded :: ByteString
encoded = encode (A [B [(C, C)]])
-- "\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\STX\STX"

bytes :: [Word8]
bytes = unpack encoded
-- [0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,2,2]

decoded :: Either String Val
decoded = decode encoded

The aeson library allows us to derive JSON representations for JSON instances.

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
import GHC.Generics

data Point = Point { _x :: Double, _y :: Double }
   deriving (Show, Generic)

instance FromJSON Point
instance ToJSON Point

example1 :: Maybe Point
example1 = decode "{\"x\":3.0,\"y\":-1.0}"

example2 = encode $ Point 123.4 20

See: A Generic Deriving Mechanism for Haskell

Uniplate

Uniplate is a generics library for writing traversals and transformation for arbitrary data structures. It is extremely useful for writing AST transformations and rewrite systems.

plate :: from -> Type from to
(|*)  :: Type (to -> from) to -> to -> Type from to
(|-)  :: Type (item -> from) to -> item -> Type from to

descend   :: Uniplate on => (on -> on) -> on -> on
transform :: Uniplate on => (on -> on) -> on -> on
rewrite   :: Uniplate on => (on -> Maybe on) -> on -> on

The descend function will apply a function to each immediate descendent of an expression and then combines them up into the parent expression.

The transform function will perform a single pass bottom-up transformation of all terms in the expression.

The rewrite function will perform a exhaustive transformation of all terms in the expression to fixed point, using Maybe to signify termination.

import Data.Generics.Uniplate.Direct

data Expr a
  = Fls
  | Tru
  | Var a
  | Not (Expr a)
  | And (Expr a) (Expr a)
  | Or  (Expr a) (Expr a)
  deriving (Show, Eq)

instance Uniplate (Expr a) where
  uniplate (Not f)     = plate Not |* f
  uniplate (And f1 f2) = plate And |* f1 |* f2
  uniplate (Or f1 f2)  = plate Or |* f1 |* f2
  uniplate x           = plate x

simplify :: Expr a -> Expr a
simplify = transform simp
 where
   simp (Not (Not f)) = f
   simp (Not Fls) = Tru
   simp (Not Tru) = Fls
   simp x = x

reduce :: Show a => Expr a -> Expr a
reduce = rewrite cnf
  where
    -- double negation
    cnf (Not (Not p)) = Just p

    -- de Morgan
    cnf (Not (p `Or` q))  = Just $ (Not p) `And` (Not q)
    cnf (Not (p `And` q)) = Just $ (Not p) `Or` (Not q)

    -- distribute conjunctions
    cnf (p `Or` (q `And` r)) = Just $ (p `Or` q) `And` (p `Or` r)
    cnf ((p `And` q) `Or` r) = Just $ (p `Or` q) `And` (p `Or` r)
    cnf _ = Nothing


example1 :: Expr String
example1 = simplify (Not (Not (Not (Not (Var "a")))))
-- Var "a"

example2 :: [String]
example2 = [a | Var a <- universe ex]
  where
    ex = Or (And (Var "a") (Var "b")) (Not (And (Var "c") (Var "d")))
-- ["a","b","c","d"]

example3 :: Expr String
example3 = reduce $ ((a `And` b) `Or` (c `And` d)) `Or` e
  where
    a = Var "a"
    b = Var "b"
    c = Var "c"
    d = Var "d"
    e = Var "e"

Alternatively Uniplate instances can be derived automatically from instances of Data without the need to explicitly write a Uniplate instance. This approach carries a slight amount of overhead over an explicit hand-written instance.

import Data.Data
import Data.Typeable
import Data.Generics.Uniplate.Data

data Expr a
  = Fls
  | Tru
  | Lit a
  | Not (Expr a)
  | And (Expr a) (Expr a)
  | Or (Expr a) (Expr a)
  deriving (Data, Typeable, Show, Eq)

Biplate

Biplates generalize plates where the target type isn't necessarily the same as the source.

descendBi :: Biplate from to => (to -> to) -> from -> from
transformBi :: Biplate from to => (to -> to) -> from -> from
rewriteBi :: Biplate from to => (to -> Maybe to) -> from -> from
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleContexts #-}

import Data.Generics.Uniplate.Direct

type Name = String

data Expr
  = Var Name
  | Lam Name Expr
  | App Expr Expr
  deriving Show

data Stmt
  = Decl [Stmt]
  | Let Name Expr
  deriving Show

instance Uniplate Expr where
  uniplate (Var x  ) = plate Var |- x
  uniplate (App x y) = plate App |* x |* y
  uniplate (Lam x y) = plate Lam |- x |* y

instance Biplate Expr Expr where
  biplate = plateSelf

instance Uniplate Stmt where
  uniplate (Decl x  ) = plate Decl ||* x
  uniplate (Let x y) = plate Let |-  x |- y

instance Biplate Stmt Stmt where
  biplate = plateSelf

instance Biplate Stmt Expr where
  biplate (Decl x) = plate Decl ||+ x
  biplate (Let x y) = plate Let |- x |* y

rename :: Name -> Name -> Expr -> Expr
rename from to = rewrite f
  where
    f (Var a) | a == from = Just (Var to)
    f (Lam a b) | a == from = Just (Lam to b)
    f _ = Nothing

s, k, sk :: Expr
s = Lam "x" (Lam "y" (Lam "z" (App (App (Var "x") (Var "z")) (App (Var "y") (Var "z")))))
k = Lam "x" (Lam "y" (Var "x"))
sk = App s k

m :: Stmt
m = descendBi f $ Decl [ (Let "s" s) , Let "k" k , Let "sk" sk ]
  where
    f = rename "x" "a"
      . rename "y" "b"
      . rename "z" "c"

Numbers

Integer

The Integer type in GHC is implemented by the GMP (libgmp) arbitrary precision arithmetic library. Unlike the Int type the size of Integer values are bounded only by the available memory. Most notably libgmp is the on few libraries that compiled Haskell binaries are dynamically linked against.

See: GHC, primops and exorcising GMP

Complex

data Complex a = a :+ a 
mkPolar :: RealFloat a => a -> a -> Complex a

The Num instance for Complex is only defined if parameter of Complex is an instance of RealFloat.

λ: 0 :+ 1
0 :+ 1 :: Complex Integer

λ: (0 :+ 1) + (1 :+ 0)
1.0 :+ 1.0 :: Complex Integer

λ: exp (0 :+ 2 * pi)
1.0 :+ (-2.4492935982947064e-16) :: Complex Double

λ: mkPolar 1 (2*pi)
1.0 :+ (-2.4492935982947064e-16) :: Complex Double

λ: let f x n = (cos x :+ sin x)^n
λ: let g x n = cos (n*x) :+ sin (n*x)

Scientific

scientific :: Integer -> Int -> Scientific
fromFloatDigits :: RealFloat a => a -> Scientific

Scientific provides arbitrary-precision number represented using scientific notation. The constructor takes an arbitrarily sized Integer argument with for digits and a Int for the exponential. Alternatively the value can be parsed from a String or coerced from either Double/Float.

import Data.Scientific

c, h, g, a, k :: Scientific
c = scientific 299792458 (0)   -- Speed of light
h = scientific 662606957 (-42) -- Planck's constant
g = scientific 667384    (-16) -- Gravitational constant
a = scientific 729735257 (-11) -- Fine structure constant
k = scientific 268545200 (-9)  -- Khinchin Constant

tau :: Scientific
tau = fromFloatDigits (2*pi)

maxDouble64 :: Double
maxDouble64 = read "1.7976931348623159e308"
-- Infinity

maxScientific :: Scientific
maxScientific = read "1.7976931348623159e308"
-- 1.7976931348623159e308

Statistics

import Data.Vector
import Statistics.Sample

import Statistics.Distribution.Normal
import Statistics.Distribution.Poisson
import qualified Statistics.Distribution as S

s1 :: Vector Double
s1 = fromList [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

s2 :: PoissonDistribution
s2 = poisson 2.5

s3 :: NormalDistribution
s3 = normalDistr mean stdDev
  where
    mean   = 1
    stdDev = 1

descriptive = do
  print $ range s1
  -- 9.0
  print $ mean s1
  -- 5.5
  print $ stdDev s1
  -- 3.0276503540974917
  print $ variance s1
  -- 8.25
  print $ harmonicMean s1
  -- 3.414171521474055
  print $ geometricMean s1
  -- 4.5287286881167645

discrete = do
  print $ S.cumulative s2 0
  -- 8.208499862389884e-2
  print $ S.mean s2
  -- 2.5
  print $ S.variance s2
  -- 2.5
  print $ S.stdDev s2
  -- 1.5811388300841898

continuous = do
  print $ S.cumulative s3 0
  -- 0.15865525393145707
  print $ S.quantile s3 0.5
  -- 1.0
  print $ S.density s3 0
  -- 0.24197072451914334
  print $ S.mean s3
  -- 1.0
  print $ S.variance s3
  -- 1.0
  print $ S.stdDev s3
  -- 1.0

Constructive Reals

Instead of modeling the real numbers of finite precision floating point numbers we alternatively work with Num of that internally manipulate the power series expansions for the expressions when performing operations like arithmetic or transcendental functions without loosing precision when performing intermediate computations. Then when simply slice of a fixed number of terms and approximate the resulting number to a desired precision. This approach is not without it's limitations and caveats ( notably that it may diverge ) but works quite well in practice.

exp(x)    = 1 + x + 1/2*x^2 + 1/6*x^3 + 1/24*x^4 + 1/120*x^5 ...
sqrt(1+x) = 1 + 1/2*x - 1/8*x^2 + 1/16*x^3 - 5/128*x^4 + 7/256*x^5 ...
atan(x)   = x - 1/3*x^3 + 1/5*x^5 - 1/7*x^7 + 1/9*x^9 - 1/11*x^11 ...
pi        = 16 * atan (1/5) - 4 * atan (1/239)
import Data.Number.CReal

-- algebraic
phi :: CReal
phi = (1 + sqrt 5) / 2

-- transcendental
ramanujan :: CReal
ramanujan = exp (pi * sqrt 163)

main :: IO ()
main = do
  putStrLn $ showCReal 30 pi
  -- 3.141592653589793238462643383279
  putStrLn $ showCReal 30 phi
  -- 1.618033988749894848204586834366
  putStrLn $ showCReal 15 ramanujan
  -- 262537412640768743.99999999999925

Data Structures

Vector

Vectors are high performance single dimensional arrays that come come in six variants, two for each of the following types of a mutable and an immutable variant.

  • Data.Vector
  • Data.Vector.Storable
  • Data.Vector.Unboxed

The most notable feature of vectors is constant time memory access with ((!)) as well as variety of efficient map, fold and scan operations on top of a fusion framework that generates surprisingly optimal code.

fromList :: [a] -> Vector a
toList :: Vector a -> [a]
(!) :: Vector a -> Int -> a
map :: (a -> b) -> Vector a -> Vector b
foldl :: (a -> b -> a) -> a -> Vector b -> a
scanl :: (a -> b -> a) -> a -> Vector b -> Vector a
zipWith :: (a -> b -> c) -> Vector a -> Vector b -> Vector c
iterateN :: Int -> (a -> a) -> a -> Vector a
import Data.Vector.Unboxed as V

norm ::  Vector Double -> Double
norm = sqrt . V.sum . V.map (\x -> x*x)

example1 :: Double
example1 = norm $ V.iterateN 100000000 (+1) 0.0

See: Numerical Haskell: A Vector Tutorial

Mutable Vectors

freeze :: MVector (PrimState m) a -> m (Vector a)
thaw :: Vector a -> MVector (PrimState m) a

Within the IO monad we can perform arbitrary read and writes on the mutable vector with constant time reads and writes. When needed a static Vector can be created to/from the MVector using the freeze/thaw functions.

import GHC.Prim
import Control.Monad
import Control.Monad.ST
import Control.Monad.Primitive

import Data.Vector.Unboxed (freeze)
import Data.Vector.Unboxed.Mutable
import qualified Data.Vector.Unboxed as V

example :: PrimMonad m => m (V.Vector Int)
example = do
  v <- new 10
  forM_ [0..9] $ \i ->
     write v i (2*i)
  freeze v

-- vector computation in IO
vecIO :: IO (V.Vector Int)
vecIO = example

-- vector computation in ST
vecST :: ST s (V.Vector Int)
vecST = example


main :: IO ()
main = do
  vecIO >>= print
  print $ runST vecST

Map

import qualified Data.Map as Map

kv :: Map.Map Integer String
kv = Map.fromList [(1, "a"), (2, "b")]

lkup :: Integer -> String -> String
lkup key def =
  case Map.lookup key kv of
    Just val -> val
    Nothing  -> def

Tree

import Data.Tree

{-

   A
  / \
 B   C
    / \
   D   E

-}

tree :: Tree String
tree = Node "A" [Node "B" [], Node "C" [Node "D" [], Node "E" []]]

postorder :: Tree a -> [a]
postorder (Node a ts) = elts ++ [a]
  where elts = concat (map postorder ts)

preorder :: Tree a -> [a]
preorder (Node a ts) = a : elts
  where elts = concat (map preorder ts)

ex1 = drawTree tree
ex2 = drawForest $ subForest tree
ex3 = flatten tree
ex4 = levels tree
ex5 = preorder tree
ex6 = postorder tree

Set

import qualified Data.Set as Set

set :: Set.Set Integer
set = Set.fromList [1..1000]

memtest :: Integer -> Bool
memtest elt = Set.member elt set

Unordered-Containers

fromList :: (Eq k, Hashable k) => [(k, v)] -> HashMap k v
lookup :: (Eq k, Hashable k) => k -> HashMap k v -> Maybe v
insert :: (Eq k, Hashable k) => k -> v -> HashMap k v -> HashMap k v

Both the HashMap and HashSet are purely functional data structures that are drop in replacements for the containers equivalents but with more efficient space and time performance. Additionally all stored elements must have a Hashable instance.

import qualified Data.HashSet as S
import qualified Data.HashMap.Lazy as M

example1 :: M.HashMap Int Char
example1 = M.fromList $ zip [1..10] ['a'..]

example2 :: S.HashSet Int
example2 = S.fromList [1..10]

See: Johan Tibell: Announcing Unordered Containers

Hashtables

Hashtables provides hashtables with efficient lookup within the ST or IO monad.

import Prelude hiding (lookup)

import Control.Monad.ST
import Data.HashTable.ST.Basic

-- Hashtable parameterized by ST "thread"
type HT s = HashTable s String String

set :: ST s (HT s)
set = do
  ht <- new
  insert ht "key" "value1"
  return ht

get :: HT s -> ST s (Maybe String)
get ht = do
  val <- lookup ht "key"
  return val

example :: Maybe String
example = runST (set >>= get)
new :: ST s (HashTable s k v)
insert :: (Eq k, Hashable k) => HashTable s k v -> k -> v -> ST s ()
lookup :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe v)

Graph

The Graph module in the containers library is a somewhat antiquated API for working with directed graphs. A little bit of data wrapping makes it a little more straightforward to use. The library is not necessarily well-suited for large graph-theoretic operations but is perfectly fine for example, to use in a typechecker which need to resolve strongly connected components of the module definition graph.

import Data.Tree
import Data.Graph

data Grph node key = Grph
  { _graph :: Graph
  , _vertices :: Vertex -> (node, key, [key])
  }

fromList :: Ord key => [(node, key, [key])] -> Grph node key
fromList = uncurry Grph . graphFromEdges'

vertexLabels :: Functor f => Grph b t -> (f Vertex) -> f b
vertexLabels g = fmap (vertexLabel g)

vertexLabel :: Grph b t -> Vertex -> b
vertexLabel g = (\(vi, _, _) -> vi) . (_vertices g)

-- Topologically sort graph
topo' :: Grph node key -> [node]
topo' g = vertexLabels g $ topSort (_graph g)

-- Strongly connected components of graph
scc' :: Grph node key -> [[node]]
scc' g = fmap (vertexLabels g . flatten) $ scc (_graph g)

So for example we can construct a simple graph:

ex1 :: [(String, String, [String])]
ex1 = [
    ("a","a",["b"]),
    ("b","b",["c"]),
    ("c","c",["a"])
  ]

ts1 :: [String]
ts1 = topo' (fromList ex1)
-- ["a","b","c"]

sc1 :: [[String]]
sc1 = scc' (fromList ex1)
-- [["a","b","c"]]

Or with two strongly connected subgraphs:

ex2 :: [(String, String, [String])]
ex2 = [
    ("a","a",["b"]),
    ("b","b",["c"]),
    ("c","c",["a"]),

    ("d","d",["e"]),
    ("e","e",["f", "e"]),
    ("f","f",["d", "e"])
  ]


ts2 :: [String]
ts2 = topo' (fromList ex2)
-- ["d","e","f","a","b","c"]

sc2 :: [[String]]
sc2 = scc' (fromList ex2)
-- [["d","e","f"],["a","b","c"]]

See: GraphSCC

DList

A dlist is a list-like structure that is optimized for O(1) append operations, internally it uses a Church encoding of the list structure. It is specifically suited for operations which are append-only and need only access it when manifesting the entire structure. It is particularly well-suited for use in the Writer monad.

import Data.DList
import Control.Monad
import Control.Monad.Writer

logger :: Writer (DList Int) ()
logger = replicateM_ 100000 $ tell (singleton 0)

Sequence

The sequence data structure behaves structurally similar to list but is optimized for append/prepend operations and traversal.

import Data.Sequence

a :: Seq Int
a = fromList [1,2,3]

a0 :: Seq Int
a0 = a |> 4
-- [1,2,3,4]

a1 :: Seq Int
a1 = 0 <| a
-- [0,1,2,3]

Matrices and HBlas

Just as in C when working with n-dimensional matrices we'll typically overlay the high-level matrix structure onto a unboxed contiguous block of memory with index functions which perform the coordinate translations to calculate offsets. The two most common layouts are:

  • Row Major indexing
  • Column Major indexing

Which are probably best illustrated.

The calculations have a particularly nice implementation in Haskell in terms of scans over indices.

import qualified Data.Vector as V

data Order = RowMajor | ColMajor

rowMajor :: [Int] -> [Int]
rowMajor = scanr (*) 1 . tail

colMajor :: [Int] -> [Int]
colMajor = init . scanl (*) 1

data Matrix a = Matrix
  { _dims  :: [Int]
  , _elts  :: V.Vector a
  , _order :: Order
  }

fromList :: [Int] -> Order -> [a] -> Matrix a
fromList sh order elts =
  if product sh == length elts
  then Matrix sh (V.fromList elts) order
  else error "dimensions don't match"

indexTo :: [Int] -> Matrix a -> a
indexTo ix mat = boundsCheck offset
  where
    boundsCheck n =
      if 0 <= n && n < V.length (_elts mat)
      then V.unsafeIndex (_elts mat) offset
      else error "out of bounds"
    ordering = case _order mat of
      RowMajor -> rowMajor
      ColMajor -> colMajor
    offset = sum $ zipWith (*) ix (ordering (_dims mat))

matrix :: Order -> Matrix Int
matrix order = fromList [4,4] order [1..16]

ex1 :: [Int]
ex1 = rowMajor [1,2,3,4]
-- [24,12,4,1]

ex2 :: [Int]
ex2 = colMajor [1,2,3,4]
-- [1,1,2,6]

ex3 :: Int
ex3 = indexTo [1,3] (matrix RowMajor)
-- 8

ex4 :: Int
ex4 = indexTo [1,3] (matrix ColMajor)
-- 14

Unboxed matrices of this type can also be passed to C or Fortran libraries such BLAS or LAPACK linear algebra libraries. The hblas package wraps many of these routines and forms the low-level wrappers for higher level-libraries that need access to these foreign routines.

For example the dgemm routine takes two pointers to a sequence of double values of two matrices of size (m × k) and (k × n) and performs efficient matrix multiplication writing the resulting data through a pointer to a (m × n) matrix.

import Foreign.Storable
import Numerical.HBLAS.BLAS
import Numerical.HBLAS.MatrixTypes

-- Generate the constant mutable square matrix of the given type and dimensions.
constMatrix :: Storable a => Int -> a -> IO (IODenseMatrix Row a)
constMatrix n k = generateMutableDenseMatrix SRow (n,n) (const k)

example_dgemm :: IO ()
example_dgemm = do
    left  <- constMatrix 2 (2 :: Double)
    right <- constMatrix 2 (3 :: Double)
    out   <- constMatrix 2 (0 :: Double)

    dgemm NoTranspose NoTranspose 1.0 1.0 left right out

    resulting <- mutableVectorToList $ _bufferDenMutMat out
    print resulting

Hopefully hblas and numerical-core libraries will serve as a foundation to build out the Haskell numerical ecosystem in the coming years.

See: hblas

FFI

Pure Functions

Wrapping pure C functions with primitive types is trivial.

/* $(CC) -c simple.c -o simple.o */

int example(int a, int b)
{
  return a + b;
}
-- ghc simple.o simple_ffi.hs -o simple_ffi
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types

foreign import ccall safe "example" example
    :: CInt -> CInt -> CInt

main = print (example 42 27)

Storable Arrays

There exists a Storable typeclass that can be used to provide low-level access to the memory underlying Haskell values. The Prelude defines Storable interfaces for most of the basic types as well as types in the Foreign.C library.

class Storable a where
  sizeOf :: a -> Int
  alignment :: a -> Int
  peek :: Ptr a -> IO a
  poke :: Ptr a -> a -> IO ()

To pass arrays from Haskell to C we can again use Storable Vector and several unsafe operations to grab a foreign pointer to the underlying data that can be handed off to C. Once we're in C land, nothing will protect us from doing evil things to memory!

/* $(CC) -c qsort.c -o qsort.o */
void swap(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

void sort(int *xs, int beg, int end)
{
    if (end > beg + 1) {
        int piv = xs[beg], l = beg + 1, r = end;

        while (l < r) {
            if (xs[l] <= piv) {
                l++;
            } else {
                swap(&xs[l], &xs[--r]);
            }
        }

        swap(&xs[--l], &xs[beg]);
        sort(xs, beg, l);
        sort(xs, r, end);
    }
}
-- ghc qsort.o ffi.hs -o ffi
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.Ptr
import Foreign.C.Types
import Foreign.ForeignPtr
import Foreign.ForeignPtr.Unsafe

import qualified Data.Vector.Storable as V
import qualified Data.Vector.Storable.Mutable as VM

foreign import ccall safe "sort" qsort
    :: Ptr a -> CInt -> CInt -> IO ()

vecPtr :: VM.MVector s CInt -> ForeignPtr CInt
vecPtr = fst . VM.unsafeToForeignPtr0

main :: IO ()
main = do
  let vs = V.fromList ([1,3,5,2,1,2,5,9,6] :: [CInt])
  v <- V.thaw vs
  withForeignPtr (vecPtr v) $ \ptr -> do
    qsort ptr 0 9
  out <- V.freeze v
  print out

The names of foreign functions from a C specific header file can qualified.

foreign import ccall unsafe "stdlib.h malloc"
    malloc :: CSize -> IO (Ptr a)

Prepending the function name with a & allows us to create a reference to the function itself.

foreign import ccall unsafe "stdlib.h &malloc"
    malloc :: FunPtr a

Concurrency

The definitive reference on concurrency and parallelism in Haskell is Simon Marlow's text. This will section will just gloss over these topics because they are far better explained in this book.

See: Parallel and Concurrent Programming in Haskell

forkIO :: IO () -> IO ThreadId

Haskell threads are extremely cheap to spawn, using only 1.5KB of RAM depending on the platform and are much cheaper than a pthread in C. Calling forkIO 106 times completes just short of a 1s. Additionally, functional purity in Haskell also guarantees that a thread can almost always be terminated even in the middle of a computation without concern.

See: The Scheduler

STM

atomically :: STM a -> IO a
orElse :: STM a -> STM a -> STM a
retry :: STM a

newTVar :: a -> STM (TVar a)
newTVarIO :: a -> IO (TVar a)
writeTVar :: TVar a -> a -> STM ()
readTVar :: TVar a -> STM a

modifyTVar :: TVar a -> (a -> a) -> STM ()
modifyTVar' :: TVar a -> (a -> a) -> STM ()

Software Transactional Memory is a technique for guaranteeing atomicity of values in parallel computations, such that all contexts view the same data when read and writes are guaranteed never to result in inconsistent states.

The strength of Haskell's purity guarantees that transactions within STM are pure and can always be rolled back if a commit fails.

import Control.Monad
import Control.Concurrent
import Control.Concurrent.STM

type Account = TVar Double

transfer :: Account -> Account -> Double -> STM ()
transfer from to amount = do
  available <- readTVar from
  when (amount > available) retry

  modifyTVar from (+ (-amount))
  modifyTVar to   (+ amount)

-- Threads are scheduled non-deterministically.
actions :: Account -> Account -> [IO ThreadId]
actions a b = map forkIO [
     -- transfer to
       atomically (transfer a b 10)
     , atomically (transfer a b (-20))
     , atomically (transfer a b 30)

     -- transfer back
     , atomically (transfer a b (-30))
     , atomically (transfer a b 20)
     , atomically (transfer a b (-10))
   ]

main :: IO ()
main = do
  accountA <- atomically $ newTVar 60
  accountB <- atomically $ newTVar 0

  sequence_ (actions accountA accountB)

  balanceA <- atomically $ readTVar accountA
  balanceB <- atomically $ readTVar accountB

  print $ balanceA == 60
  print $ balanceB == 0

See: Beautiful Concurrency

par-monad

Using the Par monad we express our computation as a data flow graph which is scheduled in order of the connections between forked computations which exchange resulting computations with IVar.

new :: Par (IVar a)
put :: NFData a => IVar a -> a -> Par ()
get :: IVar a -> Par a
fork :: Par () -> Par ()
spawn :: NFData a => Par a -> Par (IVar a)
import Control.Monad
import Control.Monad.Par

f, g :: Int -> Int
f x = x + 10
g x = x * 10

--   f x      g x
--     \     /
--      a + b
--      /   \
-- f (a+b)  g (a+b)
--      \   /
--      (d,e)

example1 :: Int -> (Int, Int)
example1 x = runPar $ do
  [a,b,c,d,e] <- replicateM 5 new
  fork (put a (f x))
  fork (put b (g x))
  a' <- get a
  b' <- get b
  fork (put c (a' + b'))
  c' <- get c
  fork (put d (f c'))
  fork (put e (g c'))
  d' <- get d
  e' <- get e
  return (d', e')

example2 :: [Int]
example2 = runPar $ do
  xs <- parMap (+1) [1..25]
  return xs

-- foldr (+) 0 (map (^2) [1..xs])
example3 :: Int -> Int
example3 n = runPar $ do
  let range = (InclusiveRange 1 n)
  let mapper x = return (x^2)
  let reducer x y = return (x+y)
  parMapReduceRangeThresh 10 range mapper reducer 0

async

Async is a higher level set of functions that work on top of Control.Concurrent and STM.

async :: IO a -> IO (Async a)
wait :: Async a -> IO a
cancel :: Async a -> IO ()
concurrently :: IO a -> IO b -> IO (a, b)
race :: IO a -> IO b -> IO (Either a b)
import Control.Monad
import Control.Applicative
import Control.Concurrent
import Control.Concurrent.Async
import Data.Time

timeit :: IO a -> IO (a,Double)
timeit io = do
  t0 <- getCurrentTime
  a <- io
  t1 <- getCurrentTime
  return (a, realToFrac (t1 `diffUTCTime` t0))

worker :: Int -> IO Int
worker n = do
  -- simulate some work
  threadDelay (10^2 * n)
  return (n * n)

-- Spawn 2 threads in parallel, halt on both finished.
test1 :: IO (Int, Int)
test1 = do
  val1 <- async $ worker 1000
  val2 <- async $ worker 2000
  (,) <$> wait val1 <*> wait val2

-- Spawn 2 threads in parallel, halt on first finished.
test2 :: IO (Either Int Int)
test2 = do
  let val1 = worker 1000
  let val2 = worker 2000
  race val1 val2

-- Spawn 10000 threads in parallel, halt on all finished.
test3 :: IO [Int]
test3 = mapConcurrently worker [0..10000]

main :: IO ()
main = do
  print =<< timeit test1
  print =<< timeit test2
  print =<< timeit test3

Graphics

Diagrams

Diagrams is a a parser combinator library for generating vector images to SVG and a variety of other formats.

import Diagrams.Prelude
import Diagrams.Backend.SVG.CmdLine

sierpinksi :: Int -> Diagram SVG R2
sierpinksi 1 = eqTriangle 1
sierpinksi n =
      s
     ===
  (s ||| s) # centerX
  where
    s = sierpinksi (n - 1)

example :: Diagram SVG R2
example = sierpinksi 5 # fc black

main :: IO ()
main = defaultMain example
$ runhaskell diagram1.hs -w 256 -h 256 -o diagram1.svg

See: Diagrams Quick Start Tutorial

Gloss

Parsing

Parsec

For parsing in Haskell it is quite common to use a family of libraries known as Parser Combinators which let us write code to generate parsers which themselves looks very similar to the parser grammar itself!

Combinators
<|> The choice operator tries to parse the first argument before proceeding to the second. Can be chained sequentially to a generate a sequence of options.
many Consumes an arbitrary number of patterns matching the given pattern and returns them as a list.
many1 Like many but requires at least one match.
optional Optionally parses a given pattern returning it's value as a Maybe.
try Backtracking operator will let us parse ambiguous matching expressions and restart with a different pattern.

There are two styles of writing Parsec, one can choose to write with monads or with applicatives.

parseM :: Parser Expr
parseM = do
  a <- identifier
  char '+'
  b <- identifier
  return $ Add a b

The same code written with applicatives uses the applicative combinators:

-- | Sequential application.
(<*>) :: f (a -> b) -> f a -> f b

-- | Sequence actions, discarding the value of the first argument.
(*>) :: f a -> f b -> f b
(*>) = liftA2 (const id)

-- | Sequence actions, discarding the value of the second argument.
(<*) :: f a -> f b -> f a
(<*) = liftA2 const
parseA :: Parser Expr
parseA = Add <$> identifier <* char '+' <*> identifier

Now for instance if we want to parse simple lambda expressions we can encode the parser logic as compositions of these combinators which yield the string parser when evaluated under with the parse.

import Text.Parsec
import Text.Parsec.String

data Expr
  = Var Char
  | Lam Char Expr
  | App Expr Expr
  deriving Show

lam :: Parser Expr
lam = do
  char '\\'
  n <- letter
  string "->"
  e <- expr
  return $ Lam n e

app :: Parser Expr
app = do
  apps <- many1 term
  return $ foldl1 App apps

var :: Parser Expr
var = do
  n <- letter
  return $ Var n

parens :: Parser Expr -> Parser Expr
parens p = do
  char '('
  e <- p
  char ')'
  return e

term :: Parser Expr
term = var <|> parens expr

expr :: Parser Expr
expr = lam <|> app

decl :: Parser Expr
decl = do
  e <- expr
  eof
  return e

test :: IO ()
test = parseTest decl "\\y->y(\\x->x)y"

main :: IO ()
main = test >>= print

Custom Lexer

In our previous example lexing pass was not necessary because each lexeme mapped to a sequential collection of characters in the stream type. If we wanted to extend this parser with a non-trivial set of tokens, then Parsec provides us with a set of functions for defining lexers and integrating these with the parser combinators. The simplest example builds on top of the builtin Parsec language definitions which define a set of most common lexical schemes.

haskellDef   :: LanguageDef st
emptyDef     :: LanguageDef st
haskellStyle :: LanguageDef st
javaStyle    :: LanguageDef st

For instance we'll build on top of the empty language grammar.

import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String
import qualified Text.Parsec.Token as Token

lexerStyle :: Token.LanguageDef ()
lexerStyle = Token.LanguageDef
  { Token.commentStart   = "{-"
  , Token.commentEnd     = "-}"
  , Token.commentLine    = "--"
  , Token.nestedComments = True
  , Token.identStart     = letter
  , Token.identLetter    = alphaNum <|> oneOf "_"
  , Token.opStart        = Token.opLetter lexerStyle
  , Token.opLetter       = oneOf "`~!@$%^&*-+=;:<>./?"
  , Token.reservedOpNames= []
  , Token.reservedNames  = ["if", "then", "else", "def"]
  , Token.caseSensitive  = True
  }

lexer :: Token.TokenParser ()
lexer = Token.makeTokenParser lexerStyle

parens :: Parser a -> Parser a
parens = Token.parens lexer

natural :: Parser Integer
natural = Token.natural lexer

identifier :: Parser String
identifier = Token.identifier lexer

reservedOp :: String -> Parser ()
reservedOp = Token.reservedOp lexer

reserved :: String -> Parser ()
reserved = Token.reserved lexer

whiteSpace :: Parser ()
whiteSpace = Token.whiteSpace lexer

comma :: Parser String
comma = Token.comma lexer

See: Text.ParserCombinators.Parsec.Language

Simple Parsing

Putting our lexer and parser together we can write down a more robust parser for our little lambda calculus syntax.

module Parser (parseExpr) where

import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (haskellStyle)

import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok

type Id = String

data Expr
  = Lam Id Expr
  | App Expr Expr
  | Var Id
  | Num Int
  | Op  Binop Expr Expr
  deriving (Show)

data Binop = Add | Sub | Mul deriving Show

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where ops = ["->","\\","+","*","-","="]
        style = haskellStyle {Tok.reservedOpNames = ops }

reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer

identifier :: Parser String
identifier = Tok.identifier lexer

parens :: Parser a -> Parser a
parens = Tok.parens lexer

contents :: Parser a -> Parser a
contents p = do
  Tok.whiteSpace lexer
  r <- p
  eof
  return r

natural :: Parser Integer
natural = Tok.natural lexer

variable :: Parser Expr
variable = do
  x <- identifier
  return (Var x)

number :: Parser Expr
number = do
  n <- natural
  return (Num (fromIntegral n))

lambda :: Parser Expr
lambda = do
  reservedOp "\\"
  x <- identifier
  reservedOp "->"
  e <- expr
  return (Lam x e)

aexp :: Parser Expr
aexp =  parens expr
    <|> variable
    <|> number
    <|> lambda

term :: Parser Expr
term = Ex.buildExpressionParser table aexp
  where infixOp x f = Ex.Infix (reservedOp x >> return f)
        table = [[infixOp "*" (Op Mul) Ex.AssocLeft],
                 [infixOp "+" (Op Add) Ex.AssocLeft]]

expr :: Parser Expr
expr = do
  es <- many1 term
  return (foldl1 App es)

parseExpr :: String -> Expr
parseExpr input =
  case parse (contents expr) "<stdin>" input of
    Left err -> error (show err)
    Right ast -> ast

main :: IO ()
main = getLine >>= print . parseExpr >> main

Trying it out:

λ: runhaskell simpleparser.hs
1+2
Op Add (Num 1) (Num 2)

\i -> \x -> x
Lam "i" (Lam "x" (Var "x"))

\s -> \f -> \g -> \x -> f x (g x)
Lam "s" (Lam "f" (Lam "g" (Lam "x" (App (App (Var "f") (Var "x")) (App (Var "g") (Var "x"))))))

Stateful Parsing

For a more complex use, consider parser that are internally stateful, for example adding operators that can defined at parse-time and are dynamically added to the expressionParser table upon definition.

module Main where

import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok

import Text.Parsec.Language (haskellStyle)

import Data.List
import Data.Function

import Control.Monad.Identity (Identity)

import Text.Parsec
import qualified Text.Parsec as P

type Name = String

data Expr
  = Var Name
  | Lam Name Expr
  | App Expr Expr
  | Let Name Expr Expr
  | BinOp Name Expr Expr
  | UnOp Name Expr
  deriving (Show)

data Assoc
  = OpLeft
  | OpRight
  | OpNone
  | OpPrefix
  | OpPostfix
  deriving Show

data Decl
  = LetDecl Expr
  | OpDecl OperatorDef
  deriving (Show)

type Op x = Ex.Operator String ParseState Identity x
type Parser a = Parsec String ParseState a
data ParseState = ParseState [OperatorDef] deriving Show

data OperatorDef = OperatorDef {
    oassoc :: Assoc
  , oprec :: Integer
  , otok :: Name
  } deriving Show

lexer :: Tok.GenTokenParser String u Identity
lexer = Tok.makeTokenParser style
  where ops = ["->","\\","+","*","<","=","[","]","_"]
        names = ["let","in","infixl", "infixr", "infix", "postfix", "prefix"]
        style = haskellStyle { Tok.reservedOpNames = ops
                             , Tok.reservedNames = names
                             , Tok.identLetter = alphaNum <|> oneOf "#'_"
                             , Tok.commentLine = "--"
                             }

reserved   = Tok.reserved lexer
reservedOp = Tok.reservedOp lexer
identifier = Tok.identifier lexer
parens     = Tok.parens lexer
brackets   = Tok.brackets lexer
braces     = Tok.braces lexer
commaSep   = Tok.commaSep lexer
semi       = Tok.semi lexer
integer    = Tok.integer lexer
chr        = Tok.charLiteral lexer
str        = Tok.stringLiteral lexer
operator   = Tok.operator lexer

contents :: Parser a -> Parser a
contents p = do
  Tok.whiteSpace lexer
  r <- p
  eof
  return r

expr :: Parser Expr
expr = do
  es <- many1 term
  return (foldl1 App es)

lambda :: Parser Expr
lambda = do
  reservedOp "\\"
  args <- identifier
  reservedOp "->"
  body <- expr
  return $ Lam args body

letin :: Parser Expr
letin = do
  reserved "let"
  x <- identifier
  reservedOp "="
  e1 <- expr
  reserved "in"
  e2 <- expr
  return (Let x e1 e2)

variable :: Parser Expr
variable = do
  x <- identifier
  return (Var x)


addOperator :: OperatorDef -> Parser ()
addOperator a = P.modifyState $ \(ParseState ops) -> ParseState (a : ops)

mkTable :: ParseState -> [[Op Expr]]
mkTable (ParseState ops) =
  map (map toParser) $
    groupBy ((==) `on` oprec) $
      reverse $ sortBy (compare `on` oprec) $ ops

toParser :: OperatorDef -> Op Expr
toParser (OperatorDef ass _ tok) = case ass of
    OpLeft    -> infixOp tok (BinOp tok) (toAssoc ass)
    OpRight   -> infixOp tok (BinOp tok) (toAssoc ass)
    OpNone    -> infixOp tok (BinOp tok) (toAssoc ass)
    OpPrefix  -> prefixOp tok (UnOp tok)
    OpPostfix -> postfixOp tok (UnOp tok)
  where
    toAssoc OpLeft = Ex.AssocLeft
    toAssoc OpRight = Ex.AssocRight
    toAssoc OpNone = Ex.AssocNone
    toAssoc _ = error "no associativity"

infixOp :: String -> (a -> a -> a) -> Ex.Assoc -> Op a
infixOp x f = Ex.Infix (reservedOp x >> return f)

prefixOp :: String -> (a -> a) -> Ex.Operator String u Identity a
prefixOp name f = Ex.Prefix (reservedOp name >> return f)

postfixOp :: String -> (a -> a) -> Ex.Operator String u Identity a
postfixOp name f = Ex.Postfix (reservedOp name >> return f)

term :: Parser Expr
term = do
  tbl <- getState
  let table = mkTable tbl
  Ex.buildExpressionParser table aexp

aexp :: Parser Expr
aexp =  letin
    <|> lambda
    <|> variable
    <|> parens expr

letdecl :: Parser Decl
letdecl = do
  e <- expr
  return $ LetDecl e


opleft :: Parser Decl
opleft = do
  reserved "infixl"
  prec <- integer
  sym <- parens operator
  let op = (OperatorDef OpLeft prec sym)
  addOperator op
  return $ OpDecl op

opright :: Parser Decl
opright = do
  reserved "infixr"
  prec <- integer
  sym <- parens operator
  let op = (OperatorDef OpRight prec sym)
  addOperator op
  return $ OpDecl op

opnone :: Parser Decl
opnone = do
  reserved "infix"
  prec <- integer
  sym <- parens operator
  let op = (OperatorDef OpNone prec sym)
  addOperator op
  return $ OpDecl op

opprefix :: Parser Decl
opprefix = do
  reserved "prefix"
  prec <- integer
  sym <- parens operator
  let op = OperatorDef OpPrefix prec sym
  addOperator op
  return $ OpDecl op

oppostfix :: Parser Decl
oppostfix = do
  reserved "postfix"
  prec <- integer
  sym <- parens operator
  let op = OperatorDef OpPostfix prec sym
  addOperator op
  return $ OpDecl op

decl :: Parser Decl
decl =
    try letdecl
    <|> opleft
    <|> opright
    <|> opnone
    <|> opprefix
    <|> oppostfix

top :: Parser Decl
top = do
  x <- decl
  P.optional semi
  return x


modl :: Parser [Decl]
modl = many top

parseModule :: SourceName -> String -> Either ParseError [Decl]
parseModule filePath = P.runParser (contents modl) (ParseState []) filePath

main :: IO ()
main = do
  input <- readFile "test.in"
  let res = parseModule "<stdin>" input
  case res of
    Left err -> print err
    Right ast -> mapM_ print ast

For example input try:

infixl 3 ($);
infixr 4 (#);

infix 4 (.);

prefix 10 (-);
postfix 10 (!);

let z = y in a $ a $ (-a)!;
let z = y in a # a # a $ b; let z = y in a # a # a # b;

Attoparsec

Attoparsec is a parser combinator like Parsec but more suited for bulk parsing of large text and binary files instead of parsing language syntax to ASTs. When written properly Attoparsec parsers can be extremely efficient.

{-# LANGUAGE OverloadedStrings #-}

import Control.Monad

import Data.Attoparsec
import Data.Attoparsec.Char8 as A
import Data.ByteString.Char8

data Action
  = Success
  | KeepAlive
  | NoResource
  | Hangup
  | NewLeader
  | Election
  deriving Show

type Sender = ByteString
type Payload = ByteString

data Message = Message
  { action :: Action
  , sender :: Sender
  , payload :: Payload
  } deriving Show

proto :: Parser Message
proto = do
  act  <- paction
  send <- A.takeTill (== '.')
  body <- A.takeTill (A.isSpace)
  endOfLine
  return $ Message act send body

paction :: Parser Action
paction = do
  c <- anyWord8
  case c of
    1  -> return Success
    2  -> return KeepAlive
    3  -> return NoResource
    4  -> return Hangup
    5  -> return NewLeader
    6  -> return Election
    _  -> mzero

main :: IO ()
main = do
  let msgtext = "\x01\x6c\x61\x70\x74\x6f\x70\x2e\x33\x2e\x31\x34\x31\x35\x39\x32\x36\x35\x33\x35\x0A"
  let msg = parseOnly proto msgtext
  print msg

See: Text Parsing Tutorial

Optparse-Applicative

Optparse applicative is a library for parsing command line options with a interface similar to parsec that makes also makes heavy use of monoids to combine operations.

import Data.List
import Data.Monoid
import Options.Applicative

data Opts = Opts
  { _files :: [String]
  , _quiet :: Bool
  , _fast :: Speed
  }

data Speed = Slow | Fast

options :: Parser Opts
options = Opts <$> filename <*> quiet <*> fast
  where
    filename :: Parser [String]
    filename = many $ argument str $
         metavar "filename..."
      <> help "Input files"

    fast :: Parser Speed
    fast = flag Slow Fast $
         long "cheetah"
      <> help "Perform task quickly."

    quiet :: Parser Bool
    quiet = switch $
         long "quiet"
      <> help "Whether to shut up."

greet :: Opts -> IO ()
greet (Opts files quiet fast) = do
  putStrLn "reading these files:"
  mapM_ print files

  case fast of
    Fast -> putStrLn "quickly"
    Slow -> putStrLn "slowly"

  case quiet of
    True  -> putStrLn "quietly"
    False -> putStrLn "loudly"

opts :: ParserInfo Opts
opts = info (helper <*> options) fullDesc

main :: IO ()
main = execParser opts >>= greet

See: optparse-applicative

Streaming

Lazy IO

The problem with using the usual monadic approach to processing data accumulated through IO is that the Prelude tools require us to manifest large amounts of data in memory all at once before we can even begin computation.

mapM :: Monad m => (a -> m b) -> [a] -> m [b]
sequence :: Monad m => [m a] -> m [a]

Reading from the file creates an thunk for the string that forced will then read the file. The problem is then that this method ties the ordering of IO effects to evaluation order which is difficult to reason about in the large.

Consider that normally the monad laws ( in the absence of seq ) guarantee that these computations should be identical. But using lazy IO we can construct a degenerate case.

import System.IO

main :: IO ()
main = do
  withFile "foo.txt" ReadMode $ \fd -> do
    contents <- hGetContents fd
    print contents
  -- "foo\n"

  contents <- withFile "foo.txt" ReadMode hGetContents
  print contents
  -- ""

So what we need is a system to guarantee deterministic resource handling with constant memory usage. To that end both the Conduits and Pipes libraries solved this problem using different ( though largely equivalent ) approaches.

Pipes

await :: Monad m => Pipe a y m a
yield :: Monad m => a -> Pipe x a m ()

(>->) :: Monad m
      => Pipe a b m r
      -> Pipe b c m r
      -> Pipe a c m r

runEffect :: Monad m => Effect m r -> m r
toListM :: Monad m => Producer a m () -> m [a]

Pipes is a stream processing library with a strong emphasis on the static semantics of composition. The simplest usage is to connect "pipe" functions with a (>->) composition operator, where each component can await and yield to push and pull values along the stream.

import Pipes
import Pipes.Prelude as P
import Control.Monad
import Control.Monad.Identity

a :: Producer Int Identity ()
a = forM_ [1..10] yield

b :: Pipe Int Int Identity ()
b =  forever $ do
  x <- await
  yield (x*2)
  yield (x*3)
  yield (x*4)

c :: Pipe Int Int Identity ()
c = forever $ do
  x <- await
  if (x `mod` 2) == 0
    then yield x
    else return ()

result :: [Int]
result = P.toList $ a >-> b >-> c

For example we could construct a "FizzBuzz" pipe.

{-# LANGUAGE MultiWayIf #-}

import Pipes
import qualified Pipes.Prelude as P

import Control.Monad

a :: Producer Integer IO ()
a = each [1..100]

b :: Pipe Integer String IO ()
b = do
  n <- await
  if | n `mod` 15 == 0 -> yield "FizzBuzz"
     | n `mod` 5  == 0 -> yield "Fizz"
     | n `mod` 3  == 0 -> yield "Buzz"
     | otherwise       -> return ()

main = runEffect $ a >-> b >-> P.stdoutLn

To continue with the degenerate case we constructed with Lazy IO, consider than we can now compose and sequence deterministic actions over files without having to worry about effect order.

import Pipes
import Pipes.Prelude as P
import System.IO

readF :: FilePath -> Producer String IO ()
readF file = do
    lift $ putStrLn $ "Opened" ++ file
    h <- lift $ openFile file ReadMode
    fromHandle h
    lift $ putStrLn $ "Closed" ++ file
    lift $ hClose h

main :: IO ()
main = runEffect $ readF "foo.txt" >-> P.take 3 >-> stdoutLn

See: Pipes Tutorial

Safe Pipes

bracket :: MonadSafe m => Base m a -> (a -> Base m b) -> (a -> m c) -> m c

As a motivating example, ZeroMQ is a network messaging library that abstracts over traditional Unix sockets to a variety of network topologies. Most notably it isn't designed to guarantee any sort of transactional guarantees for delivery or recovery in case of errors so it's necessary to design a layer on top of it to provide the desired behavior at the application layer.

In Haskell we'd like to guarantee that if we're polling on a socket we get messages delivered in a timely fashion or consider the resource in a error state and recover from it. Using pipes-safe we can manage the life cycle of lazy IO resources and can safely handle failures, resource termination and finalization gracefully. In other languages this kind of logic would be smeared across several places, or put in some global context and prone to introduce errors and subtle race conditions. Using pipes we instead get a nice tight abstraction designed exactly to fit this kind of use case.

For instance now we can bracket the ZeroMQ socket creation and finalization within the SafeT monad transformer which guarantees that after successful message delivery we execute the pipes function as expected, or on failure we halt the execution and finalize the socket.

import Pipes
import Pipes.Safe
import qualified Pipes.Prelude as P

import System.Timeout (timeout)
import Data.ByteString.Char8
import qualified System.ZMQ as ZMQ

data Opts = Opts
  { _addr    :: String  -- ^ ZMQ socket address
  , _timeout :: Int     -- ^ Time in milliseconds for socket timeout
  }

recvTimeout :: Opts -> ZMQ.Socket a -> Producer ByteString (SafeT IO) ()
recvTimeout opts sock = do
  body <- liftIO $ timeout (_timeout opts) (ZMQ.receive sock [])
  case body of
    Just msg -> do
      liftIO $ ZMQ.send sock msg []
      yield msg
      recvTimeout opts sock
    Nothing  -> liftIO $ print "socket timed out"

collect :: ZMQ.Context
        -> Opts
        -> Producer ByteString (SafeT IO) ()
collect ctx opts = bracket zinit zclose (recvTimeout opts)
  where
    -- Initialize the socket
    zinit = do
      liftIO $ print "waiting for messages"
      sock <- ZMQ.socket ctx ZMQ.Rep
      ZMQ.bind sock (_addr opts)
      return sock

    -- On timeout or completion guarantee the socket get closed.
    zclose sock = do
      liftIO $ print "finalizing"
      ZMQ.close sock

runZmq :: ZMQ.Context -> Opts -> IO ()
runZmq ctx opts = runSafeT $ runEffect $
  collect ctx opts >-> P.take 10 >-> P.print

main :: IO ()
main = do
  ctx <- ZMQ.init 1
  let opts = Opts {_addr = "tcp://127.0.0.1:8000", _timeout = 1000000 }
  runZmq ctx opts
  ZMQ.term ctx

Conduits

await :: Monad m => ConduitM i o m (Maybe i)
yield :: Monad m => o -> ConduitM i o m ()
($$) :: Monad m => Source m a -> Sink a m b -> m b
(=$) :: Monad m => Conduit a m b -> Sink b m c -> Sink a m c

type Sink i = ConduitM i Void
type Source m o = ConduitM () o m ()
type Conduit i m o = ConduitM i o m ()

Conduits are conceptually similar though philosophically different approach to the same problem of constant space deterministic resource handling for IO resources.

The first initial difference is that await function now returns a Maybe which allows different handling of termination. The composition operators are also split into a connecting operator ($$) and a fusing operator (=$) for combining Sources and Sink and a Conduit and a Sink respectively.

{-# LANGUAGE MultiWayIf #-}

import Data.Conduit
import Control.Monad.Trans
import qualified Data.Conduit.List as CL

source :: Source IO Int
source = CL.sourceList [1..100]

conduit :: Conduit Int IO String
conduit = do
  val <- await
  liftIO $ print val
  case val of
    Nothing -> return ()
    Just n -> do
      if | n `mod` 15 == 0 -> yield "FizzBuzz"
         | n `mod` 5  == 0 -> yield "Fizz"
         | n `mod` 3  == 0 -> yield "Buzz"
         | otherwise       -> return ()
      conduit

sink :: Sink String IO ()
sink = CL.mapM_ putStrLn

main :: IO ()
main = source $$ conduit =$ sink

See: Conduit Overview

Data Formats

JSON

Aeson is library for efficient parsing and generating JSON.

decode :: FromJSON a => ByteString -> Maybe a
encode :: ToJSON a => a -> ByteString
eitherDecode :: FromJSON a => ByteString -> Either String a

fromJSON :: FromJSON a => Value -> Result a
toJSON :: ToJSON a => a -> Value

We'll work with this contrived example:

{
    "id": 1,
    "name": "A green door",
    "price": 12.50,
    "tags": ["home", "green"],
    "refs": {
      "a": "red",
      "b": "blue"
    }
}

Aeson uses several high performance data structures (Vector, Text, HashMap) by default instead of the naive versions so typically using Aeson will require that us import them and use OverloadedStrings when indexing into objects.

type Object = HashMap Text Value

type Array = Vector Value

-- | A JSON value represented as a Haskell value.
data Value = Object !Object
           | Array !Array
           | String !Text
           | Number !Scientific
           | Bool !Bool
           | Null

See: Aeson Documentation

Unstructured

In dynamic scripting languages it's common to parse amorphous blobs of JSON without any a priori structure and then handle validation problems by throwing exceptions while traversing it. We can do the same using Aeson and the Maybe monad.

{-# LANGUAGE OverloadedStrings #-}

import Data.Text
import Data.Aeson
import Data.Vector
import qualified Data.HashMap.Strict as M
import qualified Data.ByteString.Lazy as BL

-- Pull a key out of an JSON object.
(^?) :: Value -> Text -> Maybe Value
(^?) (Object obj) k = M.lookup k obj
(^?) _ _ = Nothing

-- Pull the ith value out of a JSON list.
ix :: Value -> Int -> Maybe Value
ix (Array arr) i = arr !? i
ix _ _ = Nothing

readJSON str = do
  obj <- decode str
  price <- obj ^? "price"
  refs  <- obj ^? "refs"
  tags  <- obj ^? "tags"
  aref  <- refs ^? "a"
  tag1  <- tags `ix` 0
  return (price, aref, tag1)

main :: IO ()
main = do
  contents <- BL.readFile "example.json"
  print $ readJSON contents

Structured

This isn't ideal since we've just smeared all the validation logic across our traversal logic instead of separating concerns and handling validation in separate logic. We'd like to describe the structure before-hand and the invalid case separately. Using Generic also allows Haskell to automatically write the serializer and deserializer between our datatype and the JSON string based on the names of record field names.

{-# LANGUAGE DeriveGeneric #-}

import Data.Text
import Data.Aeson
import GHC.Generics
import qualified Data.ByteString.Lazy as BL

import Control.Applicative

data Refs = Refs
  { a :: String
  , b :: String
  } deriving (Show,Generic)

data Data = Data
  { id    :: Int
  , name  :: Text
  , price :: Int
  , tags  :: [String]
  , refs  :: Refs
  } deriving (Show,Generic)

instance FromJSON Data
instance FromJSON Refs
instance ToJSON Data
instance ToJSON Refs

main :: IO ()
main = do
  contents <- BL.readFile "example.json"
  let Just dat = decode contents
  print $ name dat
  print $ a (refs dat)

Now we get our validated JSON wrapped up into a nicely typed Haskell ADT.

Data
  { id = 1
  , name = "A green door"
  , price = 12
  , tags = [ "home" , "green" ]
  , refs = Refs { a = "red" , b = "blue" }
  }

The functions fromJSON and toJSON can be used to convert between this sum type and regular Haskell types with.

data Result a = Error String | Success a
λ: fromJSON (Bool True) :: Result Bool
Success True

λ: fromJSON (Bool True) :: Result Double
Error "when expecting a Double, encountered Boolean instead"

CSV

Cassava is an efficient CSV parser library. We'll work with this tiny snippet from the iris dataset:

sepal_length,sepal_width,petal_length,petal_width,plant_class
5.1,3.5,1.4,0.2,Iris-setosa
5.0,2.0,3.5,1.0,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica

Unstructured

Just like with Aeson if we really want to work with unstructured data the library accommodates this.

import Data.Csv

import Text.Show.Pretty

import qualified Data.Vector as V
import qualified Data.ByteString.Lazy as BL

type ErrorMsg = String
type CsvData = V.Vector (V.Vector BL.ByteString)

example :: FilePath -> IO (Either ErrorMsg CsvData)
example fname = do
  contents <- BL.readFile fname
  return $ decode NoHeader contents

We see we get the nested set of stringy vectors:

[ [ "sepal_length"
  , "sepal_width"
  , "petal_length"
  , "petal_width"
  , "plant_class"
  ]
, [ "5.1" , "3.5" , "1.4" , "0.2" , "Iris-setosa" ]
, [ "5.0" , "2.0" , "3.5" , "1.0" , "Iris-versicolor" ]
, [ "6.3" , "3.3" , "6.0" , "2.5" , "Iris-virginica" ]
]

Structured

Just like with Aeson we can use Generic to automatically write the deserializer between our CSV data and our custom datatype.

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}

import Data.Csv
import GHC.Generics
import qualified Data.Vector as V
import qualified Data.ByteString.Lazy as BL

data Plant = Plant
  { sepal_length :: Double
  , sepal_width  :: Double
  , petal_length :: Double
  , petal_width  :: Double
  , plant_class :: String
  } deriving (Generic, Show)

instance FromNamedRecord Plant
instance ToNamedRecord Plant

type ErrorMsg = String
type CsvData = (Header, V.Vector Plant)

parseCSV :: FilePath -> IO (Either ErrorMsg CsvData)
parseCSV fname = do
  contents <- BL.readFile fname
  return $ decodeByName contents

main = parseCSV "iris.csv" >>= print

And again we get a nice typed ADT as a result.

[ Plant
    { sepal_length = 5.1
    , sepal_width = 3.5
    , petal_length = 1.4
    , petal_width = 0.2
    , plant_class = "Iris-setosa"
    }
, Plant
    { sepal_length = 5.0
    , sepal_width = 2.0
    , petal_length = 3.5
    , petal_width = 1.0
    , plant_class = "Iris-versicolor"
    }
, Plant
    { sepal_length = 6.3
    , sepal_width = 3.3
    , petal_length = 6.0
    , petal_width = 2.5
    , plant_class = "Iris-virginica"
    }
]

Network & Web Programming

HTTP

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Types
import Network.HTTP.Client
import Control.Applicative
import Control.Concurrent.Async

type URL = String

get :: Manager -> URL -> IO Int
get m url = do
  req <- parseUrl url
  statusCode <$> responseStatus <$> httpNoBody req m

single :: IO Int
single = do
  withManager defaultManagerSettings $ \m -> do
    get m "http://haskell.org"

parallel :: IO [Int]
parallel = do
  withManager defaultManagerSettings $ \m -> do
    -- Fetch w3.org 10 times concurrently
    let urls = replicate 10 "http://www.w3.org"
    mapConcurrently (get m) urls

main :: IO ()
main = do
  print =<< single
  print =<< parallel

Warp

Warp is a web server, it writes data to sockets quickly.

{-# LANGUAGE OverloadedStrings #-}

import Network.Wai
import Network.Wai.Handler.Warp (run)
import Network.HTTP.Types

app :: Application
app req = return $ responseLBS status200 [] "Engage!"

main :: IO ()
main = run 8000 app

See: Warp

Scotty

Continuing with our trek through web libraries, Scotty is a web microframework similar in principle to Flask in Python or Sinatra in Ruby.

{-# LANGUAGE OverloadedStrings #-}

import Web.Scotty

import qualified Text.Blaze.Html5 as H
import Text.Blaze.Html5 (toHtml, Html)
import Text.Blaze.Html.Renderer.Text (renderHtml)

greet :: String -> Html
greet user = H.html $ do
  H.head $
    H.title "Welcome!"
  H.body $ do
    H.h1 "Greetings!"
    H.p ("Hello " >> toHtml user >> "!")

app = do
  get "/" $
    text "Home Page"

  get "/greet/:name" $ do
    name <- param "name"
    html $ renderHtml (greet name)

main :: IO ()
main = scotty 8000 app

Of importance to note is the Blaze library used here overloads do-notation but is not itself a proper monad.

See: Making a Website with Haskell

Databases

Acid State

Acid-state allows us to build a "database on demand" for arbitrary Haskell datatypes that guarantees atomic transactions. For example, we can build a simple key-value store wrapped around the Map type.

{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE TemplateHaskell #-}

import Data.Acid
import Data.Typeable
import Data.SafeCopy
import Control.Monad.Reader (ask)

import qualified Data.Map as Map
import qualified Control.Monad.State as S

type Key = String
type Value = String

data Database = Database !(Map.Map Key Value)
    deriving (Show, Ord, Eq, Typeable)

$(deriveSafeCopy 0 'base ''Database)

insertKey :: Key -> Value -> Update Database ()
insertKey key value
    = do Database m <- S.get
         S.put (Database (Map.insert key value m))

lookupKey :: Key -> Query Database (Maybe Value)
lookupKey key
    = do Database m <- ask
         return (Map.lookup key m)

deleteKey :: Key -> Update Database ()
deleteKey key
    = do Database m <- S.get
         S.put (Database (Map.delete key m))

allKeys :: Int -> Query Database [(Key, Value)]
allKeys limit
    = do Database m <- ask
         return $ take limit (Map.toList m)

$(makeAcidic ''Database ['insertKey, 'lookupKey, 'allKeys, 'deleteKey])

fixtures :: Map.Map String String
fixtures = Map.empty

test ::  Key -> Value -> IO ()
test key val = do
    database <- openLocalStateFrom "db/" (Database fixtures)
    result <- update database (InsertKey key val)
    result <- query database (AllKeys 10)
    print result

Persistent

Esqueleto

GHC

Core

To inspect the core from GHCi we can invoke it using the following flags and the alias:

alias ghci-core="ghci -ddump-simpl -dsuppress-idinfo \
-dsuppress-coercions -dsuppress-type-applications \ 
-dsuppress-uniques -dsuppress-module-prefixes"

At the interactive prompt we can then explore the core representation interactively:

$ ghci-core
λ: let f x = x + 2 ; f :: Int -> Int

==================== Simplified expression ====================
returnIO
  (: ((\ (x :: Int) -> + $fNumInt x (I# 2)) `cast` ...) ([]))

λ: let f x = (x, x)

==================== Simplified expression ====================
returnIO (: ((\ (@ t) (x :: t) -> (x, x)) `cast` ...) ([]))

ghc-core is also very useful for looking at GHC's compilation artifacts.

$ ghc-core --no-cast --no-asm

Core from GHC is roughly human readable, but it's helpful to look at simple human written examples to get the hang of what's going on. Of important note is that the Λ and λ for type-level and value-level lambda abstraction are represented by the same symbol (\) in core, which is a simplifying detail of the GHC's implementation but a source of some confusion when starting.

id :: a -> a
id x = x
id :: forall a. a -> a
id = \ (@ a) (x :: a) -> x

idInt :: GHC.Types.Int -> GHC.Types.Int
idInt = id @ GHC.Types.Int
compose :: (b -> c) -> (a -> b) -> a -> c
compose f g x = f (g x)
compose :: forall b c a. (b -> c) -> (a -> b) -> a -> c
compose = \ (@ b) (@ c) (@ a) (f1 :: b -> c) (g :: a -> b) (x1 :: a) -> f1 (g x1)
map :: (a -> b) -> [a] -> [b]
map f []     = []
map f (x:xs) = f x : map f xs
map :: forall a b. (a -> b) -> [a] -> [b]
map =
  \ (@ a) (@ b) (f :: a -> b) (xs :: [a]) ->
    case xs of _ {
      []     -> [] @ b;
      : y ys -> : @ b (f y) (map @ a @ b f ys)
    }
x `seq` y
case x of _ { 
  __DEFAULT -> y 
}

One particularly notable case of the Core desugaring process is that pattern matching on overloaded numbers implicitly translates into equality test (i.e. Eq).

f 0 = 1
f 1 = 2
f 2 = 3
f 3 = 4
f 4 = 5
f _ = 0


f :: forall a b. (Eq a, Num a, Num b) => a -> b
f =
  \ (@ a)
    (@ b)
    ($dEq :: Eq a)
    ($dNum :: Num a)
    ($dNum1 :: Num b)
    (ds :: a) ->
    case == $dEq ds (fromInteger $dNum (__integer 0)) of _ {
      False ->
        case == $dEq ds (fromInteger $dNum (__integer 1)) of _ {
          False ->
            case == $dEq ds (fromInteger $dNum (__integer 2)) of _ {
              False ->
                case == $dEq ds (fromInteger $dNum (__integer 3)) of _ {
                  False ->
                    case == $dEq ds (fromInteger $dNum (__integer 4)) of _ {
                      False -> fromInteger $dNum1 (__integer 0);
                      True -> fromInteger $dNum1 (__integer 5)
                    };
                  True -> fromInteger $dNum1 (__integer 4)
                };
              True -> fromInteger $dNum1 (__integer 3)
            };
          True -> fromInteger $dNum1 (__integer 2)
        };
      True -> fromInteger $dNum1 (__integer 1)
    }

Of course, adding a concrete type signature changes the desugar just matching on the unboxed values.

f :: Int -> Int
f =
  \ (ds :: Int) ->
    case ds of _ { I# ds1 ->
    case ds1 of _ {
      __DEFAULT -> I# 0;
      0 -> I# 1;
      1 -> I# 2;
      2 -> I# 3;
      3 -> I# 4;
      4 -> I# 5
    }
    }

See:

Dictionaries

The Haskell language defines the notion of Typeclasses but is agnostic to how they are implemented in a Haskell compiler. GHC's particular implementation uses a pass called the dictionary passing translation part of the elaboration phase of the typechecker which translates Core functions with typeclass constraints into implicit parameters of which record-like structures containing the function implementations are passed.

class Num a where
  (+) :: a -> a -> a
  (*) :: a -> a -> a
  negate :: a -> a

This class can be thought as the implementation equivalent to the following parameterized record of functions.

data DNum a = DNum (a -> a -> a) (a -> a -> a) (a -> a)

add (DNum a m n) = a
mul (DNum a m n) = m
neg (DNum a m n) = n

numDInt :: DNum Int
numDInt = DNum plusInt timesInt negateInt

numDFloat :: DNum Float
numDFloat = DNum plusFloat timesFloat negateFloat
+ :: forall a. Num a => a -> a -> a
+ = \ (@ a) (tpl :: Num a) -> 
  case tpl of _ { D:Num tpl _ _ -> tpl }

* :: forall a. Num a => a -> a -> a
* = \ (@ a) (tpl :: Num a) -> 
  case tpl of _ { D:Num _ tpl _ -> tpl }

negate :: forall a. Num a => a -> a
negate = \ (@ a) (tpl :: Num a) -> 
  case tpl of _ { D:Num _ _ tpl -> tpl }
add :: forall t. NumD t -> t -> t
add = \ (@ t) (ds :: NumD t) ->
    case ds of _ { NumDict a m n -> n }

mul :: forall t. NumD t -> t -> t -> t
mul = \ (@ t) (ds :: NumD t) ->
    case ds of _ { NumDict a m n -> m }

neg :: forall t. NumD t -> t -> t
neg = \ (@ t) (ds :: NumD t) ->
    case ds of _ { NumDict a m n -> n }

There are generally two schools of thought on the use of typeclasses in high-level library design. The first is to favor value-level programming as the core of an internal API and use type-classes to provide sugar on top of the forward-facing interface or not at all. There are of course other schools of thought.

See: Scrap Your Type Classes

Unboxed Types

The usual integer type in Haskell can be considered to be a regular algebraic datatype with a special constructor.

λ: :set -XMagicHash
λ: :m +GHC.Types
λ: :m +GHC.Prim
λ: :i Int
data Int = I# Int#      -- Defined in GHC.Types
{-# LANGUAGE BangPatterns, MagicHash, UnboxedTuples #-}

import GHC.Exts
import GHC.Prim

ex1 :: Bool
ex1 = gtChar# a# b#
  where
    !(C# a#) = 'a'
    !(C# b#) = 'b'

ex2 :: Int
ex2 = I# (a# +# b#)
  where
    !(I# a#) = 1
    !(I# b#) = 2

ex3 :: Int
ex3 = (I# (1# +# 2# *# 3# +# 4#))

ex4 :: (Int, Int)
ex4 = (I# (dataToTag# False), I# (dataToTag# True))

The function for integer arithmetic used in the Num typeclass for Int is just pattern matching on this type to reveal the underlying unboxed value, performing the builtin arithmetic and then performing the packing up into Int again.

plusInt :: Int -> Int -> Int
(I# x) `plusInt`  (I# y) = I# (x +# y)

Where (+#) is a low level function built into GHC that maps to unboxed integer arithmetic directly.

plusInt :: Int -> Int -> Int
plusInt a b = case a of {
    (I# a_) -> case b of {
      (I# b_) -> I# (+# a_ b_);
    };
};

Since the Int type we'd write down for normal logic is itself boxed, we'd sometimes like to inform GHC that our value should is just a fixed unboxed value on the heap and to refer to it by value instead of by reference. In C the rewrite would be like the following:

struct A {
  int *a;
};

struct A {
  int a;
};

Effectively we'd like to be able to define our constructor to be stored as:

data A = A #Int

But maintain all our logic around as if it were written against Int, performing the boxing and unboxing where needed.

data A = A !Int

To do this there is the UNPACK pragma or -funbox-strict-fields to inform GHC to perform the rewrite we want.

data A = A {-# UNPACK #-} !Int

See:

Languages

Unbound

Several libraries exist to mechanize the process of writing name capture and substitution, since it is largely mechanical. Probably the most robust is the unbound library. For example we can implement the infer function for a small Hindley-Milner system over a simple typed lambda calculus without having to write the name capture and substitution mechanics ourselves.

{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedStrings #-}

module Infer where

import Data.String
import Data.Map (Map)
import Control.Monad.Error
import qualified Data.Map as Map

import qualified Unbound.LocallyNameless as NL
import Unbound.LocallyNameless hiding (Subst, compose)

data Type
  = TVar (Name Type)
  | TArr Type Type
  deriving (Show)

data Expr
  = Var (Name Expr)
  | Lam (Bind (Name Expr) Expr)
  | App Expr Expr
  | Let (Bind (Name Expr) Expr)
  deriving (Show)

$(derive [''Type, ''Expr])


instance IsString Expr where
    fromString = Var . fromString
instance IsString Type where
    fromString = TVar . fromString
instance IsString (Name Expr) where
    fromString = string2Name
instance IsString (Name Type) where
    fromString = string2Name

instance Eq Type where
    (==) = eqType

eqType :: Type -> Type -> Bool
eqType (TVar v1) (TVar v2) = v1 == v2
eqType _ _ = False

uvar :: String -> Expr
uvar x = Var (s2n x)

tvar :: String -> Type
tvar x = TVar (s2n x)

instance Alpha Type
instance Alpha Expr

instance NL.Subst Type Type where
  isvar (TVar v) = Just (SubstName v)
  isvar _ = Nothing

instance NL.Subst Expr Expr where
  isvar (Var v) = Just (SubstName v)
  isvar _ = Nothing

instance NL.Subst Expr Type where


data TypeError
  = UnboundVariable (Name Expr)
  | GenericTypeError
  deriving (Show)

instance Error TypeError where
  noMsg = GenericTypeError


type Env = Map (Name Expr) Type
type Constraint = (Type, Type)
type Infer = ErrorT TypeError FreshM

empty :: Env
empty = Map.empty

freshtv :: Infer Type
freshtv = do
  x <- fresh "_t"
  return $ TVar x

infer :: Env -> Expr -> Infer (Type, [Constraint])
infer env expr = case expr  of

  Lam b -> do
    (n,e) <- unbind b
    tv <- freshtv
    let env' = Map.insert n tv env
    (t, cs) <- infer env' e
    return (TArr tv t, cs)

  App e1 e2 -> do
     (t1, cs1) <- infer env e1
     (t2, cs2) <- infer env e2
     tv <- freshtv
     return (tv, (t1, TArr t2 tv) : cs1 ++ cs2)

  Var n -> do
     case Map.lookup n env of
        Nothing -> throwError $ UnboundVariable n
        Just t  -> return (t, [])

  Let b -> do
     (n, e) <- unbind b
     (tBody, csBody) <- infer env e
     let env' = Map.insert n tBody env
     (t, cs) <- infer env' e
     return (t, cs ++ csBody)

LLVM

LLVM is a library for generating machine code. The llvm-general bindings provide a way to model, compile and execute LLVM bytecode from within the Haskell runtime.

See:

Printer Combinators

Pretty printer combinators compose logic to print strings.

Combinators
<> Concatenation
<+> Spaced concatenation
char Renders a character as a Doc
text Renders a string as a Doc
{-# LANGUAGE FlexibleInstances #-}

import Text.PrettyPrint
import Text.Show.Pretty (ppShow)

parensIf ::  Bool -> Doc -> Doc
parensIf True = parens
parensIf False = id

type Name = String

data Expr
  = Var String
  | Lit Ground
  | App Expr Expr
  | Lam Name Expr
  deriving (Eq, Show)

data Ground
  = LInt Int
  | LBool Bool
  deriving (Show, Eq, Ord)


class Pretty p where
  ppr :: Int -> p -> Doc

instance Pretty String where
  ppr _ x = text x

instance Pretty Expr where
  ppr _ (Var x)         = text x
  ppr _ (Lit (LInt a))  = text (show a)
  ppr _ (Lit (LBool b)) = text (show b)

  ppr p e@(App _ _) =
    let (f, xs) = viewApp e in
    let args = sep $ map (ppr (p+1)) xs in
    parensIf (p>0) $ ppr p f <+> args

  ppr p e@(Lam _ _) =
    let body = ppr (p+1) (viewBody e) in
    let vars = map (ppr 0) (viewVars e) in
    parensIf (p>0) $ char '\\' <> hsep vars <+> text "." <+> body

viewVars :: Expr -> [Name]
viewVars (Lam n a) = n : viewVars a
viewVars _ = []

viewBody :: Expr -> Expr
viewBody (Lam _ a) = viewBody a
viewBody x = x

viewApp :: Expr -> (Expr, [Expr])
viewApp (App e1 e2) = go e1 [e2]
  where
    go (App a b) xs = go a (b : xs)
    go f xs = (f, xs)

ppexpr :: Expr -> String
ppexpr = render . ppr 0


s, k, example :: Expr
s = Lam "f" (Lam "g" (Lam "x" (App (Var "f") (App (Var "g") (Var "x")))))
k = Lam "x" (Lam "y" (Var "x"))
example = App s k

main :: IO ()
main = do
  putStrLn $ ppexpr s
  putStrLn $ ppShow example

The pretty printed form of the k combinator:

\f g x . (f (g x))

The Text.Show.Pretty library can be used to pretty print nested data structures in a more human readable form for any type that implements Show. For example a dump of the structure for the AST of SK combinator with ppShow.

App
  (Lam
     "f" (Lam "g" (Lam "x" (App (Var "f") (App (Var "g") (Var "x"))))))
  (Lam "x" (Lam "y" (Var "x")))

Adding the following to your ghci.conf can be useful for working with deeply nested structures interactively.

import Text.Show.Pretty (ppShow)
let pprint x = putStrLn $ ppShow x

See: The Design of a Pretty-printing Library

Haskeline

Haskeline is cross-platform readline support which plays nice with GHCi as well.

runInputT :: Settings IO -> InputT IO a -> IO a
getInputLine :: String -> InputT IO (Maybe String)
import Control.Monad.Trans
import System.Console.Haskeline

type Repl a = InputT IO a

process :: String -> IO ()
process = putStrLn

repl :: Repl ()
repl = do
  minput <- getInputLine "Repl> "
  case minput of
    Nothing -> outputStrLn "Goodbye."
    Just input -> (liftIO $ process input) >> repl

main :: IO ()
main = runInputT defaultSettings repl

Template Haskell

Quasiquotation

Quasiquotation allows us to express "quoted" blocks of syntax that need not necessarily be be the syntax of the host language, but unlike just writing a giant string it is instead parsed into some AST datatype in the host language. Notably values from the host languages can be injected into the custom language via user-definable logic allowing information to flow between the two languages.

In practice quasiquotation can be used to implement custom domain specific languages or integrate with other general languages entirely via code-generation.

We've already seen how to write a Parsec parser, now let's write a quasiquoter for it.

{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}

module Quasiquote where

import Language.Haskell.TH
import Language.Haskell.TH.Syntax
import Language.Haskell.TH.Quote

import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)

import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok

import Control.Monad.Identity

data Expr
  = Tr
  | Fl
  | Zero
  | Succ Expr
  | Pred Expr
  deriving (Eq, Show)

instance Lift Expr where
  lift Tr         = [| Tr |]
  lift Fl         = [| Tr |]
  lift Zero       = [| Zero |]
  lift (Succ a)   = [| Succ a |]
  lift (Pred a)   = [| Pred a |]

type Op = Ex.Operator String () Identity

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

parens :: Parser a -> Parser a
parens = Tok.parens lexer

reserved :: String -> Parser ()
reserved = Tok.reserved lexer

semiSep :: Parser a -> Parser [a]
semiSep = Tok.semiSep lexer

reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer

prefixOp :: String -> (a -> a) -> Op a
prefixOp x f = Ex.Prefix (reservedOp x >> return f)

table :: [[Op Expr]]
table = [
    [ prefixOp "succ" Succ
    , prefixOp "pred" Pred
    ]
  ]

expr :: Parser Expr
expr = Ex.buildExpressionParser table factor

true, false, zero :: Parser Expr
true  = reserved "true" >> return Tr
false = reserved "false" >> return Fl
zero  = reservedOp "0" >> return Zero

factor :: Parser Expr
factor =
      true
  <|> false
  <|> zero
  <|> parens expr

contents :: Parser a -> Parser a
contents p = do
  Tok.whiteSpace lexer
  r <- p
  eof
  return r

toplevel :: Parser [Expr]
toplevel = semiSep expr

parseExpr :: String -> Either ParseError Expr
parseExpr s = parse (contents expr) "<stdin>" s

parseToplevel :: String -> Either ParseError [Expr]
parseToplevel s = parse (contents toplevel) "<stdin>" s

calcExpr :: String -> Q Exp
calcExpr str = do
  filename <- loc_filename `fmap` location
  case parse (contents expr) filename str of
    Left err -> error (show err)
    Right tag -> [| tag |]

calc :: QuasiQuoter
calc = QuasiQuoter calcExpr err err err
  where err = error "Only defined for values"

Testing it out:

{-# LANGUAGE QuasiQuotes #-}

import Quasiquote

a :: Expr
a = [calc|true|]
-- Tr

b :: Expr
b = [calc|succ (succ 0)|]
-- Succ (Succ Zero)

c :: Expr
c = [calc|pred (succ 0)|]
-- Pred (Succ Zero)

One extremely important feature is the ability to preserve position information so that errors in the embedded language can be traced back to the line of the host syntax.

language-c-quote

Of course since we can provide an arbitrary parser for the quoted expression, one might consider embedding the AST of another language entirely. For example C or CUDA C.

hello :: String -> C.Func
hello msg = [cfun|

int main(int argc, const char *argv[])
{
    printf($msg);
    return 0;
}

|]

Evaluating this we get back an AST representation of the quoted C program which we can manipulate or print back out to textual C code using ppr function.

Func
  (DeclSpec [] [] (Tint Nothing))
  (Id "main")
  DeclRoot
  (Params
     [ Param (Just (Id "argc")) (DeclSpec [] [] (Tint Nothing)) DeclRoot
     , Param
         (Just (Id "argv"))
         (DeclSpec [] [ Tconst ] (Tchar Nothing))
         (Array [] NoArraySize (Ptr [] DeclRoot))
     ]
     False)
  [ BlockStm
      (Exp
         (Just
            (FnCall
               (Var (Id "printf"))
               [ Const (StringConst [ "\"Hello Haskell!\"" ] "Hello Haskell!")
               ])))
  , BlockStm (Return (Just (Const (IntConst "0" Signed 0))))
  ]

In this example we just spliced in the anti-quoted Haskell string in the printf statement, but we can pass many other values to and from the quoted expressions including identifiers, numbers, and other quoted expressions which implement the Lift type class.

For example now if we wanted programmatically generate the source for a CUDA kernel to run on a GPU we can switch over the CUDA C dialect to emit the C code.

{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}

import Text.PrettyPrint.Mainland
import qualified Language.C.Syntax as C
import qualified Language.C.Quote.CUDA as Cuda

cuda_fun :: String -> Int -> Float -> C.Func
cuda_fun fn n a = [Cuda.cfun|

__global__ void $id:fn (float *x, float *y) {
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if ( i<$n ) { y[i] = $a*x[i] + y[i]; }
}

|]

cuda_driver :: String -> Int -> C.Func
cuda_driver fn n = [Cuda.cfun|

void driver (float *x, float *y) {
  float *d_x, *d_y;

  cudaMalloc(&d_x, $n*sizeof(float));
  cudaMalloc(&d_y, $n*sizeof(float));

  cudaMemcpy(d_x, x, $n, cudaMemcpyHostToDevice);
  cudaMemcpy(d_y, y, $n, cudaMemcpyHostToDevice);

  $id:fn<<<($n+255)/256, 256>>>(d_x, d_y);

  cudaFree(d_x);
  cudaFree(d_y);
  return 0;
}

|]

makeKernel :: String -> Float -> Int -> [C.Func]
makeKernel fn a n = [
    cuda_fun fn n a
  , cuda_driver fn n
  ]

main :: IO ()
main = do
  let ker = makeKernel "saxpy" 2 65536
  mapM_ (print . ppr) ker

Running this we generate:

__global__ void saxpy(float* x, float* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < 65536) {
        y[i] = 2.0 * x[i] + y[i];
    }
}
int driver(float* x, float* y)
{
    float* d_x, * d_y;

    cudaMalloc(&d_x, 65536 * sizeof(float));
    cudaMalloc(&d_y, 65536 * sizeof(float));
    cudaMemcpy(d_x, x, 65536, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, 65536, cudaMemcpyHostToDevice);
    saxpy<<<(65536 + 255) / 256, 256>>>(d_x, d_y);
    return 0;
}

Run the resulting output through nvcc -ptx -c to get the PTX associated with the outputted code.

Template Haskell

Of course the most useful case of quasiquotation is the ability to procedurally generate Haskell code itself from inside of Haskell. The template-haskell framework provides four entry points for the quotation to generate various types of Haskell declarations and expressions.

Type Quasiquoted Class
Q Exp [e| ... |] expression
Q Pat [p| ... |] pattern
Q Type [t| ... |] type
Q [Dec] [d| ... |] declaration
data QuasiQuoter = QuasiQuoter 
  { quoteExp  :: String -> Q Exp
  , quotePat  :: String -> Q Pat
  , quoteType :: String -> Q Type
  , quoteDec  :: String -> Q [Dec]
  }

The logic evaluating, splicing, and introspecting compile-time values is embedded within the Q monad, which has a runQ which can be used to evaluate it's context. These functions of this monad is deeply embedded in the implementation of GHC.

runQ :: Quasi m => Q a -> m a
runIO :: IO a -> Q a

Just as before, TemplateHaskell provides the ability to lift Haskell values into the their AST quantities within the quoted expression using the Lift type class.

class Lift t where
  lift :: t -> Q Exp

instance Lift Integer where
  lift x = return (LitE (IntegerL x))

instance Lift Int where
  lift x= return (LitE (IntegerL (fromIntegral x)))

instance Lift Char where
  lift x = return (LitE (CharL x))

instance Lift Bool where
  lift True  = return (ConE trueName)
  lift False = return (ConE falseName)

instance Lift a => Lift (Maybe a) where
  lift Nothing  = return (ConE nothingName)
  lift (Just x) = liftM (ConE justName `AppE`) (lift x)

instance Lift a => Lift [a] where
  lift xs = do { xs' <- mapM lift xs; return (ListE xs') }

In many cases Template Haskell can be used interactively to explore the AST form of various Haskell syntax.

λ: runQ [e| \x -> x |]
LamE [VarP x_2] (VarE x_2)

λ: runQ [d| data Nat = Z | S Nat |]
[DataD [] Nat_0 [] [NormalC Z_2 [],NormalC S_1 [(NotStrict,ConT Nat_0)]] []]

λ: runQ [p| S (S Z)|]
ConP Singleton.S [ConP Singleton.S [ConP Singleton.Z []]]

λ: runQ [t| Int -> [Int] |]
AppT (AppT ArrowT (ConT GHC.Types.Int)) (AppT ListT (ConT GHC.Types.Int))

λ: let g = $(runQ [| \x -> x |])

λ: g 3
3

Using Language.Haskell.TH we can piece together Haskell AST element by element but subject to our own custom logic to generate the code. This can be somewhat painful though as the source-language (called HsSyn) to Haskell is enormous, consisting of around 100 nodes in it's AST many of which are dependent on the state of language pragmas.

-- builds the function (f = \(a,b) -> a)
f :: Q [Dec]
f = do
  let f = mkName "f"
  a <- newName "a"
  b <- newName "b"
  return [ FunD f [ Clause [TupP [VarP a, VarP b]] (NormalB (VarE a)) [] ] ]
my_id :: a -> a
my_id x = $( [| x |] )

main = print (my_id "Hello Haskell!")

As a debugging tool it is useful to be able to dump the reified information out for a given symbol interactively, to do so there is a simple little hack.

{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}

import Text.Show.Pretty (ppShow)
import Language.Haskell.TH

introspect :: Name -> Q Exp
introspect n = do
  t <- reify n
  runIO $ putStrLn $ ppShow t
  [| return () |]
λ: $(introspect 'id)
VarI
  GHC.Base.id
  (ForallT
     [ PlainTV a_1627405383 ]
     []
     (AppT (AppT ArrowT (VarT a_1627405383)) (VarT a_1627405383)))
  Nothing
  (Fixity 9 InfixL)


λ: $(introspect ''Maybe)
TyConI
  (DataD
     []
     Data.Maybe.Maybe
     [ PlainTV a_1627399528 ]
     [ NormalC Data.Maybe.Nothing []
     , NormalC Data.Maybe.Just [ ( NotStrict , VarT a_1627399528 ) ]
     ]
     [])
import Language.Haskell.TH

foo :: Int -> Int
foo x = x + 1

data Bar

fooInfo :: InfoQ
fooInfo = reify 'foo

barInfo :: InfoQ
barInfo = reify ''Bar
$( [d| data T = T1 | T2 |] )

main = print [T1, T2]

Splices are indicated by $(f) syntax for the expression level and at the toplevel simply by invocation of the template Haskell function. Running GHC with -ddump-splices shows our code being spliced in at the specific location in the AST at compile-time.

$(f)

template_haskell_show.hs:1:1: Splicing declarations
    f
  ======>
    template_haskell_show.hs:8:3-10
    f (a_a5bd, b_a5be) = a_a5bd
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}

module Splice where

import Language.Haskell.TH
import Language.Haskell.TH.Syntax

spliceF :: Q [Dec]
spliceF = do
  let f = mkName "f"
  a <- newName "a"
  b <- newName "b"
  return [ FunD f [ Clause [VarP a, VarP b] (NormalB (VarE a)) [] ] ]

spliceG :: Lift a => a -> Q [Dec]
spliceG n = runQ [d| g a = n |]
{-# LANGUAGE TemplateHaskell #-}

import Splice

spliceF
spliceG "argument"

main = do
  print $ f 1 2
  print $ g ()

At the point of the splice all variables and types used must be in scope, so it must appear after their declarations in the module. As a result we often have to mentally topologically sort our code when using TemplateHaskell such that declarations are defined in order.

See: Template Haskell AST

Antiquotation

Extending our quasiquotation from above now that we have TemplateHaskell machinery we can implement the same class of logic that it uses to pass Haskell values in and pull Haskell values out via pattern matching on templated expressions.

{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE DeriveDataTypeable #-}

module Antiquote where

import Data.Generics
import Language.Haskell.TH
import Language.Haskell.TH.Quote

import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)

import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok

data Expr
  = Tr
  | Fl
  | Zero
  | Succ Expr
  | Pred Expr
  | Antiquote String
  deriving (Eq, Show, Data, Typeable)

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

parens :: Parser a -> Parser a
parens = Tok.parens lexer

reserved :: String -> Parser ()
reserved = Tok.reserved lexer

identifier :: Parser String
identifier = Tok.identifier lexer

semiSep :: Parser a -> Parser [a]
semiSep = Tok.semiSep lexer

reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer

oper s f assoc = Ex.Prefix (reservedOp s >> return f)

table = [ oper "succ" Succ Ex.AssocLeft
        , oper "pred" Pred Ex.AssocLeft
        ]

expr :: Parser Expr
expr = Ex.buildExpressionParser [table] factor

true, false, zero :: Parser Expr
true  = reserved "true" >> return Tr
false = reserved "false" >> return Fl
zero  = reservedOp "0" >> return Zero

antiquote :: Parser Expr
antiquote = do
  char '$'
  var <- identifier
  return $ Antiquote var

factor :: Parser Expr
factor = true
      <|> false
      <|> zero
      <|> antiquote
      <|> parens expr

contents :: Parser a -> Parser a
contents p = do
  Tok.whiteSpace lexer
  r <- p
  eof
  return r

parseExpr :: String -> Either ParseError Expr
parseExpr s = parse (contents expr) "<stdin>" s


class Expressible a where
  express :: a -> Expr

instance Expressible Expr where
  express = id

instance Expressible Bool where
  express True = Tr
  express False = Fl

instance Expressible Integer where
  express 0 = Zero
  express n = Succ (express (n - 1))


exprE :: String -> Q Exp
exprE s = do
  filename <- loc_filename `fmap` location
  case parse (contents expr) filename s of
    Left err -> error (show err)
    Right exp -> dataToExpQ (const Nothing `extQ` antiExpr) exp

exprP :: String -> Q Pat
exprP s = do
  filename <- loc_filename `fmap` location
  case parse (contents expr) filename s of
    Left err -> error (show err)
    Right exp -> dataToPatQ (const Nothing `extQ` antiExprPat) exp

-- antiquote RHS
antiExpr :: Expr -> Maybe (Q Exp)
antiExpr (Antiquote v) = Just embed
  where embed = [| express $(varE (mkName v)) |]
antiExpr _ = Nothing

-- antiquote LHS
antiExprPat :: Expr -> Maybe (Q Pat)
antiExprPat (Antiquote v) = Just $ varP (mkName v)
antiExprPat _ = Nothing

mini :: QuasiQuoter
mini = QuasiQuoter exprE exprP undefined undefined
{-# LANGUAGE QuasiQuotes #-}

import Antiquote

-- extract
a :: Expr -> Expr
a [mini|succ $x|] = x

b :: Expr -> Expr
b [mini|succ $x|] = [mini|pred $x|]

c :: Expressible a => a -> Expr
c x = [mini|succ $x|]

d :: Expr
d = c (8 :: Integer)
-- Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ Zero)))))))

e :: Expr
e = c True
-- Succ Tr

Templated Type Families

Just like at the value-level we can construct type-level constructions by piecing together their AST.

Type          AST
----------    ----------
t1 -> t2      ArrowT `AppT` t2 `AppT` t2
[t]           ListT `AppT` t
(t1,t2)       TupleT 2 `AppT` t1 `AppT` t2

For example consider that type-level arithmetic is still somewhat incomplete in GHC 7.6, but there often cases where the span of typelevel numbers is not full set of integers but is instead some bounded set of numbers. We can instead define operations with a type-family instead of using an inductive definition ( which often requires manual proofs ) and simply enumerates the entire domain of arguments to the type-family and maps them to some result computed at compile-time.

For example the modulus operator would be non-trivial to implement at type-level but instead we can use the enumFamily function to splice in type-family which simply enumerates all possible pairs of numbers up to a desired depth.

module EnumFamily where
import Language.Haskell.TH

enumFamily :: (Integer -> Integer -> Integer)
           -> Name
           -> Integer
           -> Q [Dec]
enumFamily f bop upper = return decls
  where
    decls = do
      i <- [1..upper]
      j <- [2..upper]
      return $ TySynInstD bop (rhs i j)

    rhs i j = TySynEqn
      [LitT (NumTyLit i), LitT (NumTyLit j)]
      (LitT (NumTyLit (i `f` j)))
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TemplateHaskell #-}

import EnumFamily

import Data.Proxy
import GHC.TypeLits

type family Mod (m :: Nat) (n :: Nat) :: Nat
type family Add (m :: Nat) (n :: Nat) :: Nat
type family Pow (m :: Nat) (n :: Nat) :: Nat

enumFamily mod ''Mod 10
enumFamily (+) ''Add 10
enumFamily (^) ''Pow 10

a :: Integer
a = natVal (Proxy :: Proxy (Mod 6 4))
-- 2

b :: Integer
b = natVal (Proxy :: Proxy (Pow 3 (Mod 6 4)))
-- 9

--    enumFamily mod ''Mod 3
--  ======>
--    template_typelevel_splice.hs:7:1-14
--    type instance Mod 2 1 = 0
--    type instance Mod 2 2 = 0
--    type instance Mod 2 3 = 2
--    type instance Mod 3 1 = 0
--    type instance Mod 3 2 = 1
--    type instance Mod 3 3 = 0
--    ...

In practice GHC seems fine with enormous type-family declarations although compile-time may increase a bit as a result.

The singletons library also provides a way to automate this process by letting us write seemingly value-level declarations inside of a quasiquoter and then promoting the logic to the type-level. For example if we wanted to write a value-level and type-level map function for our HList this would normally involve quite a bit of boilerplate, now it can stated very concisely.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeSynonymInstances #-}

import Data.Singletons
import Data.Singletons.TH

$(promote [d|
  map :: (a -> b) -> [a] -> [b]
  map _ [] = []
  map f (x:xs) = f x : map f xs
  |])

infixr 5 :::

data HList (ts :: [ * ]) where
  Nil :: HList '[]
  (:::) :: t -> HList ts -> HList (t ': ts)

-- TypeLevel
-- MapJust :: [*] -> [Maybe *]
type MapJust xs = Map Maybe xs

-- Value Level
-- mapJust :: [a] -> [Maybe a]
mapJust :: HList xs -> HList (MapJust xs)
mapJust Nil = Nil
mapJust (x ::: xs) = (Just x) ::: mapJust xs

type A = [Bool, String , Double , ()]

a :: HList A
a = True ::: "foo" ::: 3.14 ::: () ::: Nil


example1 :: HList (MapJust A)
example1 = mapJust a

-- example1 reduces to example2 when expanded
example2 :: HList ([Maybe Bool, Maybe String , Maybe Double , Maybe ()])
example2 = Just True ::: Just "foo" ::: Just 3.14 ::: Just () ::: Nil

Templated Type Classes

Probably the most common use of Template Haskell is the automatic generation of type-class instances. Consider if we wanted to write a simple Pretty printing class for a flat data structure that derived the ppr method in terms of the names of the constructors in the AST we could write a simple instance.

{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}

module Class where

import Language.Haskell.TH

class Pretty a where
  ppr :: a -> String

normalCons :: Con -> Name
normalCons (NormalC n _) = n

getCons :: Info -> [Name]
getCons cons = case cons of
    TyConI (DataD    _ _ _ tcons _) -> map normalCons tcons
    con -> error $ "Can't derive for:" ++ (show con)

pretty :: Name -> Q [Dec]
pretty dt = do
  info <- reify dt
  Just cls <- lookupTypeName "Pretty"
  let datatypeStr = nameBase dt
  let cons = getCons info
  let dtype = mkName (datatypeStr)
  let mkInstance xs =
        InstanceD
        []                              -- Context
        (AppT
          (ConT cls)                    -- Instance
          (ConT dtype))                 -- Head
        [(FunD (mkName "ppr") xs)]      -- Methods
  let methods = map cases cons
  return $ [mkInstance methods]

-- Pattern matches on the ``ppr`` method
cases :: Name -> Clause
cases a = Clause [ConP a []] (NormalB (LitE (StringL (nameBase a)))) []

In a separate file invoke the pretty instance at the toplevel, and with --ddump-splice if we want to view the spliced class instance.

{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}

import Class

data PlatonicSolid
  = Tetrahedron
  | Cube
  | Octahedron
  | Dodecahedron
  | Icosahedron

pretty ''PlatonicSolid

main :: IO ()
main = do
  putStrLn (ppr Octahedron)
  putStrLn (ppr Dodecahedron)

Templated Singletons

In the previous discussion about singletons, we introduced quite a bit of boilerplate code to work with the singletons. This can be partially abated by using Template Haskell to mechanically generate the instances and classes.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}

module Singleton where

import Text.Read
import Language.Haskell.TH
import Language.Haskell.TH.Quote

data Nat = Z | S Nat

data SNat :: Nat -> * where
  SZero :: SNat Z
  SSucc :: SNat n -> SNat (S n)

-- Quasiquoter for Singletons

sval :: String -> Q Exp
sval str = do
  case readEither str of
    Left err -> fail (show err)
    Right n -> do
      Just suc <- lookupValueName "SSucc"
      Just zer <- lookupValueName "SZero"
      return $ foldr AppE (ConE zer) (replicate n (ConE suc))

stype :: String -> Q Type
stype str = do
  case readEither str of
    Left err -> fail (show err)
    Right n -> do
      Just scon <- lookupTypeName "SNat"
      Just suc <- lookupValueName "S"
      Just zer <- lookupValueName "Z"
      let nat = foldr AppT (PromotedT zer) (replicate n (PromotedT suc))
      return $ AppT (ConT scon) nat

spat :: String -> Q Pat
spat str = do
  case readEither str of
    Left err -> fail (show err)
    Right n -> do
      Just suc <- lookupValueName "SSucc"
      Just zer <- lookupValueName "SZero"
      return $ foldr (\x y -> ConP x [y]) (ConP zer []) (replicate n (suc))

sdecl :: String -> a
sdecl _ = error "Cannot make toplevel declaration for snat."

snat :: QuasiQuoter
snat = QuasiQuoter sval spat stype sdecl

Trying it out by splicing code at the expression level, type level and as patterns.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}

import Singleton

zero :: [snat|0|]
zero =  [snat|0|]

one :: [snat|1|]
one =  [snat|1|]

two :: [snat|2|]
two =  [snat|2|]

three :: [snat|3|]
three  = [snat|3|]

test :: SNat a -> Int
test x = case x of
  [snat|0|] -> 0
  [snat|1|] -> 1
  [snat|2|] -> 2
  [snat|3|] -> 3

isZero :: SNat a -> Bool
isZero [snat|0|] = True
isZero _ = False

The singletons package takes this idea to it's logical conclusion allow us to toplevel declarations of seemingly regular Haskell syntax with singletons spliced in, the end result resembles the constructions in a dependently typed language if one squints hard enough.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}

import Data.Singletons
import Data.Singletons.TH

$(singletons [d|
  data Nat = Zero | Succ Nat
    deriving (Eq, Show)

  plus :: Nat -> Nat -> Nat
  plus Zero     n = n
  plus (Succ m) n = Succ (plus m n)

  isEven :: Nat -> Bool
  isEven Zero = True
  isEven (Succ Zero) = False
  isEven (Succ (Succ n)) = isEven n
  |])

After template splicing we see that we now that several new constructs in scope:

type SNat a = Sing Nat a

type family IsEven a :: Bool
type family Plus a b :: Nat

sIsEven :: Sing Nat t0 -> Sing Bool (IsEven t0)
splus   :: Sing Nat a -> Sing Nat b -> Sing Nat (Plus a b)

Lenses

There are two implementations of note that are mostly compatible but differ in scope:

  • lens - The kitchen sink library with a wide variety of instances for many common libraries.
  • lens-family-core - The core abstractions in a standalone library with minimal dependencies.

van Laarhoven Lenses

At it's core a lens is a form of coupled getter and setter functions as a value under an existential functor.

--         +---- a : Type of structure
--         | +-- b : Type of target
--         | |
type Lens' a b = forall f. Functor f => (b -> f b) -> (a -> f a)

There are two derivations of van Laarhoven lenses, one that allows polymorphic update and one that is strictly monomorphic. Let's just consider the monomorphic variant first:

type Lens' a b = forall f. Functor f => (b -> f b) -> (a -> f a)

newtype Const x a  = Const { runConst :: x } deriving Functor
newtype Identity a = Identity { runIdentity :: a } deriving Functor

lens :: (s -> a) -> (s -> a -> s) -> Lens' s a
lens getter setter l b = setter b <$> l (getter b)

set :: Lens' a b -> b -> a -> a
set l b = runIdentity . l (const (Identity b))

get :: Lens' a b -> a -> b
get l = runConst . l Const

over :: Lens' a b -> (b -> b) -> a -> a
over l f a = set l (f (get l a)) a
infixl 1 &
infixr 4 .~
infixr 4 %~
infixr 8 ^.

(&) :: a -> (a -> b) -> b
(&) = flip ($)

(^.) = flip get
(.~) = set
(%~) = over

Such that we have:

s ^. (lens getter setter)       -- getter s
s  & (lens getter setter) .~ b  -- setter s b

Law 1

get l (set l b a) = b

Law 2

set l (view l a) a = a

Law 3

set l b1 (set l b2 a) = set l b1 a

With composition identities:

x^.a.b ≡ x^.a^.b
a.b %~ f ≡ a %~ b %~ f

x ^. id ≡ x
id %~ f ≡ f

While this may look like a somewhat convoluted way of reinventing record update, consider the types of these functions align very nicely such Lens themselves compose using the normal (.) composition, although in the reverse direction of function composition.

f     :: a -> b
g     :: b -> c
g . f :: a -> c

f     :: Lens a b  ~  (b -> f b) -> (a -> f a)
g     :: Lens b c  ~  (c -> f c) -> (b -> f b)
f . g :: Lens a c  ~  (c -> f c) -> (a -> f a)
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

import Data.Functor

type Lens' a b = forall f. Functor f => (b -> f b) -> (a -> f a)

newtype Const x a  = Const { runConst :: x } deriving Functor
newtype Identity a = Identity { runIdentity :: a } deriving Functor

lens :: (s -> a) -> (s -> a -> s) -> Lens' s a
lens getter setter f a = fmap (setter a) (f (getter a))

set :: Lens' a b -> b -> a -> a
set l b = runIdentity . l (const (Identity b))

view :: Lens' a b -> a -> b
view l = runConst . l Const

over :: Lens' a b -> (b -> b) -> a -> a
over l f a = set l (f (view l a)) a

compose :: Lens' a b -> Lens' b c -> Lens' a c
compose l s = l . s

id' :: Lens' a a
id' = id

infixl 1 &
infixr 4 .~
infixr 4 %~
infixr 8 ^.

(^.) = flip view
(.~) = set
(%~) = over

(&) :: a -> (a -> b) -> b
(&) = flip ($)

(+~), (-~), (*~) :: Num b => Lens' a b -> b -> a -> a
f +~ b = f %~ (+b)
f -~ b = f %~ (subtract b)
f *~ b = f %~ (*b)

-- Usage

data Foo = Foo { _a :: Int } deriving Show
data Bar = Bar { _b :: Foo } deriving Show

a :: Lens' Foo Int
a = lens getter setter
  where
    getter :: Foo -> Int
    getter = _a

    setter :: Foo -> Int -> Foo
    setter = (\f new -> f { _a = new })


b :: Lens' Bar Foo
b = lens getter setter
  where
    getter :: Bar -> Foo
    getter = _b

    setter :: Bar -> Foo -> Bar
    setter = (\f new -> f { _b = new })

foo :: Foo
foo = Foo 3

bar :: Bar
bar = Bar foo

example1 = view a foo
example2 = set a 1 foo
example3 = over a (+1) foo
example4 = view (b `compose` a) bar

example1' = foo  ^. a
example2' = foo  &  a .~ 1
example3' = foo  &  a %~ (+1)
example4' = bar  ^. b . a

It turns out that these simple ideas lead to a very rich set of composite combinators that be used to perform a wide for working with substructure of complex data structures.

Combinator Description
view View a single target or fold the targets of a monoidal quantity.
set Replace target with a value and return updated structure.
over Update targets with a function and return updated structure.
to Construct a retrieval function from an arbitrary Haskell function.
traverse Map each element of a structure to an action and collect results.
ix Target the given index of a generic indexable structure.
toListOf Return a list of the targets.
firstOf Returns Just the target of a prism or Nothing.

Certain patterns show up so frequently that they warrant their own operators, although they can be expressed textual terms as well.

Symbolic Textual Equivalent Description
^. view Access value of target
.~ set Replace target x
%~ over Apply function to target
+~ over t (+n) Add to target
-~ over t (-n) Subtract to target
*~ over t (*n) Multiply to target
//~ over t (//n) Divide to target
^~ over t (^n) Integral power to target
^^~ over t (^^n) Fractional power to target
||~ over t (|| p) Logical or to target
&&~ over t (&& p) Logical and to target
<>~ over t (<> n) Append to a monoidal target
?~ set t (Just x) Replace target with Just x
^? firstOf Return Just target or Nothing
^.. toListOf View list of targets

Constructing the lens field types from an arbitrary datatype involves a bit of boilerplate code generation. But compiles into simple calls which translate the fields of a record into functions involving the lens function and logic for the getter and the setter.

import Control.Lens

data Foo = Foo { _field :: Int }

field :: Lens' Foo Int
field = lens getter setter
  where
    getter :: Foo -> Int
    getter = _field

    setter :: Foo -> Int -> Foo
    setter = (\f new -> f { _field = new })

These are pure boilerplate, and Template Haskell can automatically generate these functions using makeLenses by introspecting the AST at compile-time.

{-# LANGUAGE TemplateHaskell #-}

import Control.Lens

data Foo = Foo { _field :: Int } deriving Show
makeLenses ''Foo

The simplest usage of lens is simply as a more compositional way of dealing with record access and updates, shown below in comparison with traditional record syntax:

{-# LANGUAGE TemplateHaskell #-}

import Control.Lens

data Rec = MkRec { _foo :: Int , _bar :: Int } deriving Show
makeLenses ''Rec

x :: Rec
x = MkRec { _foo = 1024, _bar = 1024 }

get1 :: Int
get1 = (_foo x) + (_bar x)

get2 :: Int
get2 = (x ^. foo) + (x ^. bar)

get3 :: Int
get3 = (view foo x) + (view bar x)


set1 :: Rec
set1 = x { _foo = 1, _bar = 2 }

set2 :: Rec
set2 = x & (foo .~ 1) . (bar .~ 2)

set3 :: Rec
set3 = x & (set foo 1) . (set bar 2)

Of course this just scratches the surface of lens, the real strength comes when dealing with complex and deeply nested structures:

{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE NoMonomorphismRestriction #-}
{-# LANGUAGE RankNTypes #-}


import Control.Lens
import Control.Lens.TH

data Record1 = Record1
  { _a :: Int
  , _b :: Maybe Record2
  } deriving Show

data Record2 = Record2
  { _c :: String
  , _d :: [Int]
  } deriving Show

makeLenses ''Record1
makeLenses ''Record2

records :: [Record1]
records = [
    Record1 {
      _a = 1,
      _b = Nothing
    },
    Record1 {
      _a = 2,
      _b = Just $ Record2 {
        _c = "Picard",
        _d = [1,2,3]
      }
    },
    Record1 {
      _a = 3,
      _b = Just $ Record2 {
        _c = "Riker",
        _d = [4,5,6]
      }
    },
    Record1 {
      _a = 4,
      _b = Just $ Record2 {
        _c = "Data",
        _d = [7,8,9]
      }
    }
  ]

-- Lens targets
ids     = traverse.a
names   = traverse.b._Just.c
nums    = traverse.b._Just.d
listn n = traverse.b._Just.d.ix n

-- Modify to set all 'id' fields to 0
ex1 :: [Record1]
ex1 = set ids 0 records

-- Return a view of the concatenated 'd' fields for all nested records.
ex2 :: [Int]
ex2 = view nums records
-- [1,2,3,4,5,6,7,8,9]

-- Increment all 'id' fields by 1
ex3 :: [Record1]
ex3 = over ids (+1) records

-- Return a list of all 'c' fields.
ex4 :: [String]
ex4 = toListOf names records
-- ["Picard","Riker","Data"]

-- Return the the second element of all 'd' fields.
ex5 :: [Int]
ex5 = toListOf (listn 2) records
-- [3,6,9]

Lens also provides us with an optional dense slurry of operators that expand into combinations of the core combinators. Many of the operators do have a consistent naming scheme.

The sheer number of operators provided by lens is a polarizing for some, but all of the operators can be written in terms of the textual functions (set, view, over, at, ...) and some people prefer to use these instead.

Surprisingly lenses can be used as a very general foundation to write logic over a wide variety of data structures and computations and subsume many of the existing patterns found in the Prelude under a new common framework.

{-# LANGUAGE NoMonomorphismRestriction #-}

import Control.Lens
import Numeric.Lens
import Data.Complex.Lens

import Data.Complex
import qualified Data.Map as Map

l :: Num a => a
l = view _1 (100, 200)
-- 100

m :: Num a => (a, a, a)
m = (100,200,200) & _3 %~ (+100)
-- (100,200,300)

n :: Num a => [a]
n = [100,200,300] & traverse +~ 1
-- [101,201,301]

o :: Char
o = "frodo" ^?! ix 3
-- 'd'

p :: Num a => [a]
p = [[1,2,3], [4,5,6]] ^. traverse
-- [1,2,3,4,5,6]

q :: Num a => [a]
q = [1,2,3,4,5] ^. _tail
-- [2,3,4,5]

r :: Num a => [Maybe a]
r = [Just 1, Just 2, Just 3] & traverse._Just +~ 1
-- [Just 2, Just 3, Just 4]

s :: Maybe String
s = Map.fromList [("foo", "bar")] ^. at "foo"
-- Just "bar"

t :: Integral a => Maybe a
t = "1010110" ^? binary
-- Just 86

u :: Complex Float
u = (mkPolar 1 pi/2) & _phase +~ pi
-- 0.5 :+ 8.742278e-8

v :: [Integer]
v = [1..10] ^.. folded.filtered even
-- [2,4,6,8,10]

w :: [Integer]
w = [1, 2, 3, 4] & each . filtered even *~ 10
-- [1, 20, 3, 40]

x :: Num a => Maybe a
x = Left 3 ^? _Left
-- Just 3

See:

lens-family

The interface for lens-family is very similar to lens but with a smaller API and core.

{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

import Lens.Family
import Lens.Family.TH
import Lens.Family.Stock
import Data.Traversable

data Record1 = Record1
  { _a :: Int
  , _b :: Maybe Record2
  } deriving Show

data Record2 = Record2
  { _c :: String
  , _d :: [Int]
  } deriving Show

mkLenses ''Record1
mkLenses ''Record2

records :: [Record1]
records = [
    Record1 {
      _a = 1,
      _b = Nothing
    },
    Record1 {
      _a = 2,
      _b = Just $ Record2 {
        _c = "Picard",
        _d = [1,2,3]
      }
    },
    Record1 {
      _a = 3,
      _b = Just $ Record2 {
        _c = "Riker",
        _d = [4,5,6]
      }
    },
    Record1 {
      _a = 4,
      _b = Just $ Record2 {
        _c = "Data",
        _d = [7,8,9]
      }
    }
  ]

ids   = traverse.a
names = traverse.b._Just.c
nums  = traverse.b._Just.d

ex1 = set ids 0 records
ex2 = view nums records
ex3 = over ids (+1) records
ex4 = toListOf names records

Polymorphic Update

--        +---- a  : Type of input structure
--        | +-- a' : Type of output structure
--        | |
type Lens a a' b b' = forall f. Functor f => (b -> f b') -> (a -> f a')
--             | |
--             | +-- b  : Type of input target
--             +---- b' : Type of output target
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

import Data.Functor

type Lens a a' b b' = forall f. Functor f => (b -> f b') -> (a -> f a')
type Lens' a b = Lens a a b b

newtype Const x a  = Const { runConst :: x } deriving Functor
newtype Identity a = Identity { runIdentity :: a } deriving Functor

lens :: (a -> b) -> (a -> b' -> a') -> Lens a a' b b'
lens getter setter f a = fmap (setter a) (f (getter a))

set :: Lens a a' b b' -> b' -> a -> a'
set l b = runIdentity . l (const (Identity b))

get :: Lens a a' b b' -> a -> b
get l = runConst . l Const

over :: Lens a a' b b' -> (b -> b') -> a -> a'
over l f a = set l (f (get l a)) a

compose :: Lens a a' b b' -> Lens b b' c c' -> Lens a a' c c'
compose l s = l . s

id' :: Lens a a a a
id' = id

infixl 1 &
infixr 4 .~
infixr 4 %~
infixr 8 ^.

(^.) = flip get
(.~) = set
(%~) = over

(&) :: a -> (a -> b) -> b
(&) = flip ($)

(+~), (-~), (*~) :: Num b => Lens a a b b -> b -> a -> a
f +~ b = f %~ (+b)
f -~ b = f %~ (subtract b)
f *~ b = f %~ (*b)

-- Monomorphic Update
data Foo = Foo { _a :: Int } deriving Show
data Bar = Bar { _b :: Foo } deriving Show

a :: Lens' Foo Int
a = lens getter setter
  where
    getter :: Foo -> Int
    getter = _a

    setter :: Foo -> Int -> Foo
    setter = (\f new -> f { _a = new })

b :: Lens' Bar Foo
b = lens getter setter
  where
    getter :: Bar -> Foo
    getter = _b

    setter :: Bar -> Foo -> Bar
    setter = (\f new -> f { _b = new })

-- Polymorphic Update
data Pair a b = Pair a b deriving Show

pair :: Pair Int Char
pair = Pair 1 'b'

_1 :: Lens (Pair a b) (Pair a' b) a a'
_1 f (Pair a b) = (\x -> Pair x b) <$> f a

_2 :: Lens (Pair a b) (Pair a b') b b'
_2 f (Pair a b) = (\x -> Pair a x) <$> f b

ex1 = pair ^. _1
ex2 = pair ^. _2
ex3 = pair & _1 .~ "a"
ex4 = pair & (_1  %~ (+1))
           . (_2  .~ 1)

Prisms

type Prism a a' b b' = forall f. Applicative f => (b -> f b') -> (a -> f a')

Just as lenses allow us to manipulate product types, Prisms allow us to manipulate sum types allowing us to traverse and apply functions over branches of a sum type selectively.

The two libraries lens and lens-family disagree on how these structures are defined and which constraints they carry but both are defined in terms of at least an Applicative instance. A prism instance in the lens library is constructed via prism for polymorphic lens ( those which may change a resulting type parameter) and prism' for those which are strictly monomorphic. Just as with the Lens instance makePrisms can be used to abstract away this boilerplate via Template Haskell.

import Control.Lens

data Value = I Int
           | D Double
           deriving Show

_I :: Prism' Value Int
_I = prism remit review
  where
    remit :: Int -> Value
    remit a = I a

    review :: Value -> Either Value Int
    review (I a) = Right a
    review a     = Left a

_D :: Prism' Value Double
_D = prism remit review
  where
    remit :: Double -> Value
    remit a = D a

    review :: Value -> Either Value Double
    review (D a) = Right a
    review a     = Left a


test1 :: Maybe Int
test1 = (I 42) ^? _I

test2 :: Value
test2 = 42 ^. re _I

test3 :: Value
test3 = over _I succ (I 2)

test4 :: Value
test4 = over _I succ (D 2.71)
_just :: Prism (Maybe a) (Maybe b) a b
_just = prism Just $ maybe (Left Nothing) Right

_nothing :: Prism' (Maybe a) ()
_nothing = prism' (const Nothing) $ maybe (Just ()) (const Nothing)

_left :: Prism (Either a c) (Either b c) a b
_left = prism Left $ either Right (Left . Right)

_right :: Prism (Either c a) (Either c b) a b
_right = prism Right $ either (Left . Left) Right

In keeping with the past examples, I'll try to derive Prisms from first principles although this is no easy task as they typically are built on top of machinery in other libraries. This a (very) rough approximation of how one might do it using lens-family-core types.

{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

import Data.Functor
import Data.Monoid

import Control.Applicative
import Data.Traversable

newtype Getting c a = Getting { unGetting :: c }
newtype Setting a = Setting { unSetting :: a }

type LensLike f s t a b = (a -> f b) -> s -> f t

type Lens a a' b b' = forall f. Functor f => LensLike f a a' b b'
type Lens' a b = Lens a a b b

type Prism a a' b b' = forall f. Applicative f => (b -> f b') -> (a -> f a')
type Prism' a b = Prism a a b b

type Setter a a' b b' = LensLike Setting a a' b b'
type Setter' a b = Setter a a b b

type Getter a c = forall r d b. (c -> Getting r d) -> a -> Getting r b

type FoldLike r a a' b b' = LensLike (Getting r) a a' b b'

instance Functor (Getting c) where
  fmap _ (Getting c) = Getting c

instance Monoid c => Applicative (Getting c) where
  pure _ = Getting mempty
  Getting a <*> Getting b = Getting (a `mappend` b)

class Functor f => Phantom f where
  coerce :: f a -> f b

instance Phantom (Getting c) where
  coerce (Getting c) = Getting c

instance Functor Setting where
  fmap f (Setting a) = Setting (f a)

instance Applicative Setting where
  pure = Setting
  Setting f <*> Setting a = Setting (f a)


lens :: (a -> b) -> (a -> b' -> a') -> Lens a a' b b'
lens getter setter f a = fmap (setter a) (f (getter a))

(.~) :: Setter a a' b b' -> b' -> a -> a'
l .~ b = l %~ const b

view :: FoldLike b a a' b b' -> a -> b
view l = unGetting . l Getting

over :: Setter a a' b b' -> (b -> b') -> a -> a'
over l = (l %~)

set :: Setter a a' b b' -> b' -> a -> a'
set = (.~)

(%~) :: Setter a a' b b' -> (b -> b') -> a -> a'
l %~ f = unSetting . l (Setting . f)

compose :: Lens a a' b b' -> Lens b b' c c' -> Lens a a' c c'
compose l s = l . s

id' :: Lens' a a
id' = id

infixl 1 &
infixr 4 .~
infixr 4 %~
infixr 8 ^.

(^.) :: a -> FoldLike b a a' b b' -> b
(^.) = flip view

(&) :: a -> (a -> b) -> b
(&) = flip ($)

(+~), (-~), (*~) :: Num b => Setter' a b -> b -> a -> a
f +~ b = f %~ (+b)
f -~ b = f %~ (subtract b)
f *~ b = f %~ (*b)


infixr 8 ^?
infixr 8 ^..

views :: FoldLike r a a' b b' -> (b -> r) -> a -> r
views l f = unGetting . l (Getting . f)

(^?) :: a -> FoldLike (First b) a a' b b' -> Maybe b
x ^? l = firstOf l x

(^..) :: a -> FoldLike [b] a a' b b' -> [b]
x ^.. l = toListOf l x

toListOf :: FoldLike [b] a a' b b' -> a -> [b]
toListOf l = views l (:[])

firstOf :: FoldLike (First b) a a' b b' -> a -> Maybe b
firstOf l = getFirst . views l (First . Just)

prism :: (b -> t) -> (s -> Either t a) -> Prism s t a b
prism rm rv f a =
  case rv a of
    Right x -> fmap rm (f x)
    Left x  -> pure x

prism' :: (b -> s) -> (s -> Maybe a) -> Prism s s a b
prism' rm rv f a =
  case rv a of
    Just x  -> fmap rm (f x)
    Nothing -> pure a

_just :: Prism (Maybe a) (Maybe b) a b
_just = prism Just $ maybe (Left Nothing) Right

_nothing :: Prism' (Maybe a) ()
_nothing = prism' (const Nothing) $ maybe (Just ()) (const Nothing)

_right :: Prism (Either c a) (Either c b) a b
_right = prism Right $ either (Left . Left) Right

_left :: Prism (Either a c) (Either b c) a b
_left = prism Left $ either Right (Left . Right)

to :: (s -> a) -> Getter s a
to p f = coerce . f . p



pair :: (Int, Char)
pair = (1, 'b')

_1 :: Lens (a, b) (a', b) a a'
_1 f (a, b) = (\x -> (x, b)) <$> f a

_2 :: Lens (a, b) (a, b') b b'
_2 f (a, b) = (\x -> (a, x)) <$> f b

both :: Prism (a, a) (b, b) a b
both f (a, b) = (,) <$> f a <*> f b

ex1 = pair ^. _1
ex2 = pair ^. _2
ex3 = pair & _1 .~ "a"
ex4 = pair & (_1  %~ (+1))
           . (_2  .~ 1)

ex5 = (1, 2) & both .~ 1
ex6 = Just 3 & _just +~ 1
ex7 = (Left 3) ^? _left
ex8 = over traverse (+1) [1..25]

data Value
  = I Int
  | D Double
  deriving Show

_I :: Prism' Value Int
_I = prism remit review
  where
    remit :: Int -> Value
    remit a = I a

    review :: Value -> Either Value Int
    review (I a) = Right a
    review a     = Left a

ex9 :: Maybe Int
ex9 = (I 42) ^? _I

ex10 :: Value
ex10 = over _I succ (I 2)

ex11 :: Value
ex11 = over _I succ (D 2.71)

State and Zoom

Within the context of the state monad there are a particularly useful set of lens patterns.

  • use - View a target from the state of the State monad.
  • assign - Replace the target within a State monad.
  • zoom - Modify a target of the state with a function and perform it on the global state of the State monad.

So for example if we wanted to write a little physics simulation of the random motion of particles in a box. We can use the zoom function to modify the state of our particles in each step of the simulation.

{-# LANGUAGE TemplateHaskell #-}

import Control.Lens
import Control.Monad.State
import System.Random

data Vector = Vector
    { _x :: Double
    , _y :: Double
    } deriving (Show)

data Box = Box
    { _particles :: [Particle]
    } deriving (Show)

data Particle = Particle
    { _pos :: Vector
    , _vel :: Vector
    } deriving (Show)

makeLenses ''Box
makeLenses ''Particle
makeLenses ''Vector

step :: StateT Box IO ()
step = zoom (particles.traverse) $ do
    dx <- use (vel.x)
    dy <- use (vel.y)
    pos.x += dx
    pos.y += dy

particle :: IO Particle
particle = do
  vx <- randomIO
  vy <- randomIO
  return $ Particle (Vector 0 0) (Vector vx vy)

simulate :: IO Box
simulate = do
  particles <- replicateM 5 particle
  let simulation = replicateM 5 step
  let box = Box particles
  execStateT simulation box

main :: IO ()
main = simulate >>= print

This results in a final state like the following.

Box
  { _particles =
      [ Particle
          { _pos = 
              Vector { _x = 3.268546939011934 , _y = 4.356638656040016 }
          , _vel =
              Vector { _x = 0.6537093878023869 , _y = 0.8713277312080032 }
          }
      , Particle
          { _pos =
              Vector { _x = 0.5492296641559635 , _y = 0.27244422070641594 }
          , _vel =
              Vector { _x = 0.1098459328311927 , _y = 5.448884414128319e-2 }
          }
      , Particle
          { _pos =
              Vector { _x = 3.961168796078436 , _y = 4.9317543172941765 }
          , _vel =
              Vector { _x = 0.7922337592156872 , _y = 0.9863508634588353 }
          }
      , Particle
          { _pos =
              Vector { _x = 4.821390854065674 , _y = 1.6601909953629823 }
          , _vel =
              Vector { _x = 0.9642781708131349 , _y = 0.33203819907259646 }
          }
      , Particle
          { _pos =
              Vector { _x = 2.6468253761062943 , _y = 2.161403445396069 }
          , _vel =
              Vector { _x = 0.5293650752212589 , _y = 0.4322806890792138 }
          }
      ]
  }

Lens + Aeson

One of the best showcases for lens is writing transformations over arbitrary JSON structures. For example consider some sample data related to Kiva loans.

{
   "loans":[
      {
         "id":2930,
         "terms":{
            "local_payments":[
               {
                  "due_date":"2007-02-08T08:00:00Z",
                  "amount":13.75
               },
               {
                  "due_date":"2007-03-08T08:00:00Z",
                  "amount":93.75
               },
               {
                  "due_date":"2007-04-08T07:00:00Z",
                  "amount":43.75
               },
               {
                  "due_date":"2007-05-08T07:00:00Z",
                  "amount":63.75
               },
               {
                  "due_date":"2007-06-08T07:00:00Z",
                  "amount":93.75
               },
               {
                  "due_date":"2007-07-08T05:00:00Z",
                  "amount": null
               },
               {
                  "due_date":"2007-07-08T07:00:00Z",
                  "amount":93.75
               },
               {
                  "due_date":"2007-08-08T07:00:00Z",
                  "amount":93.75
               },
               {
                  "due_date":"2007-09-08T07:00:00Z",
                  "amount":93.75
               }
            ]
          }
      }
   ]
}

Then using Data.Aeson.Lens we can traverse the structure using our lens combinators.

{-# LANGUAGE OverloadedStrings #-}

import Control.Lens

import Data.Aeson.Lens
import Data.Aeson (decode, Value)
import Data.ByteString.Lazy as BL

main :: IO ()
main = do
  contents <- BL.readFile "kiva.json"
  let Just json = decode contents :: Maybe Value

  let vals :: [Double]
      vals = json ^.. key "loans"
                    . values
                    . key "terms"
                    . key "local_payments"
                    . values
                    . key "amount"
                    . _Double
  print vals
[13.75,93.75,43.75,63.75,93.75,93.75,93.75,93.75]

Categories

Alas we come to the topic of category theory. Some might say all discussion of Haskell eventually leads here at one point or another...

Nevertheless the overall importance of category theory in the context of Haskell has been somewhat overstated and unfortunately mystified to some extent. The reality is that amount of category theory which is directly applicable to Haskell roughly amounts to a subset of the first chapter of any undergraduate text.

Categories

The most basic structure is a category which is an algebraic structure of objects (Obj) and morphisms (Hom) with the structure that morphisms compose associatively and the existence of a identity morphism for each object.

With kind polymorphism enabled we can write down the general category parameterized by a type variable "c" for category, and the instance Hask the category of Haskell types with functions between types as morphisms.

{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE TypeSynonymInstances #-}

import Prelude hiding ((.), id)

-- Morphisms
type (a ~> b) c = c a b

class Category (c :: k -> k -> *) where
  id :: (a ~> a) c
  (.) :: (y ~> z) c -> (x ~> y) c -> (x ~> z) c

type Hask = (->)

instance Category Hask where
  id x = x
  (f . g) x = f (g x)

Isomorphisms

Two objects of a category are said to be isomorphic if there exists a morphism with 2-sided inverse.

f  :: a -> b
f' :: b -> a

Such that:

f . f' = id
f'. f  = id

For example the types Either () a and Maybe a are isomorphic.

{-# LANGUAGE ExplicitForAll #-}

data Iso a b = Iso { to :: a -> b, from :: b -> a }

f :: forall a. Maybe a -> Either () a
f (Just a) = Right a
f Nothing  = Left ()

f' :: forall a. Either () a -> Maybe a
f' (Left _)  = Nothing
f' (Right a) = Just a

iso :: Iso (Maybe a) (Either () a)
iso = Iso f f'

data V = V deriving Eq

ex1 = f  (f' (Right V)) == Right V
ex2 = f' (f  (Just V))  == Just V
data Iso a b = Iso { to :: a -> b, from :: b -> a }

instance Category Iso where
  id = Iso id id
  (Iso f f') . (Iso g g') = Iso (f . g) (g' . f')

Duality

One of the central ideas is the notion of duality, that reversing some internal structure yields a new structure with a "mirror" set of theorems. The dual of a category reverse the direction of the morphisms forming the category COp.

import Control.Category
import Prelude hiding ((.), id)

newtype Op a b = Op (b -> a)

instance Category Op where
  id = Op id
  (Op f) . (Op g) = Op (g . f)

See:

Functors

Functors are mappings between the objects and morphisms of categories that preserve identities and composition.

{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeSynonymInstances #-}

import Prelude hiding (Functor, fmap, id)

class (Category c, Category d) => Functor c d t where
  fmap :: c a b -> d (t a) (t b)

type Hask = (->)

instance Category Hask where
  id x = x
  (f . g) x = f (g x)

instance Functor Hask Hask [] where
  fmap f [] = []
  fmap f (x:xs) = f x : (fmap f xs)
fmap idid
fmap (a . b) ≡ (fmap a) . (fmap b)

Natural Transformations

Natural transformations are mappings between functors that are invariant under interchange of morphism composition order.

type Nat f g = forall a. f a -> g a

Such that for a natural transformation h we have:

fmap f . h ≡ h . fmap f 

The simplest example is between (f = List) and (g = Maybe) types.

headMay :: forall a. [a] -> Maybe a
headMay []     = Nothing
headMay (x:xs) = Just x

Regardless of how we chase safeHead, we end up with the same result.

fmap f (headMay xs) ≡ headMay (fmap f xs)
fmap f (headMay [])
= fmap f Nothing
= Nothing

headMay (fmap f [])
= headMay []
= Nothing
fmap f (headMay (x:xs))
= fmap f (Just x)
= Just (f x)

headMay (fmap f (x:xs))
= headMay [f x]
= Just (f x)

Or consider the Functor (->).

f :: (Functor t)
  => (->) a b
  -> (->) (t a) (t b)
f = fmap

g :: (b -> c)
  -> (->) a b
  -> (->) a c
g = (.)

c :: (Functor t)
  => (b -> c)
  -> (->) (t a) (t b)
  -> (->) (t a) (t c)
c = f . g
f . g x = c x . g

A lot of the expressive power of Haskell types comes from the interesting fact that with a few caveats, Haskell polymorphic functions are natural transformations.

See: You Could Have Defined Natural Transformations

Yoneda Lemma

The Yoneda lemma is an elementary, but deep result in Category theory. The Yoneda lemma states that for any functor F, the types F a and ∀ b. (a -> b) -> F b are isomorphic.

{-# LANGUAGE RankNTypes #-}

embed :: Functor f => f a -> (forall b . (a -> b) -> f b)
embed x f = fmap f x

unembed :: Functor f => (forall b . (a -> b) -> f b) -> f a
unembed f = f id

So that we have:

embed . unembed ≡ id
unembed . embed ≡ id

The most broad hand-wavy statement of the theorem is that an object in a category can be represented by the set of morphisms into it, and that the information about these morphisms alone sufficiently determines all properties of the object itself.

In terms of Haskell types, given a fixed type a and a functor f, if we have some a higher order polymorphic function g that when given a function of type a -> b yields f b then the behavior g is entirely determined by a -> b and the behavior of g can written purely in terms of f a.

See:

Kleisli Category

Kleisli composition (i.e. Kleisli Fish) is defined to be:

(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c
f >=> g ≡ \x -> f x >>= g 

(<=<) :: Monad m => (b -> m c) -> (a -> m b) -> a -> m c
(<=<) = flip (>=>)

The monad laws stated in terms of the Kleisli category of a monad m are stated much more symmetrically as one associativity law and two identity laws.

(f >=> g) >=> h ≡ f >=> (g >=> h)
return >=> f ≡ f
f >=> return ≡  f

Stated simply that the monad laws above are just the category laws in the Kleisli category.

{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE ExplicitForAll #-}

import Control.Monad
import Control.Category
import Prelude hiding ((.))

-- Kleisli category
newtype Kleisli m a b = K (a -> m b)

-- Kleisli morphisms ( a -> m b )
type (a :~> b) m = Kleisli m a b

instance Monad m => Category (Kleisli m) where
  id            = K return
  (K f) . (K g) = K (f <=< g)


just :: (a :~> a) Maybe
just = K Just

left :: forall a b. (a :~> b) Maybe -> (a :~> b) Maybe
left f = just . f

right :: forall a b. (a :~> b) Maybe -> (a :~> b) Maybe
right f = f . just

For example, Just is just an identity morphism in the Kleisli category of the Maybe monad.

Just >=> f ≡ f
f >=> Just ≡ f

Mathematics

Just as in Haskell we try to unify the common ideas from distinct structures, we can ask a simple question like what the fundamental notion of a group is for different mathematical categories:

Category Description Group
Set The category of sets with objects as Abelian group
sets and morphisms are functions
between them.
Man The category of manifolds with Lie group
objects as manifolds and morphisms
as differentiable functions between
manifolds.
Top The category of topological spaces Topological group
with objects as topological spaces
as and continuous functions between
spaces.
Grp The category of Abelian groups, Category objects
with groups as objects and group
homomorphism between groups.

Some deep results in algebraic topology about the homology groups of topological spaces turn out stated very concisely as the relationships between functors and natural isomorphisms of these four categories!

Which segways into some of the most exciting work in computer science at the moment, Homotopy Type Theory which I won't try to describe! :)

Resources