"Monads for programmers" for mathematicians
Haskell is probably my favorite programming language (for example, it’s what this site runs on). The language is somewhat infamous for taking lots of inspiration from category theory, featuring objects such as functors, monoids, and, most famously, monads.
There is a large number of “monads for programmers” articles which explain the role of monads in Haskell. Most of them start with “a monad is just a monoid in the category of endofunctors” as some kind of gibberish joke, and then follow it up with a more down-to-earth, comforting statement like “monads represent different types of computation”. If you have a category-theoretic background, the first sentence won’t be gibberish to you, but the second one probably will be, and most of these tutorials never end up addressing how monads in Haskell actually relate to monads in category theory.
So for the rest of the article, let’s assume that the first sentence makes perfect sense to you (and you know things such as that monads arise from composing pairs of adjoint functors), but you have never seen Haskell. We’ll try to answer the question “why on earth would programmers need to learn about monads?”. To answer that question, we’ll first need to answer the same question with the word “monads” replaced by “categories” and by “functors”, in the form of a quick overview over Haskell.
This is in no way an actual Haskell tutorial, and I’m skipping over as much actual Haskell syntax as possible. If afterwards you want to get an actual impression of Haskell as a programming language, I had lots of fun reading Learn You a Haskell for Great Good many years ago.
Haskell and categories
Like most programming languages, Haskell has a notion of types, for example `Integer`, which is the type of integer numbers, or `String`, which is the type of strings. In Haskell, we write `n :: Integer` to express that “`n` is a value of type `Integer`”. To some degree, we can think of types as sets, and of `::` as akin to \(\in\). (There are some differences, for example the distinction between values and types here, which one doesn’t have in set theory, so this is just a helpful analogy.)
Unlike most programming languages, Haskell does not have a distinction between values and functions. Functions aren’t their own separate thing; instead, they are just a special type of value. For types `a` and `b`, there is a function type `a -> b`, and a value of that type, `f :: a -> b`, is a function that takes an input of type `a` and computes a value of type `b`. Because functions are just values, one can have functions that return functions, or take functions as arguments. For example, composition of functions, written `g . f` in analogy to \(g \circ f\) in math, does not have to be a special syntactic construct: it is literally a function that takes two functions as input and spits out a third.
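For instance, composition in action might look like this (a minimal sketch; the function names are made up for illustration):

```haskell
-- Two ordinary functions...
double :: Integer -> Integer
double n = 2 * n

increment :: Integer -> Integer
increment n = n + 1

-- ...and their composite, built with the (.) operator:
-- (double . increment) n computes double (increment n).
doubleAfterIncrement :: Integer -> Integer
doubleAfterIncrement = double . increment
```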
“Definition.” Write \(\mathrm{Hask}\) for the category whose objects are Haskell types and where the elements of \(\mathrm{Hom}(a,b)\) are given by the set of functions `f :: a -> b`. Composition is given by function composition.
There is actually an issue here: what exactly do we mean by “the set of functions `f :: a -> b`”? If we take these to be just “the set of all syntactically valid expressions of this type”, that’s not a category; for example, `id . id` and `id` are syntactically different expressions. So for this to really work, one has to pass to equivalence classes where two functions are regarded as equivalent if they have the same behavior, which leads to more problems discussed here. It seems likely that it is possible to formalize the semantics of some idealized subset of Haskell in a way that makes this category \(\mathrm{Hask}\) precise, but we will not concern ourselves with this here and just treat it as a helpful analogy.
Haskell has tuples: in particular, for every pair of types `a` and `b` there is a type `(a,b)` whose values are pairs of values of types `a` and `b`. This means that \(\mathrm{Hask}\) has products. The terminal object / cartesian unit is the one-element type `()`, which has a unique value `() :: ()` (the notation suggests an empty tuple). Functions `f :: (a,b) -> c` can be turned into functions `f' :: a -> (b -> c)`, which means that \(\mathrm{Hask}\) is cartesian closed with internal hom given by the function types `a -> b`. This also means Haskell programmers have no need for functions of more than one variable; for example, multiplication of integers simply comes as a function `(*) :: Integer -> (Integer -> Integer)`. (They call the product-hom adjunction “currying”, in honor of the logician Haskell Curry.) This makes functions that return functions so common that the precedence conventions for `->` are such that `a -> b -> c` means `a -> (b -> c)`, so that the parentheses can be left out. \(\mathrm{Hask}\) also has coproducts in the form of “disjoint union” types, which are extremely useful for structuring data (think for example of how to represent the response of some web server, which could be either some content or an error code), but we won’t care about their syntax here.
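To make currying concrete, here is a small sketch (the names `addPair` and `addCurried` are made up for illustration; `curry` is the Prelude function witnessing the adjunction):

```haskell
-- The uncurried form of addition, a function out of the product type:
addPair :: (Integer, Integer) -> Integer
addPair (x, y) = x + y

-- Its curried form, obtained via the product-hom adjunction:
addCurried :: Integer -> (Integer -> Integer)
addCurried = curry addPair

-- Partial application falls out for free:
triple :: Integer -> Integer
triple = (*) 3   -- the function \n -> 3 * n
```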
Besides function types, product types and coproduct types, there are also custom constructions that turn a type into a new one. For example, for every type `a`, there is a type `[a]` whose values are lists with entries of type `a`. Similarly, for every `a` there is a type `Maybe a` whose values are either of type `a`, or a special element `Nothing :: Maybe a` (this is often used as the return type of functions that might fail to find an answer; for example, there could be a function `[a] -> Maybe a` that returns the first element of a list, and `Nothing` if the list is empty). Often, these constructions behave like functors \(\mathrm{Hask}\to\mathrm{Hask}\). Indeed, given a function `f :: a -> b`, there should be a function `[a] -> [b]` mapping `f` over each element of the list, or `Maybe a -> Maybe b`, mapping `f` over elements coming from `a` and taking `Nothing` to `Nothing`. Haskell provides a mechanism (the `Functor` type class) to convince the compiler that some type construction is indeed functorial, and a function `fmap :: (a -> b) -> f a -> f b` for all functors `f`.
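A brief sketch of `fmap` at work on these two functors (the function names here are made up):

```haskell
-- fmap lifts f :: a -> b to a map on lists, applied entrywise...
bumpAll :: [Integer] -> [Integer]
bumpAll = fmap (+ 1)

-- ...and to a map on Maybe, with Nothing going to Nothing.
bumpMaybe :: Maybe Integer -> Maybe Integer
bumpMaybe = fmap (+ 1)
```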
Why monads though?
If you’re coming from a category-theoretical background, all of the above probably more or less sounded like the category of sets with some funny notation. But functions in programming do not usually behave like functions between sets. For example, Python provides a function `print(str)`, which takes one string as argument, prints it, and returns nothing, and a function `input()`, which takes no argument, waits for the user to enter a line, and returns it. One might think that Haskell could have similar functions of type `String -> ()` and `() -> String`, but there is a problem: if we want to keep our cartesian closed category \(\mathrm{Hask}\), `()` is the terminal object, so there is a unique morphism `String -> ()` (the constant one), and the data of a morphism `() -> String` is the same as a value of type `String` (and is given by the constant function returning that string). So functions in the sense of Python (or most other programming languages), which don’t just deterministically compute an output value from input values, cannot be modeled as morphisms in \(\mathrm{Hask}\). (The programming people call the functions that do behave like functions in mathematics “pure” or “side-effect free”, where “side effect” includes not only input-output interactions but any sort of nondeterministic behaviour.)
Of course, a programming language that can’t print something (or get text input from the user, or read a file, or query a webserver, …) would be pretty useless. So we have to model functions with side effects differently. While we couldn’t include these among the morphisms in \(\mathrm{Hask}\), since we wanted that one to be cartesian closed, nothing prevents us from having another category in the picture. Let us write \(\mathrm{Hask}^{\mathrm{impure}}\) for a category with objects still given by Haskell types, but with morphisms from `a` to `b` given by “impure” functions with argument of type `a` and return value of type `b`. How should that category behave with respect to \(\mathrm{Hask}\)?
First of all, there should be a functor \(\mathrm{Hask}\to \mathrm{Hask}^\mathrm{impure}\), which simply regards pure functions as impure. Ideally we would not have to work in two different categories after all, and so we postulate one more thing: we want this functor to have a right adjoint \(R: \mathrm{Hask}^\mathrm{impure}\to \mathrm{Hask}\). This means that a morphism from `a` to `b` in \(\mathrm{Hask}^\mathrm{impure}\) is exactly a function of type `a -> R b` (in \(\mathrm{Hask}\)). This is a pure function that takes an input of type `a`, but then returns not an actual value of type `b`, but instead something of type `R b`. We may therefore think of values of that type as abstract representations of an impure action to take, which leads to a value of type `b`. Since the main source of “impurity” is interaction with input and output, this `R` is called `IO`.
Since `IO` arises as the composite of adjoint functors, it is a monad on \(\mathrm{Hask}\), and the above discussion exactly identifies \(\mathrm{Hask}^\mathrm{impure}\) with its Kleisli category, allowing us to model “functions with side effects” as functions of type `a -> IO b`. For example, Python’s `print(str)` and `input()` exist in Haskell as `putStrLn :: String -> IO ()` and `getLine :: IO String`. (You’d maybe expect `() -> IO String` as the type of the latter, but since \(\mathrm{Hask}\) is cartesian closed we have an isomorphism between the types `() -> a` and `a`, so one might as well drop the `()`. Think of this as the \(0\)-ary version of currying.) Of course, there cannot be a function of type `IO String -> String` or similar to get an actual string out of `getLine` (otherwise, \(\mathrm{Hask}\) would again have all the aforementioned problems with impure functions), but using the composition in the Kleisli category (and functoriality of `IO`), elementary impure functions like `putStrLn` and `getLine` can be combined into more complex ones, ultimately yielding a value `main :: IO ()` which represents the entire program as one big action to take. For example, `getLine`, some function `String -> String`, and `putStrLn` could be composed into a program that reads input, changes it somehow, and prints the result. So in a way we sidestepped actually performing impure computations by instead doing pure computations on values which are abstract representations of impure computations.
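Concretely, such a composite might look as follows (a minimal sketch; `shout` and `echoProgram` are made-up names):

```haskell
-- A pure function on strings.
shout :: String -> String
shout s = s ++ "!"

-- Kleisli composition in IO: read a line, apply the pure function,
-- print the result. In an actual program, this value would be bound
-- to main :: IO ().
echoProgram :: IO ()
echoProgram = getLine >>= (putStrLn . shout)
```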
The required composition of morphisms `a -> IO b` in the Kleisli category can be specified in different ways. For category theorists, the most natural approach would be in terms of the monad structure on `IO`, meaning in terms of natural maps `IO (IO a) -> IO a` and `a -> IO a`. Haskell instead opts to build everything out of a function `(>>=) :: IO a -> (a -> IO b) -> IO b` (and `return :: a -> IO a`). These are of course equivalent, and one also finds a function `join :: IO (IO a) -> IO a`, but the weird-looking `>>=` operator (pronounced “bind”) is considered more fundamental, even forming part of the official logo of Haskell.
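The equivalence of the two presentations can be spelled out in a few lines, and works for any monad in place of `IO` (`join'` and `bind'` are made-up names chosen to avoid clashing with the Prelude):

```haskell
-- join from (>>=): flatten a doubly-wrapped value by binding with id.
join' :: Monad m => m (m a) -> m a
join' m = m >>= id

-- (>>=) from join and fmap, the way a category theorist would define it.
bind' :: Monad m => m a -> (a -> m b) -> m b
bind' m f = join' (fmap f m)
```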
…but why monads??
The above suggests the following informal picture of `IO`, which is close to what can be found in monad tutorials aimed at programmers. We can think of a function `a -> IO b` as a function `a -> b` which is in a sense “tainted by side effects”, and of a value of type `IO a` as a similarly tainted value of type `a`. The “bind” function `(>>=) :: IO a -> (a -> IO b) -> IO b` gives us a way to apply a side-effect-tainted function to a side-effect-tainted value, combining the side effects so that the result is still only tainted by one level of side effects. If we had any other monad `m` on \(\mathrm{Hask}\), we would similarly get a function `(>>=) :: m a -> (a -> m b) -> m b`, and could think of values of type `m a` and functions of type `a -> m b` analogously as representing values and functions of types `a` and `a -> b` with some sort of additional context.
A great example is the functor `Maybe` discussed above, which adds a distinguished value `Nothing :: Maybe a`. There is a monad structure here, with `Maybe (Maybe a) -> Maybe a` collapsing the two added points into one. (The topologists will recognize this as the monad of the adjunction between pointed and unpointed objects.) In the corresponding “bind” function `Maybe a -> (a -> Maybe b) -> Maybe b`, if either the input value is `Nothing` or the function returns `Nothing`, the output is `Nothing` as well. This can therefore be used to elegantly compose calculations which might fail somewhere along the way. (For comparison, the Kleisli category of the corresponding monad on the category of sets is equivalent to sets with partially defined functions as morphisms.)
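As a sketch, take the “first element of a list” function mentioned earlier (here called `safeHead`, a made-up name) and chain it with itself:

```haskell
-- A function that might fail: the first element of a list, if any.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

-- The first element of the first element of a list of lists; every
-- possible failure along the way is propagated by (>>=).
headOfHead :: [[a]] -> Maybe a
headOfHead xss = safeHead xss >>= safeHead
```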
Another example is the “list” functor: we have a function `concat :: [[a]] -> [a]` which flattens a list of lists, and this makes the list functor into a monad. Here, too, there is a way to think of values of type `[a]` as “values of type `a` with some additional context”, but the context is a kind of multi-valuedness, and the “bind” function `(>>=) :: [a] -> (a -> [b]) -> [b]` allows us to apply a multi-valued function to multiple values, collecting all results in a single list.
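For instance (a made-up example), a “multi-valued function” returning a number together with its negative:

```haskell
-- A multi-valued function: each input yields two results.
withNegative :: Integer -> [Integer]
withNegative n = [n, -n]

-- (>>=) applies it to every input and concatenates all result lists.
allSigns :: [Integer] -> [Integer]
allSigns xs = xs >>= withNegative
```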
So while the most important monad in Haskell is the built-in `IO` monad (which is the only reason that Haskell programs can actually do anything), the general slogan is that monads on the category \(\mathrm{Hask}\) provide an abstraction of different modes of computation (effectively, by working in the Kleisli categories of different monads on \(\mathrm{Hask}\)). This makes it so that a lot of utility functions can be used in a wide variety of contexts.
Some more examples
If you are among the people at whom this article is mainly aimed, you are probably more used to monads in the context of the Barr-Beck theorem, and maybe even had to look up the Kleisli category earlier. Somehow, algebras over monads do not really appear in Haskell (although technically monoids are algebras over the “list” monad). But: some of the most useful examples in Haskell still arise from familiar adjunctions!
For any object \(s\) of a cartesian closed category \(\mathcal{C}\), we have an adjunction of functors \(\mathcal{C}\to \mathcal{C}\), with left adjoint given by \(-\times s\) and right adjoint by \(\underline{\mathrm{Hom}}(s,-)\). This gives a monad structure on \(\underline{\mathrm{Hom}}(s,-\times s)\). In Haskell, this leads to a monad structure on `s -> (a,s)`, called the “state monad” and written `State s a`. We can think of a value of type `State s a` as representing a computation of a value of type `a` which depends on a state of type `s` and which updates the state. Composition in the Kleisli category will then effectively update the state implicitly along the way.
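A minimal, self-contained sketch of this monad (the real one lives in the transformers package as `Control.Monad.State`; the definitions here are simplified, and `tick` is a made-up example):

```haskell
-- A value of type State s a is a function s -> (a, s).
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in (h a, s2)

instance Monad (State s) where
  -- Kleisli composition threads the state through implicitly.
  State g >>= f = State $ \s ->
    let (a, s1) = g s in runState (f a) s1

-- Example: return the current counter value and increment the state.
tick :: State Int Int
tick = State $ \n -> (n, n + 1)
```

For instance, `runState (tick >> tick) 0` runs two ticks starting from state 0, returning the second tick’s value together with the final state.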
For any monoid object \(w\) in a category \(\mathcal{C}\) with finite products, we have a category \(\mathrm{Mod}_w(\mathcal{C})\) of objects in \(\mathcal{C}\) with \(w\)-action. The corresponding free-forget adjunction gives a monad structure on \(-\times w : \mathcal{C}\to \mathcal{C}\). This can of course also be described directly (for example since the functor \(\mathcal{C}\to \mathrm{Fun}(\mathcal{C},\mathcal{C})\) taking \(w\) to \(-\times w\) is monoidal). In Haskell, this manifests as a monad structure on \((a,w)\) for every monoid \(w\), called the “writer monad” and written `Writer w a`, because we can think of a value of type `Writer w a` as providing not just a result of type `a` but also some kind of additional output of type `w`, and composition in the Kleisli category combines all those outputs into one value of type `w` using the monoid structure. This can for example be used for logging (with `w = String`).
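A minimal sketch of this monad as well (the real one is `Control.Monad.Writer` in the transformers package; here the log monoid is `[String]` rather than `String`, and `logged` is a made-up example):

```haskell
-- A value of type Writer w a is a pair (a, w).
newtype Writer w a = Writer { runWriter :: (a, w) }

instance Functor (Writer w) where
  fmap f (Writer (a, w)) = Writer (f a, w)

instance Monoid w => Applicative (Writer w) where
  pure a = Writer (a, mempty)   -- unit of the monoid: empty output
  Writer (f, w1) <*> Writer (a, w2) = Writer (f a, w1 <> w2)

instance Monoid w => Monad (Writer w) where
  -- Kleisli composition combines outputs with the monoid structure.
  Writer (a, w1) >>= f =
    let Writer (b, w2) = f a in Writer (b, w1 <> w2)

-- Logging example: record every value we see.
logged :: Integer -> Writer [String] Integer
logged n = Writer (n, ["saw " ++ show n])
```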
For any object \(r\) of a cartesian closed category \(\mathcal{C}\), the functor from \(\mathcal{C}\) to the slice \(\mathcal{C}_{/r}\) given by \(-\times r\) has both adjoints. Its right adjoint takes an object \(x\to r\) of the slice category to its sections (the fiber of \(\underline{\mathrm{Hom}}(r,x) \to \underline{\mathrm{Hom}}(r,r)\) over the identity). The associated monad comes out as \(\underline{\mathrm{Hom}}(r,-): \mathcal{C}\to \mathcal{C}\). Under the isomorphism \(\underline{\mathrm{Hom}}(r,\underline{\mathrm{Hom}}(r,-)) \cong \underline{\mathrm{Hom}}(r\times r,-)\), the monadic structure map is induced by the diagonal \(r\to r\times r\), which can be thought of as the unique comonoid structure every object carries in the cartesian monoidal structure. So this is in some sense dual to the “writer monad” above. In Haskell, it corresponds to a monad structure on `r -> a`, called the “reader monad” because the associated Kleisli category essentially models computations that can read some global information of type `r` (e.g. configuration settings).
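In fact, the function type `(->) r` already carries this monad instance in the Prelude, so no wrapper is needed. A small sketch with configuration settings (the `Config` type and all names are made up for illustration):

```haskell
-- A made-up environment of global configuration settings.
data Config = Config { verbose :: Bool, name :: String }

-- A computation reading from the environment.
greeting :: Config -> String
greeting cfg = "hello, " ++ name cfg

-- (>>=) for (->) r hands the same environment to both computations.
report :: Config -> String
report = greeting >>= \g cfg ->
  if verbose cfg then g ++ " (verbose)" else g
```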