Software! Math! Data! The blog of R. Sean Bowman
The blog of R. Sean Bowman

# Monads: History and Context

December 01 2015

I want to talk about monads. These are fairly obscure mathematical objects that have received quite a bit of attention in the programming community lately. This, I think, is mostly because of their use in Haskell to handle "side effects" and other notions of computation inside the type system. It’s grown from there, however, and now you can find monad tutorials and code using monads in languages from Javascript to C++. Today, I want to talk a little about category theory, the history of monads, and the context in which mathematicians and theoretical computer scientists see them.

The world has enough monad tutorials. I’d like to learn more about them as mathematical objects, their history, and different ways they’re used today. I’m writing this down not for anybody else in particular, but for my own learning. Of course, I hope that somebody else gets something from it! My aims, though, are more mundane: learn some things, write them down, and add in my own opinions for what it’s worth. (In matters of taste, as they say, there’s plenty of room for disagreement, so you might disagree with some things I say. If so, let me know! I’d love to get more perspectives on these things.)

## History

Comonads seem to have first appeared in the work of Roger Godement, a well known French mathematician, in a book on sheaf theory in 1958. (In category theory, many ideas have a symmetric counterpart whose name is obtained by adding “co-” to the front. Comonads and monads are very, very closely related, so we consider Godement’s definition as also defining a monad.) Sheaf theory is a modern subject sometimes associated with the work of Grothendieck on algebraic geometry, but its roots go back much farther. Sheaves are objects that capture local data about a manifold yet in so doing allow one to see global properties of the space. In this sense they’re a tool for the familiar mathematical problem of passing from local information about a space to properties of the space as a whole. They also give a very general cohomology theory which is useful in many different branches of math.

The modern definition of sheaves uses category theory in very explicit ways, so it’s no surprise that a construction in such a book would be category theoretic. Indeed, Godement’s “standard construction,” later called a "monad" by Saunders Mac Lane (considered by many to be one of the founders of category theory) is such a construction. Monads are also sometimes called “triples” in earlier literature. (Not a very descriptive name…)

As far as I can tell, the story ends there for a while: people doing sheaf theory and some other very technical things used monads for a while and they lay otherwise dormant until Eugenio Moggi wrote the seminal paper "Notions of computation and monads" in 1991. Now, this paper is very well written and fairly easy to read for a dense, technical paper on foundations/semantics of programming languages/categorical logic/whatever you want to call it, but it’s definitely not for the faint of heart.

Moggi’s aim, as he says in the introduction, is to introduce a “categorical semantics of computation” rich enough to provide meaning for programs in various $\lambda$–calculi. Such a semantics is related closely to denotational semantics (as opposed to operational semantics, for example, which seems to be very popular these days), but different still. Moggi calls this the “logical” approach to semantics and claims that it should be able to deal with new programming language features more easily. (Note that Moggi’s "logical" semantics is much closer in spirit to denotational semantics than, say axiomatic semantics. Today, I believe that this is usually just called "categorical semantics.")

## Monads in computation: Why?

Why do we need to be able to deal with new features, and how do monads help here? The first question is easy: there are many new languages appearing all the time, and it would be nice to have a thorough and deep understanding of their semantics. Languages often add new features to solve particular problems, but the features are rarely specified carefully in a formal way. In order to reason about programs written in these languages (indeed, to understand the languages themselves), we need a framework in which we can deal with features like side effects, IO, exceptions, asynchronous execution, and many others.

How do monads help, then? Moggi’s idea is to take a suitable category $\CC$ together with a functor $T$ from $\CC$ to itself. Objects of $\CC$ play several roles:

• an object $A\in\CC$ is a type, so members of $A$ are values of that type, and
• an object $TA$ for $A\in\CC$ is a computation of type $A$.

Here, computation is left deliberately vague, as $T$ can be one of many endofunctors on $\CC$. Finally, the denotation of a program is an element of $TA$, the result of a computation of type $A$.

This level of generality allows Moggi to prove many interesting things without actually applying the semantics to any concrete “notion of computation!” One of the most important insights, though, in retrospect at least, is his argument that simple constraints and common sense require that $T$ be part of a monad. This is where the “how” part of the answer starts… but it will take a while to see where it goes.

## Concrete uses of the abstract

It seems that it didn’t take people long to see the value in Moggi’s ideas. Philip Wadler and Simon Peyton Jones, among others, began to use these ideas in their work. Ultimately, the Haskell language included the idea (the definition, essentially!) of a monad in its standard library. From there, the rest is history: monads have become very popular as tools for structuring programs, chaining together operations, enabling extensible and (perhaps) reusable code that does cool things. And not just in Haskell, either, but a wide variety of languages.

I’d like to offer a couple of takes on monads. The first is that it always strikes me that monads, and Haskell in general, are a great example of how obscure mathematics becomes mainstream. As early as the 19th century, people were studying what is now called homological algebra, and by the 1940’s the field was extremely well developed, with Eilenberg and Mac Lane introducing the terms (if not the ideas) “category”, “functor”, and “natural transformation.” Eilenberg and Mac Lane were studying homology theories, seemingly far removed from logic, computation, or really anything other than algebro-topological ideas.

But their ideas were very successful, and by the 1960’s, category theory was beginning to see applications in logic and foundations (due largely to F. W. Lawvere) and by the 1970’s had placed itself firmly at the base of logic, semantics of programming languages, type theory, and a host of other subjects related to foundations. At this point, the ideas in category theory were prevalent and well known… by a small circle of mathematicians and computer scientists. There were few books, and even a smart mathematician working in a different field might have had a tough time learning all this category theory business.

As we just saw, Moggi introduced a category theoretic notion to handle a large range of types of computations. And here’s where things start to get interesting, because the work of Moggi and others gave rise to a programming language, Haskell, which uses this notion explicitly to structure its programs. Today, companies in industries as diverse as finance, biotech, home lawn care (no kidding!), and video games use Haskell and its ideas to build software. The UK National air traffic services has software written in Haskell. In all probability, you use, directly or indirectly, systems which are wholly or partly written either in Haskell or in a monadic style.

How’s that for going from completely obscure, highly technical mathematical idea to mainstream influential stuff in a century?

## Application of foundations

The second point I’d like to make is very much related to the first: foundations (logic, model theory, type theory, proof theory, semantics, etc.) have come a long way since 1900. Back then, people were just starting to formalize math, to come up with ideas about what a “proof” really was, to say what is really meant by “some statement is true in some system.” Then, in the 1930s, there were Gödel and Tarski’s theorems, which show that in some senses it is impossible to reason about a formal system from within the system itself. These had profound effects on mathematics. Others (notably philosophers) also used them as arguments for or against this or that. But I think it can be argued that even these very important theorems had very little impact on the day to day lives of people back then (or today, for that matter).

Of course, all these ideas are closely related to notions of computation, and these ideas clearly do have an impact on our day to day lives. But I would say that incompleteness and undefinability per se are mostly important from the point of view of a practitioner. Even then, I don’t know many mathematicians who know (or care) about, say, Gentzen’s proof of the consistency of Peano arithmetic, or Girard’s proof of the consistency of second order arithmetic. These are quite esoteric subjects in and of themselves, even if they do have applications outside foundations.

The situation was different by the 1970s and 1980s. The Hindley-Milner type system, which grew out of independent discoveries by a logician and computer scientist, was implemented in the programming language ML. This language is still in use today (some variants are as popular as ever, in fact!) and, like Haskell, powers some pretty important software. Haskell’s type system itself is based on these ideas.

Later, in the 90s, we see more of this application of foundations to real life. Moggi’s perspective was as a researcher in semantics, a field dominated by highly technical mathematical ideas. His work was not, I don’t think, intended for anybody to take and implement directly. He was interested in extending semantics to handle problems that were previously difficult (by which I mean abstract mathematical problems difficult in the mathematical sense). But in creating that work, he influenced people who built actual systems based on the ideas, and Haskell was born. So the lineage of languages like ML and Haskell goes way back to the beginnings of mathematical logic in addition to other subjects (like the homological algebra described above).

### Historical notes about people

(You can skip this if you only care about category theory and monads, it’s just stuff about famous researchers.)

I’m a sucker for knowing about how famous researchers were connected, and the players in this story are all very interesting people. I’ll confine myself to one small observation: The ideas of Lawvere, a student of Eilenberg, were rebuffed by Mac Lane initially. (You’ll recall that Saunders Mac Lane and Samuel Eilenberg are considered the founders of category theory.) However, Mac Lane was on Lawvere’s thesis committee and they later worked together. This story is difficult to piece together, as several aspects from Mac Lane’s autobiography are contradicted by other sources. In addition to this funny story, Lawvere spent a year at Berkeley as a doctoral candidate, there learning from such famous logicians as Alfred Tarski and Dana Scott.

Tarski, in addition to the undefinability theorem and a bunch of other nifty stuff, proved the famous Banach-Tarski paradox which says that, assuming the axiom of choice, it is possible to divide a three dimensional ball of radius one in to a small number of parts, reassemble those parts by translating and rotating them, and form two three dimensional balls of radius one. (A surprising result indeed! The deal is, the parts the theorem cuts the ball into are exceedingly strange, fractal dust kind of pieces. Such a thing could never happen in our universe, obviously, but this result made mathematicians, who had previously been quite happy to use the seemingly innocent axiom of choice, think a bit harder about whether it was a “common sense” axiom.)

Dana Scott is a famous logician who developed the theory of denotational semantics of programming languages (again, in addition to a lot of other cool stuff). Specifically, he found a family of models for various $\lambda$–calculi, including the untyped $\lambda$–calculus. This involves some very tricky, very interesting mathematics, and had huge ramifications for both logic and semantics of programming languages. Many consider his work the genesis of the field of denotational semantics.

Approx. 2078 words, plus code