I dig the lambda calculus, and there’s a special place in my heart for the untyped calculus because of Scott’s amazing semantic interpretation, denotational semantics, and all that awesome jazz. But I have a confession: I don’t know a lot about reduction strategies for terms in the untyped lambda calculus. I wanted to remedy that (at least to some extent…), and I accidentally stumbled on a cool language to do it in. Here’s what I learned.

You can check out the code for this blog post at github.

## Term rewriting and the Pure language

For reasons completely unrelated to lambda calculus, I had stumbled upon the Pure term rewriting language. This is a nifty little piece of work, I really like it. The idea is very simple: you supply “rules” for rewriting terms, and the system applies those rules to input. If you’re smart and careful about the rules you chose, you can do nifty stuff (like compute anything a computer can compute). The code looks a lot like code from a functional language; Haskell comes to mind. But there are very big, very important differences. It’s simple and crafty and hard to describe all at the same time, so let’s look at some examples.

You can write functions that look like, you know, functions in any old language. Say a function that squares its argument and adds one:

```
square x = x * x + 1;
```

An example invocation would be `square 3`

, and if you ran that in the REPL it
would tell you `10`

. But there’s more going on here: there are some built in
rewriting rules that allow the system to evaluate the expression `3 * 3 + 1`

.
In fact, being a term rewriting language, Pure doesn’t care if the argument to
`square`

is a number; it could be a *symbol*: `square a`

returns `a*a+1`

.
Well that’s interesting! You can see that this might make for the beginnings
of a powerful symbolic math package.

## Lambda calculus

Anyway that’s all stuff for another day. I want to know more about the
untyped lambda calculus, and specifically about reduction
strategies.
So let’s set up a little lambda calculus language in Pure. I’m not going to
go over basic stuff here, so read up on wikipedia if you need a refresher.
Recall that terms in the untyped lambda calculus are either variables,
applications of one term to another, or lambda abstraction over a term. We’ll
represent these in Pure with symbols, `app M N`

, and `lam x E`

, respectively.
The term \(\lambda f.\lambda x. f (f x)\) is written in this notation as

```
lam f (lam x (app f (app f x)))
```

Much of the reduction code is taken from these lecture
notes.
The two main strategies we tend to be familiar with from programming languages
are *call by value* and *call by name*. These are sometimes associated with
*strict* and *lazy* languages, respectively, but I’m not sure there’s exactly
a bijective correspondence.

### Call by name

Anyhow, reduction of terms for the call by name version looks like this:

```
reduce_cbn (lam x t) = lam x t;
reduce_cbn (app e1 e2) = case reduce_cbn e1 of
lam x t = reduce_cbn (subst e2 x t);
e1p = app e1p e2
end;
reduce_cbn x = x;
```

There are three rules here; the first says that we don’t do anything to lambda
abstractions. That makes sense; in a programming language, when we define
a function, nothing really happens until we *call* the function.

Calling functions is the subject of the second rule: when we reduce an
application, we first reduce the head (the thing we’ll be applying to its
argument). If that’s a lambda abstraction, then awesome: we can apply the
function to its argument, reduce the result, and that’s that. The application
is actually performed by the function `subst`

, which takes a term, a variable,
and a function body, returning the body with all instances of the variable
replaced with the term.

The third rule just says that there’s nothing to do to reduce a variable.

Note that in the call by name strategy, the argument in the application is
*not* reduced beforehand. This is the “lazy” part – terms are only evaluated
if necessary.

### Call by value

The call by value reduction looks a bit different:

```
reduce_cbv (lam x t) = lam x t;
reduce_cbv (app e1 e2) = case reduce_cbv e1 of
lam x t = reduce_cbv (subst (reduce_cbv e2) x t);
e1p = app e1p (reduce_cbv e2);
end;
reduce_cbv x = x;
```

Err, maybe not *that* different. The differences, in fact, are just what
I pointed out above: arguments to functions are evaluated *before* being
passed to the function. This is the “strict” or “eager” evaluation strategy.

## Other strategies

These are the two strategies most used in programming languages. (That’s a lie
– call by name is often changed a bit and referred to as call by
need. Whatever.) But there’s
a problem with both of these. Or maybe a feature? Think of it how you will,
but notice that *neither strategy reduces terms inside lambda abstractions*.
Again, that makes sense: when we define a function, we don’t expect stuff to
happen until we call it. But when we’re toying around with the lambda
calculus as a model of computation, it’s not good enough. Here’s an example.
We define the Church
numerals and
look at some reductions.

```
lam_zero = lam f (lam z z);
lam_succ = lam n (lam g (lam s (app g (app (app n g) s))));
church_numeral 0 = lam_zero;
church_numeral n = app lam_succ (church_numeral (n-1));
```

The first two are lambda terms for zero and successor, respectively, and then
a function `church_numberal`

that creates the lambda term for `n`

by
repeatedly applying the successor function to zero. What happens when we
reduce, say, `church_numeral 3`

using the `reduce_cbv`

strategy? We get

```
lam a (lam b (app a (app (app
(lam c (lam d (app c (app (app
(lam e (lam g (app e (app (app
(lam f (lam z z)) e) g)))) c) d)))) a) b)))
```

Quite a monstrosity, and difficult to tell that it is, indeed, the church numeral for 3. On the other hand, if we allow reduction of terms inside lambda abstractions, we get

```
lam f (lam x (app f (app f (app f x))))
```

This is the first argument `f`

applied to the second 3 times, which is
manifestly the Church numeral for 3.

We can get this simpler term by using either *normal order reduction* or
*applicative order reduction*.

### Normal and Applicative order reduction

Normal order reduction works kinda like call by name. (In fact, you can see
that it calls our `reduce_cbn`

function at one point.) Again, the main
difference is that terms inside lambda abstractions are reduced.

```
reduce_normal (lam x t) = lam x (reduce_normal t);
reduce_normal (app e1 e2) = case reduce_cbn e1 of
lam x t = reduce_normal (subst e2 x t);
e1p = app (reduce_normal e1p) (reduce_normal e2);
end;
reduce_normal x = x;
```

Applicative order reduction, on the other hand, is similar to call by value:

```
reduce_ao (lam x t) = lam x (reduce_ao t);
reduce_ao (app e1 e2) = case reduce_ao e1 of
lam x t = reduce_ao (subst (reduce_ao e2) x t);
e1p = app e1p (reduce_ao e2);
end;
reduce_ao x = x;
```

I’ll leave it to wikipedia or the lecture notes above to explain the rationale behind these, or prose description rather, as you can see exactly what they do from the code.

## So are there places call by value/name makes a difference?

Yes. Let `omega = lam x (app x x)`

and consider the term `(app (lam x a) (app omega omega))`

. If we use `reduce_cbn`

(call by name), this term evaluates
to `a`

. On the other hand, call by value reduction doesn’t terminate. This
is because it tries to evaluate the argument to the first lambda, which is the
famous \(\Omega\) term that has no normal form. Similarly, `reduce_normal`

reduces this term to `a`

, but `reduce_ao`

(applicative order) gets stuck.

There are way way lots a bunch more theorems about normal forms, uniqueness, and so on, and I’ll leave you to read about them if you’re interested. A good (but slightly intimidating) reference is “The Lambda Calculus: Its Syntax and Semantics” by Barendregt.

## Fixed point combinators

Just to check our work, and give our system a little workout at the same time,
let’s play with the most common fixed point combinator, usually called `Y`

and
defined as

```
Y = lam f (app (lam y (app f (app y y))) (lam x (app f (app x x))));
```

This is a weird beast, and I’m not even going to attempt an explanation here.
But we can do some cool stuff since Pure is a term rewriting language: the
crucial property that `Y`

should satisfy is that for a lambda term `g`

, `Y g`

is a fixed point of `g`

. The great thing is that Pure doesn’t care whether we
have a lambda term or not; we can choose a symbol (say, umm, `g`

, sounds like
a good choice) and pretend it’s a big old lambda term. Let’s give it a shot:
`reduce_cbn (app Y g)`

gives

```
app g (app (lam x (app g (app x x))) (lam x (app g (app x x))))
```

and a quick look at the Wikipedia fixed point combinators
section
(or a short calculation, which is probably more instructive and character
building) indicates that this is, indeed `g (Y g)`

. Wow! Nifty!

Again, you can check out the code for this blog post at github. Be warned: there’s more to it than I’ve shown you. In particular, substitution is hairy. I think I might write a screed about substitution in math, logic and so forth. It’s crucially important, trivial to think about, but oh my goodness is it a bear to work with when programming. Hopefully I haven’t botched it too badly.

Approx. 1657 words, plus code