Software! Math! Data! The blog of R. Sean Bowman
The blog of R. Sean Bowman
May 08 2016

I dig the lambda calculus, and there’s a special place in my heart for the untyped calculus because of Scott’s amazing semantic interpretation, denotational semantics, and all that awesome jazz. But I have a confession: I don’t know a lot about reduction strategies for terms in the untyped lambda calculus. I wanted to remedy that (at least to some extent…), and I accidentally stumbled on a cool language to do it in. Here’s what I learned.

You can check out the code for this blog post at github.

Term rewriting and the Pure language

For reasons completely unrelated to lambda calculus, I had stumbled upon the Pure term rewriting language. This is a nifty little piece of work, I really like it. The idea is very simple: you supply “rules” for rewriting terms, and the system applies those rules to input. If you’re smart and careful about the rules you chose, you can do nifty stuff (like compute anything a computer can compute). The code looks a lot like code from a functional language; Haskell comes to mind. But there are very big, very important differences. It’s simple and crafty and hard to describe all at the same time, so let’s look at some examples.

You can write functions that look like, you know, functions in any old language. Say a function that squares its argument and adds one:

square x = x * x + 1;

An example invocation would be square 3, and if you ran that in the REPL it would tell you 10. But there’s more going on here: there are some built in rewriting rules that allow the system to evaluate the expression 3 * 3 + 1. In fact, being a term rewriting language, Pure doesn’t care if the argument to square is a number; it could be a symbol: square a returns a*a+1. Well that’s interesting! You can see that this might make for the beginnings of a powerful symbolic math package.

Lambda calculus

Anyway that’s all stuff for another day. I want to know more about the untyped lambda calculus, and specifically about reduction strategies. So let’s set up a little lambda calculus language in Pure. I’m not going to go over basic stuff here, so read up on wikipedia if you need a refresher. Recall that terms in the untyped lambda calculus are either variables, applications of one term to another, or lambda abstraction over a term. We’ll represent these in Pure with symbols, app M N, and lam x E, respectively. The term \(\lambda f.\lambda x. f (f x)\) is written in this notation as

lam f (lam x (app f (app f x)))

Much of the reduction code is taken from these lecture notes. The two main strategies we tend to be familiar with from programming languages are call by value and call by name. These are sometimes associated with strict and lazy languages, respectively, but I’m not sure there’s exactly a bijective correspondence.

Call by name

Anyhow, reduction of terms for the call by name version looks like this:

reduce_cbn (lam x t) = lam x t;
reduce_cbn (app e1 e2) = case reduce_cbn e1 of
    lam x t = reduce_cbn (subst e2 x t);
    e1p = app e1p e2
reduce_cbn x = x;

There are three rules here; the first says that we don’t do anything to lambda abstractions. That makes sense; in a programming language, when we define a function, nothing really happens until we call the function.

Calling functions is the subject of the second rule: when we reduce an application, we first reduce the head (the thing we’ll be applying to its argument). If that’s a lambda abstraction, then awesome: we can apply the function to its argument, reduce the result, and that’s that. The application is actually performed by the function subst, which takes a term, a variable, and a function body, returning the body with all instances of the variable replaced with the term.

The third rule just says that there’s nothing to do to reduce a variable.

Note that in the call by name strategy, the argument in the application is not reduced beforehand. This is the “lazy” part – terms are only evaluated if necessary.

Call by value

The call by value reduction looks a bit different:

reduce_cbv (lam x t) = lam x t;
reduce_cbv (app e1 e2) = case reduce_cbv e1 of
    lam x t = reduce_cbv (subst (reduce_cbv e2) x t);
    e1p = app e1p (reduce_cbv e2);
reduce_cbv x = x;

Err, maybe not that different. The differences, in fact, are just what I pointed out above: arguments to functions are evaluated before being passed to the function. This is the “strict” or “eager” evaluation strategy.

Other strategies

These are the two strategies most used in programming languages. (That’s a lie – call by name is often changed a bit and referred to as call by need. Whatever.) But there’s a problem with both of these. Or maybe a feature? Think of it how you will, but notice that neither strategy reduces terms inside lambda abstractions. Again, that makes sense: when we define a function, we don’t expect stuff to happen until we call it. But when we’re toying around with the lambda calculus as a model of computation, it’s not good enough. Here’s an example. We define the Church numerals and look at some reductions.

lam_zero = lam f (lam z z);
lam_succ = lam n (lam g (lam s (app g (app (app n g) s))));

church_numeral 0 = lam_zero;
church_numeral n = app lam_succ (church_numeral (n-1));

The first two are lambda terms for zero and successor, respectively, and then a function church_numberal that creates the lambda term for n by repeatedly applying the successor function to zero. What happens when we reduce, say, church_numeral 3 using the reduce_cbv strategy? We get

lam a (lam b (app a (app (app
  (lam c (lam d (app c (app (app
    (lam e (lam g (app e (app (app
      (lam f (lam z z)) e) g)))) c) d)))) a) b)))

Quite a monstrosity, and difficult to tell that it is, indeed, the church numeral for 3. On the other hand, if we allow reduction of terms inside lambda abstractions, we get

lam f (lam x (app f (app f (app f x))))

This is the first argument f applied to the second 3 times, which is manifestly the Church numeral for 3.

We can get this simpler term by using either normal order reduction or applicative order reduction.

Normal and Applicative order reduction

Normal order reduction works kinda like call by name. (In fact, you can see that it calls our reduce_cbn function at one point.) Again, the main difference is that terms inside lambda abstractions are reduced.

reduce_normal (lam x t) = lam x (reduce_normal t);
reduce_normal (app e1 e2) = case reduce_cbn e1 of
    lam x t = reduce_normal (subst e2 x t);
    e1p = app (reduce_normal e1p) (reduce_normal e2);
reduce_normal x = x;

Applicative order reduction, on the other hand, is similar to call by value:

reduce_ao (lam x t) = lam x (reduce_ao t);
reduce_ao (app e1 e2) = case reduce_ao e1 of
    lam x t = reduce_ao (subst (reduce_ao e2) x t);
    e1p = app e1p (reduce_ao e2);
reduce_ao x = x;

I’ll leave it to wikipedia or the lecture notes above to explain the rationale behind these, or prose description rather, as you can see exactly what they do from the code.

So are there places call by value/name makes a difference?

Yes. Let omega = lam x (app x x) and consider the term (app (lam x a) (app omega omega)). If we use reduce_cbn (call by name), this term evaluates to a. On the other hand, call by value reduction doesn’t terminate. This is because it tries to evaluate the argument to the first lambda, which is the famous \(\Omega\) term that has no normal form. Similarly, reduce_normal reduces this term to a, but reduce_ao (applicative order) gets stuck.

There are way way lots a bunch more theorems about normal forms, uniqueness, and so on, and I’ll leave you to read about them if you’re interested. A good (but slightly intimidating) reference is “The Lambda Calculus: Its Syntax and Semantics” by Barendregt.

Fixed point combinators

Just to check our work, and give our system a little workout at the same time, let’s play with the most common fixed point combinator, usually called Y and defined as

Y = lam f (app (lam y (app f (app y y))) (lam x (app f (app x x))));

This is a weird beast, and I’m not even going to attempt an explanation here. But we can do some cool stuff since Pure is a term rewriting language: the crucial property that Y should satisfy is that for a lambda term g, Y g is a fixed point of g. The great thing is that Pure doesn’t care whether we have a lambda term or not; we can choose a symbol (say, umm, g, sounds like a good choice) and pretend it’s a big old lambda term. Let’s give it a shot: reduce_cbn (app Y g) gives

app g (app (lam x (app g (app x x))) (lam x (app g (app x x))))

and a quick look at the Wikipedia fixed point combinators section (or a short calculation, which is probably more instructive and character building) indicates that this is, indeed g (Y g). Wow! Nifty!

Again, you can check out the code for this blog post at github. Be warned: there’s more to it than I’ve shown you. In particular, substitution is hairy. I think I might write a screed about substitution in math, logic and so forth. It’s crucially important, trivial to think about, but oh my goodness is it a bear to work with when programming. Hopefully I haven’t botched it too badly.

Approx. 1657 words, plus code