I dig the lambda calculus, and there’s a special place in my heart for the untyped calculus because of Scott’s amazing semantic interpretation, denotational semantics, and all that awesome jazz. But I have a confession: I don’t know a lot about reduction strategies for terms in the untyped lambda calculus. I wanted to remedy that (at least to some extent…), and I accidentally stumbled on a cool language to do it in. Here’s what I learned.
You can check out the code for this blog post at github.
Term rewriting and the Pure language
For reasons completely unrelated to lambda calculus, I had stumbled upon the Pure term rewriting language. This is a nifty little piece of work, I really like it. The idea is very simple: you supply “rules” for rewriting terms, and the system applies those rules to input. If you’re smart and careful about the rules you chose, you can do nifty stuff (like compute anything a computer can compute). The code looks a lot like code from a functional language; Haskell comes to mind. But there are very big, very important differences. It’s simple and crafty and hard to describe all at the same time, so let’s look at some examples.
You can write functions that look like, you know, functions in any old language. Say a function that squares its argument and adds one:
square x = x * x + 1;
An example invocation would be
square 3, and if you ran that in the REPL it
would tell you
10. But there’s more going on here: there are some built in
rewriting rules that allow the system to evaluate the expression
3 * 3 + 1.
In fact, being a term rewriting language, Pure doesn’t care if the argument to
square is a number; it could be a symbol:
square a returns
Well that’s interesting! You can see that this might make for the beginnings
of a powerful symbolic math package.
Anyway that’s all stuff for another day. I want to know more about the
untyped lambda calculus, and specifically about reduction
So let’s set up a little lambda calculus language in Pure. I’m not going to
go over basic stuff here, so read up on wikipedia if you need a refresher.
Recall that terms in the untyped lambda calculus are either variables,
applications of one term to another, or lambda abstraction over a term. We’ll
represent these in Pure with symbols,
app M N, and
lam x E, respectively.
The term \(\lambda f.\lambda x. f (f x)\) is written in this notation as
lam f (lam x (app f (app f x)))
Much of the reduction code is taken from these lecture notes. The two main strategies we tend to be familiar with from programming languages are call by value and call by name. These are sometimes associated with strict and lazy languages, respectively, but I’m not sure there’s exactly a bijective correspondence.
Call by name
Anyhow, reduction of terms for the call by name version looks like this:
reduce_cbn (lam x t) = lam x t; reduce_cbn (app e1 e2) = case reduce_cbn e1 of lam x t = reduce_cbn (subst e2 x t); e1p = app e1p e2 end; reduce_cbn x = x;
There are three rules here; the first says that we don’t do anything to lambda abstractions. That makes sense; in a programming language, when we define a function, nothing really happens until we call the function.
Calling functions is the subject of the second rule: when we reduce an
application, we first reduce the head (the thing we’ll be applying to its
argument). If that’s a lambda abstraction, then awesome: we can apply the
function to its argument, reduce the result, and that’s that. The application
is actually performed by the function
subst, which takes a term, a variable,
and a function body, returning the body with all instances of the variable
replaced with the term.
The third rule just says that there’s nothing to do to reduce a variable.
Note that in the call by name strategy, the argument in the application is not reduced beforehand. This is the “lazy” part – terms are only evaluated if necessary.
Call by value
The call by value reduction looks a bit different:
reduce_cbv (lam x t) = lam x t; reduce_cbv (app e1 e2) = case reduce_cbv e1 of lam x t = reduce_cbv (subst (reduce_cbv e2) x t); e1p = app e1p (reduce_cbv e2); end; reduce_cbv x = x;
Err, maybe not that different. The differences, in fact, are just what I pointed out above: arguments to functions are evaluated before being passed to the function. This is the “strict” or “eager” evaluation strategy.
These are the two strategies most used in programming languages. (That’s a lie – call by name is often changed a bit and referred to as call by need. Whatever.) But there’s a problem with both of these. Or maybe a feature? Think of it how you will, but notice that neither strategy reduces terms inside lambda abstractions. Again, that makes sense: when we define a function, we don’t expect stuff to happen until we call it. But when we’re toying around with the lambda calculus as a model of computation, it’s not good enough. Here’s an example. We define the Church numerals and look at some reductions.
lam_zero = lam f (lam z z); lam_succ = lam n (lam g (lam s (app g (app (app n g) s)))); church_numeral 0 = lam_zero; church_numeral n = app lam_succ (church_numeral (n-1));
The first two are lambda terms for zero and successor, respectively, and then
church_numberal that creates the lambda term for
repeatedly applying the successor function to zero. What happens when we
church_numeral 3 using the
reduce_cbv strategy? We get
lam a (lam b (app a (app (app (lam c (lam d (app c (app (app (lam e (lam g (app e (app (app (lam f (lam z z)) e) g)))) c) d)))) a) b)))
Quite a monstrosity, and difficult to tell that it is, indeed, the church numeral for 3. On the other hand, if we allow reduction of terms inside lambda abstractions, we get
lam f (lam x (app f (app f (app f x))))
This is the first argument
f applied to the second 3 times, which is
manifestly the Church numeral for 3.
We can get this simpler term by using either normal order reduction or applicative order reduction.
Normal and Applicative order reduction
Normal order reduction works kinda like call by name. (In fact, you can see
that it calls our
reduce_cbn function at one point.) Again, the main
difference is that terms inside lambda abstractions are reduced.
reduce_normal (lam x t) = lam x (reduce_normal t); reduce_normal (app e1 e2) = case reduce_cbn e1 of lam x t = reduce_normal (subst e2 x t); e1p = app (reduce_normal e1p) (reduce_normal e2); end; reduce_normal x = x;
Applicative order reduction, on the other hand, is similar to call by value:
reduce_ao (lam x t) = lam x (reduce_ao t); reduce_ao (app e1 e2) = case reduce_ao e1 of lam x t = reduce_ao (subst (reduce_ao e2) x t); e1p = app e1p (reduce_ao e2); end; reduce_ao x = x;
I’ll leave it to wikipedia or the lecture notes above to explain the rationale behind these, or prose description rather, as you can see exactly what they do from the code.
So are there places call by value/name makes a difference?
omega = lam x (app x x) and consider the term
(app (lam x a) (app omega omega)). If we use
reduce_cbn (call by name), this term evaluates
a. On the other hand, call by value reduction doesn’t terminate. This
is because it tries to evaluate the argument to the first lambda, which is the
famous \(\Omega\) term that has no normal form. Similarly,
reduces this term to
reduce_ao (applicative order) gets stuck.
There are way way lots a bunch more theorems about normal forms, uniqueness, and so on, and I’ll leave you to read about them if you’re interested. A good (but slightly intimidating) reference is “The Lambda Calculus: Its Syntax and Semantics” by Barendregt.
Fixed point combinators
Just to check our work, and give our system a little workout at the same time,
let’s play with the most common fixed point combinator, usually called
Y = lam f (app (lam y (app f (app y y))) (lam x (app f (app x x))));
This is a weird beast, and I’m not even going to attempt an explanation here.
But we can do some cool stuff since Pure is a term rewriting language: the
crucial property that
Y should satisfy is that for a lambda term
is a fixed point of
g. The great thing is that Pure doesn’t care whether we
have a lambda term or not; we can choose a symbol (say, umm,
g, sounds like
a good choice) and pretend it’s a big old lambda term. Let’s give it a shot:
reduce_cbn (app Y g) gives
app g (app (lam x (app g (app x x))) (lam x (app g (app x x))))
and a quick look at the Wikipedia fixed point combinators
(or a short calculation, which is probably more instructive and character
building) indicates that this is, indeed
g (Y g). Wow! Nifty!
Again, you can check out the code for this blog post at github. Be warned: there’s more to it than I’ve shown you. In particular, substitution is hairy. I think I might write a screed about substitution in math, logic and so forth. It’s crucially important, trivial to think about, but oh my goodness is it a bear to work with when programming. Hopefully I haven’t botched it too badly.
Approx. 1657 words, plus code