Tail Recursion Elimination
April 22, 2009 — originally posted on Neopythonic
I recently posted an entry in my Python History blog on the origins of Python's functional features. A side remark about not supporting tail recursion elimination (TRE) immediately sparked several comments about what a pity it is that Python doesn't do this, including links to recent blog entries by others trying to "prove" that TRE can be added to Python easily. So let me defend my position (which is that I don't want TRE in the language). If you want a short answer, it's simply unpythonic. Here's the long answer:
First, as one commenter remarked, TRE is incompatible with nice stack traces: when a tail recursion is eliminated, there's no stack frame left to use to print a traceback when something goes wrong later. This will confuse users who inadvertently wrote something recursive (the recursion isn't obvious in the stack trace printed), and makes debugging hard. Providing an option to disable TRE seems wrong to me: Python's default is and should always be to be maximally helpful for debugging. This also brings me to the next issue:
Second, the idea that TRE is merely an optimization, which each Python implementation can choose to implement or not, is wrong. Once tail recursion elimination exists, developers will start writing code that depends on it, and their code won't run on implementations that don't provide it: a typical Python implementation allows 1000 recursions, which is plenty for non-recursively written code and for code that recurses to traverse, for example, a typical parse tree, but not enough for a recursively written loop over a large list.
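The 1000-frame budget is easy to see in practice. A minimal sketch (the exact limit is an implementation detail, adjustable with sys.setrecursionlimit; modern CPython raises RecursionError, older versions raised RuntimeError):

```python
import sys

def countdown(n):
    """A recursively written loop: fine for shallow depths,
    fatal well before n reaches a million."""
    if n == 0:
        return 0
    return countdown(n - 1)

print(sys.getrecursionlimit())   # 1000 by default in CPython

print(countdown(500))            # works: well under the limit

try:
    countdown(10**6)             # a "loop over a large list"-sized depth
except RecursionError:
    print("maximum recursion depth exceeded")
```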
Third, I don't believe in recursion as the basis of all programming. This is a fundamental belief of certain computer scientists, especially those who love Scheme and like to teach programming by starting with a "cons" cell and recursion. But to me, seeing recursion as the basis of everything else is just a nice theoretical approach to fundamental mathematics (turtles all the way down), not a day-to-day tool.
For practical purposes, Python-style lists (which are flexible arrays, not linked lists), and sequences in general, are much more useful to start exploring the wonderful world of programming than recursion. They are some of the most important tools for experienced Python programmers, too. Using a linked list to represent a sequence of values is distinctly unpythonic, and in most cases very inefficient. Most of Python's library is written with sequences and iterators as fundamental building blocks (and dictionaries, of course), not linked lists, so you'd be locking yourself out of a lot of pre-defined functionality by not using lists or sequences.
Last, let's look at how we could implement tail recursion elimination. The first observation is that you can't do it at compile time. I've seen at least one blog entry that used a bytecode hack to replace a CALL opcode immediately before a RETURN opcode with a jump to the top of the function body. This may be a nice demo, but unfortunately Python's compiler cannot reliably determine whether any particular call is actually a reference to the current function, even if it appears to have the same name. Consider this simple example:
def f(x):
    if x > 0:
        return f(x-1)
    return 0
It looks like you could replace the body with something like this:
if x > 0:
    x = x-1
    <jump to top>
return 0
This seems simple enough, but now add this:
g = f
def f(x):
    return x
g(5)
The call to g(5) invokes the earlier f, but the "recursive" call no longer recurses! At run-time, the name 'f' is rebound to the later non-recursive definition, so the returned value is 4, not 0. While I agree that this particular example is bad style, it is a well-defined part of Python's semantics that has plenty of legitimate uses, and a compiler that made this replacement in the optimistic hope that f's definition will remain unchanged would introduce enough bugs in real-world code to cause an outrage.
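These rebinding semantics are easy to verify; a minimal reproduction of the example above:

```python
def f(x):
    if x > 0:
        return f(x - 1)   # 'f' is looked up at call time, not bound at definition
    return 0

g = f          # keep a reference to the original function

def f(x):      # rebind the name 'f'
    return x

print(g(5))    # prints 4: the "recursive" call inside g now hits the new f
```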
Another blog post showed decorators that can be used to implement tail recursion using magical exceptions or return values. These can be written in plain Python (that post also shows an optimized Cython version claimed to be "only 10% slower", though it doesn't seem to be thread-safe). If this tickles your fancy I won't try to stop you, but I would strongly object against inclusion of something like this in the core distribution: there are many caveats to the use of such a decorator, since it has to assume that any recursive call (in the decorated function) is tail-recursive and can be eliminated. In the hands of less experienced users this could easily lead to disasters. For example, the common recursive definition of factorial is not tail-recursive:
def fact(n):
    if n > 1:
        return n * fact(n-1)
    return 1
There are also plenty of functions that contain a tail-recursive call and another recursive call that isn't tail-recursive; the decorators don't handle such cases. Another subtlety that those decorators don't handle is tail-recursive calls inside try blocks: these may look like they could be eliminated, but they can't, because TRE could remove the exception handling which is guaranteed by the language. For all these reasons I think the decorator approach is doomed, at least for a general audience.
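For concreteness, here is a sketch of the exception-based decorator trick; the names are invented for illustration, and this is not the code from the posts being discussed. The shared in_progress flag is one reason such tricks tend not to be thread-safe, and the fact example shows the silent-disaster mode:

```python
class _TailCall(Exception):
    """Internal signal carrying the arguments of a recursive call."""

def tail_recursive(func):
    """Assumes EVERY recursive call is a tail call -- the dangerous part.
    Not thread-safe: the in_progress flag is shared across threads."""
    in_progress = [False]

    def wrapper(*args, **kwargs):
        if in_progress[0]:
            # A recursive call: unwind back to the loop below
            # instead of growing the stack.
            exc = _TailCall()
            exc.call = (args, kwargs)
            raise exc
        in_progress[0] = True
        try:
            while True:
                try:
                    return func(*args, **kwargs)
                except _TailCall as exc:
                    args, kwargs = exc.call
        finally:
            in_progress[0] = False

    return wrapper

@tail_recursive
def countdown(n):
    if n == 0:
        return "done"
    return countdown(n - 1)     # a genuine tail call: safe

@tail_recursive
def fact(n):
    if n > 1:
        return n * fact(n - 1)  # NOT a tail call: the multiply is pending
    return 1

print(countdown(100000))  # "done", no RecursionError
print(fact(5))            # 1, silently wrong (should be 120)
```

The countdown works at any depth, but fact is quietly broken: the exception aborts the call before the pending multiplication ever runs, which is exactly the disaster scenario described above.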
Still, if someone was determined to add TRE to CPython, they could modify the compiler roughly as follows. First, determine "safe" tail-recursive call sites. This would be something like a CALL opcode immediately followed by a RETURN opcode, and completely outside any try blocks. (Note: I'm ignoring the different CALL_* opcodes, which should be easy enough to handle using the same approach.) Next, replace each such CALL-RETURN opcode pair with a single CALL_RETURN opcode. There's no need for the compiler to try and check if the name of the function being called is the same as the current function: the new opcode can represent savings for all CALL+RETURN combinations merely by saving the time to decode a second opcode. If at run time the determination is made that this particular call is not applicable for TRE, the usual actions for a CALL followed by a RETURN opcode are carried out. (I suppose you could add some kind of caching mechanism indexed by call site to speed up the run-time determination.)
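The CALL-immediately-followed-by-RETURN pattern such a compiler pass would look for can be inspected with the dis module (exact opcode names vary across CPython versions, hence the prefix check):

```python
import dis

def f(x):
    if x > 0:
        return f(x - 1)
    return 0

ops = [ins.opname for ins in dis.get_instructions(f)]
print(ops)

# In the tail-recursive branch, a call opcode is immediately followed
# by a return opcode: the pair a CALL_RETURN superinstruction would fuse.
assert any(a.startswith("CALL") and b.startswith("RETURN")
           for a, b in zip(ops, ops[1:]))
```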
In determining whether TRE can be applied, there are several levels of aggressiveness you could choose.
The least aggressive, "vanilla" approach would only optimize the call if the object being called is the function that is already running in the current stack frame. All we have to do at this point is clear the locals out of the current stack frame (and other hidden state like active loops), set the arguments from the evaluation stack, and jump to the top. (Subtlety: the new arguments are, by definition, in the current stack frame. It's probably just a matter of copying them first. More subtleties are caused by the presence of keyword arguments, variable-length argument lists, and default argument values. It's all a simple matter of programming though.)
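At the source level, this vanilla transformation amounts to the loop a programmer would write by hand: rebind the parameters, then jump back to the top. A sketch for the earlier f:

```python
def f(x):
    while True:          # "jump to the top" of the function body
        if x > 0:
            x = x - 1    # rebind the argument in the same frame
            continue
        return 0

print(f(10**6))  # 0, computed in a single stack frame
```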
A more aggressive version would also recognize the situation where a method is tail recursive (i.e. the object being called is a bound method whose underlying function is the same as the one in the current stack frame). This just requires a bit more programming; the CPython interpreter code (ceval.c) already has an optimization for method calls. (I don't know how useful this would be though: I expect the tail recursive style to be popular with programmers who like to use a functional programming style overall, and they would probably not be using classes that much. :-)
In theory, you could even optimize all cases where the object being called is a function or method written in Python, as long as the number of local variables needed for the new call can be accommodated in the current stack frame object. (Frame objects in CPython are allocated on the heap and have a variable allocation size based on the required space for the locals; there is already machinery for reusing frame objects.) This would optimize mutually tail-recursive functions, which otherwise wouldn't be optimized. Alas, it would also disable stack traces in most cases, so it would probably not be a good idea.
A more benign variant would be to create Python-level stack frame objects just like before, but reuse the C stack frame. This would create an approximation of Stackless Python, though it would still be easy enough to run out of C stack by recursing through a built-in function or method.
Of course, none of this does anything to address my first three arguments. Is it really such a big deal to rewrite your function to use a loop? (After all, TRE only addresses recursion that can easily be replaced by a loop. :-)
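For the factorial above, the loop rewrite is only a couple of lines:

```python
def fact(n):
    result = 1
    while n > 1:     # same work as the recursive version, no stack growth
        result *= n
        n -= 1
    return result

print(fact(5))  # 120
```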
Comments (25)
namekuseijin — April 23, 2009
Damn you <stdin>! :P
Jason Baker — April 24, 2009
Just out of sheer curiosity, is there any reason why you could not do something similar to clojure's loops? I suppose the syntax could look something like this:
def some_func():
    loop(x=0):
        if x <= 10:
            x += 1
            recur(x)
That would roughly translate into something like this:
def some_func():
    def _loop(x):
        if x <= 10:
            x += 1
            _loop(x)
    _loop(0)
Granted, this is a contrived case. But sometimes I find it easier to think of problems recursively rather than iteratively. And the second version just seems ugly to me.
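For what it's worth, Clojure-style loop/recur can be approximated in plain Python with a small trampoline and no new syntax; the names loop, recur, and _Recur below are made up for this sketch:

```python
class _Recur:
    """Sentinel carrying the arguments for the next iteration."""
    def __init__(self, *args):
        self.args = args

def recur(*args):
    return _Recur(*args)

def loop(func, *args):
    """Trampoline: re-invoke func until it returns something other
    than a recur sentinel, using constant stack depth."""
    result = func(*args)
    while isinstance(result, _Recur):
        result = func(*result.args)
    return result

def count_up(x):
    if x <= 10:
        return recur(x + 1)   # signal "call me again", don't actually recurse
    return x

print(loop(count_up, 0))  # 11
```

Because recur returns a value instead of making a call, the trampoline never grows the stack, at the cost of routing every "iteration" through loop.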
kay schluehr — April 25, 2009
@jto
Writing a virtual machine using opcodes (states) and a stack is just fine. Writing applications in that style is simply not adequate and a step backwards.
EM — April 26, 2009
This comment has been removed by the author.
brad dunbar — April 26, 2009
Not having a lot of experience or knowledge of tail recursion, I am not in any position to argue about using it or not using it. I love python the way it is, and I trust you're making the right decisions.
However, a point that does bother me slightly is your general preference for loops in place of recursion. I feel like recursion is closer to the way my mind reasons about problems in many cases. And since programming languages are just a way of translating my thoughts for the cpu, it seems like a handy tool.
In any case, I would like to hear more on your preferences regarding recursion vs. loops and am hoping for another post exploring this.
I'm loving the posts here and at History of Python. Keep them up. :)
Aaron — April 27, 2009
>> The call to g(5) invokes the
>> earlier f, but the "recursive"
>> call no longer recurses!
This is, of course, why scheme doesn't generally do tail call elimination this way either.
It is interesting in that it can be done without any modification to the bytecode interpreter, but so can trampoline loops like those employed in Clojure.
But why the avoidance of tampering with the bytecode interpreter?
Faré — April 27, 2009
1- your proposed alternative to tail-calls, loops, are WORSE wrt debug info. Tail-calls are trivially transformed into calls w/ debugging info; equivalent loops not so much. http://fare.livejournal.com/142410.html
2- PTC (proper tail calls) are not a mere optimization. They essentially change the space complexity of programs, and they allow code to be decentralized where loops require big spaghetti centralization.
3- Lisp-style lists and recursion are completely different concepts. One does not imply the other. Tail-calls make it possible to build iterators on arbitrary data-structures in a modular and compositional way. (Felleisen has some nice papers on that.)
4- any Turing tar-pit is able to emulate proper tail-call through one extra level of indirection. But then in addition to the run-time cost, you pay the programmer-time cost of having to enforce convention and not being able to freely interoperate with code that doesn't use the convention.
nicolaslara — April 29, 2009
"
Is it really such a big deal to rewrite your function to use a loop? "
It is not. But I'd have to say it would definitely affect readability in the case of self-recursive functions.
An interesting way to address this would be to use coroutines: instead of calling the function recursively, one could send the arguments to the function. But then again, coroutines can't send values to themselves AFAIK.
If they did, one could use a pattern like this:
def tail_rec_func():
    args, kwargs = (yield)
    ...
    send(new_args, new_kwargs)
The good thing about this is that it is explicit. It would be a fairly advanced trick and people would know what to expect from it (i.e.: a non-trivial stack trace).
Another way would be through new syntax (something different than return) for making a tail call (I know adding syntax is not to be taken lightly, please don't set me on fire). Someone mentioned Clojure which has similar syntax.
Jayson Vantuyl — April 29, 2009
I actually find the best use case to be when you have functions that aren't just self-recursive. If you have two or three functions that recurse between each other, suddenly you have an unbounded stack problem.
I would love a syntax change. I had previously proposed "recurse", but I like send even better. If we do add syntax, I think that we should add an API. Callables should support a __send__ method that does optimized tail recursion.
The advantage of this is that we can write classes that can support __send__ and use them in concert with things like Threading to build much better actor systems.
What would be REALLY cool would be if generators used this API as well (since X.send(blah) is not quite as nice as send X(blah)). Although, there is the problem that we are clouding the meaning of what would otherwise look like a normal function call, so it probably would be too confusing.
On the plus side, it would be very cool to have a subclass of threading.Thread that wraps a generator and will yield incoming "send" messages into the generator, etc.
In the end, I'm pretty sure that to get a really good actor paradigm, we need to go further than just generators, probably with a whole new construct. Lots of things that work great in an actor model are the kind of thing that having this kind of feature is really good for.
FutureNerd — May 31, 2009
Why not make a cleaned-up tail call an explicit variation on the return statement? Then you get the stack trace you asked for. And as Kumar says, it's also the stack trace you probably would want in that situation.
I thought I read that Python was intended for both beginners and math types. I like Python because both those sides of myself like it.
The application that led me eventually to here, was a generator for a directory tree. It just seems wrong for each [[great...] grand] parent to loop over each [[great...] grand] child's results and re- [re-...] yield them.
I googled about that and found a discussion that said a tail-recursive "yield_all" would be very hard in Python.
Then I started trying to write an iterator subclass that does explicit continuation passing. My first try ended up having its space savings depend on a tail-recursive method call. Hmm, does that work in Python? Googling that question got me here. (No it doesn't work, back to the drawing board.)
Meanwhile, Guido, you wondered what possible use a tail- recursive method call could have. Hope this seems plausible. Imagine trees with >1000- deep chains of only children, but also bushy parts.
There are ways to do TCO in pure stack languages. What it does to the readability of your code and how it integrates with things like external C functions, are different matters.
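The re-yielding pattern described here can be sketched as follows; each ancestor generator re-yields its descendants' results, costing one generator frame per tree level. (Python 3.3 later added yield from as sugar for the inner loop, though it doesn't change that per-level cost.)

```python
import os
import tempfile

def walk_names(path):
    """Yield path and, recursively, every entry below it: each
    [[great-...] grand] parent re-yields its descendants' results."""
    yield path
    if os.path.isdir(path):
        for entry in sorted(os.listdir(path)):
            for name in walk_names(os.path.join(path, entry)):
                yield name   # the re-yield; 'yield from' in Python 3.3+

# Tiny demo tree in a temporary directory.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b"))
open(os.path.join(root, "a", "b", "leaf.txt"), "w").close()
print([os.path.relpath(p, root) for p in walk_names(root)])
# first entry is '.', deepest is a/b/leaf.txt (separator is OS-dependent)
```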
Beni Cherniavsky-Paskin — June 03, 2009
Not Python-related, I just figured the perfect example of tail call for the layman, and had to share it with somebody:
You know how you have a browser tab whose only interesting content is 5 links, so you middle-click 4 of them to new tabs but open the 5th link in the original tab? That's a tail call!
[Strictly speaking, that's imprecise because you have history (=stack) on each tab. So it's more like "tail spawn" for threads or processes: it's like some launcher shell scripts whose last line is "exec" (*). But coding-wise, at the abstraction level of a shell script, it looks exactly like a tail call - because "calls" in the shell are asynchronous.
(*) ironically, the reason launcher scripts use "exec" is precisely that they *want* to drop the "stack frame". Leaving the shell alive would just clutter the process table with no real benefit.]
david pasquinelli — May 15, 2011
@Adam Olsen
"it changes the algorithmic complexity..."
how does tco change algorithmic complexity?
Anirvan — January 02, 2012
As a newcomer to Python, Guido's example with function name rebinding is a bit scary to me...
How can I refer to a function from within the body of that function without using its name to prevent this? I haven't been able to find an equivalent of "self" for functions.
Chris Drost — March 13, 2012
Anirvan: sorry that I am late here. It's true that Python does not have private function naming the way JavaScript does, and therefore you would have to write something like:
def make_fact():
    def fact(n):
        return 1 if n < 2 else n * fact(n - 1)
    return fact
fact = make_fact()
It is also true that unfortunately Python, like JavaScript, doesn't support macros (code that writes code), so you cannot wrap that whole beast in a simple decorator. It's a pity; it would be nice to be able to at the very least say something like:
@!self_scoped
def fact(n):
    return 1 if n < 2 else n * fact(n - 1)
...with `@!` being a sort of "macro decorator" so that you do not have to worry if someone else redefines fact. On the other hand, there is a Python principle that "we're all adults here; if I redefine 'fact' within your code that's only really going to affect me," etc. Still, you can see where it lacks power.
Beni: I am even later in responding to you, but I will make this point anyway. The right way to think about tail calls is 'iterorecursively'. You write an iterative algorithm by declaring the variables as arguments and recursing rather than looping. So you might have written an iterative factorial as:
def fact(n):
    product = 1
    for i in range(1, n + 1):
        product *= i
    return product
To write this iterorecursively you could instead write:
def fact(n, product=1, i=1):
    return product if i > n else fact(n, product * i, i + 1)
That is a tail-recursive function. Many lispers get so used to them that they instead decrement on `n` to save a variable:
def fact(n, product=1):
    return product if n <= 0 else fact(n - 1, product * n)
The point is, GvR's advice here is mostly blustery crap. If I use @!tail_recursive (again with these fake macros that don't exist!), then the very least you can do is assume the @!self_scoped macro, too: whenever I refer to `name` it means the present function. The fact that the macro depends on this doesn't matter, because We Are All Adults Here, and if we say "I'm doing X" then we should be content to say "oh, look, he's doing X." The other problem that GvR highlights -- that sometimes it's syntactically incorrect to describe something as @!tail_recursive -- is probably the domain of throwing a ValueError. However, it is suggestive of the fact that we shouldn't even have to discuss this sort of thing, and tail call optimization can just happen for free on compilation. If you look at the Abelson-Sussman lectures here:
http://groups.csail.mit.edu/mac/classes/6.001/abelson-sussman-lectures/
they discuss compiler design in Scheme, and the result is actually absurdly simple: there is a tiny, almost trivial difference where you realize that it doesn't matter which order two lines of compiler output are, so you put them in the opposite order that you might have expected, and then you don't leave behind any stack when the thing actually is tail-recursive, but you still gracefully handle the case where it is not (since you haven't lost generality).
So at least in principle Guido is wrong. Scheme proves that you don't need a special syntax to provide tail-call optimization, and I'm really not sure that things like debuggability fundamentally can't be done in this context.
Doug Blank — February 11, 2013
You can absolutely have Python-like stack traces in a properly tail-call handled language. Take:
(define loop
  (lambda (n)
    (if (= n 0)
        (no-such-function)
        (loop (- n 1)))))
(loop 10)
Produces:
Traceback (most recent call last):
File "/home/dblank/gluezilla/Untitled.ss", line 7, col 1, calling 'loop'
File "/home/dblank/gluezilla/Untitled.ss", line 5, col 13, calling 'loop'
File "/home/dblank/gluezilla/Untitled.ss", line 5, col 13, calling 'loop'
File "/home/dblank/gluezilla/Untitled.ss", line 5, col 13, calling 'loop'
File "/home/dblank/gluezilla/Untitled.ss", line 5, col 13, calling 'loop'
File "/home/dblank/gluezilla/Untitled.ss", line 5, col 13, calling 'loop'
File "/home/dblank/gluezilla/Untitled.ss", line 5, col 13, calling 'loop'
File "/home/dblank/gluezilla/Untitled.ss", line 5, col 13, calling 'loop'
File "/home/dblank/gluezilla/Untitled.ss", line 5, col 13, calling 'loop'
File "/home/dblank/gluezilla/Untitled.ss", line 5, col 13, calling 'loop'
File "/home/dblank/gluezilla/Untitled.ss", line 5, col 13, calling 'loop'
File "/home/dblank/gluezilla/Untitled.ss", line 4, col 14
RunTimeError: unbound variable 'no-such-function'
in Calico Scheme.
Anonymous — July 11, 2013
It definitely seems complicated, maybe impossible, to determine statically in Python that a function is tail-recursion 'compliant'. However, what if it were an 'opt-in' feature that uses a different 'return' keyword?
def f(n):
    if n > 0:
        tailcall f(n - 1)
    return 0
Where tailcall would have special semantics similar to return but with the side effect of 'Tail Recursion'. You could easily use return here, but in the case where n is very large, you could replace the return with tailcall to 'fix the bug' vs rewriting the entire function to be iteration vs recursion. Arguably, the final effect of return and tailcall is the same (a value is returned), but the process in which you get there is significantly different. One results in a deep stack that possibly blows up vs the other results in a single frame that is harder to debug (but I think that could be mitigated, for instance: a tailcall frame/function could be specially marked in the stack trace).
Arguably, it would add yet another 'return-like' keyword (where yield and return are similar), but perhaps its simple and un-intrusive enough to benefit those that would like to use tail recursion.
Obviously, it's not as glamorous as being able to optimize 'return' to tail recursion automagically, but it does allow those familiar with its semantics to opt into its use. Other languages could easily implement it and it would not affect already written code.
Just had the thought, not sure if it has already been discussed. I know python does not attempt to be a functional language, but this would certainly open it up to more of a possibility.
Thanks for your time.
Guido van Rossum — July 11, 2013
Dan: your proposal has the redeeming quality of clearly being a language feature rather than a possible optimization. I don't really expect there to be enough demand to actually add this to the language though. Maybe you can use macropy to play around with the idea though?
Anonymous — July 17, 2013
I like Dan's proposal very much. I would make use of the new language feature immediately!
水逝云飞 — October 10, 2013
TRE could be explicitly enabled by a special return syntax (for example, "return from ...").
With the new "return from", the current frame is reused, and the traceback is replaced by that of the further call.
Using "return from" to explicitly release the current frame provides more efficient GC (local variables die as soon as they are useless) and clearer debug information (the traceback of some wrapper functions is totally useless, such as socket.socket.__getattr__).
水逝云飞 — October 10, 2013
Just use a new syntax (for example, "return from ...") to explicitly reuse the current frame, and everything is done.
Explicitly releasing the current frame on the next function call could provide more efficient GC (local variables die as soon as they are useless) and cleaner debug information (the traceback of many wrapper functions is totally useless, such as socket.socket.__getattr__).
ipatrol — April 01, 2014
My thoughts are to perhaps make TCO an opt-in feature via -O/-OO or something analogous. In that case, it would be the responsibility of the invoker of the interpreter to ensure that the program plays nice and doesn't try any bizarre tricks, just like right now one needs to make sure under -OO that docstrings aren't being used (I'm looking at you, ply). Also, under optimize mode, the neatness of stack traces is presumably irrelevant because a known side effect of optimization is the removal of debugging features.
ArneBab — November 18, 2014
I’ve grown to like tail recursion even though I “grew up” with Python (it was the first language I really liked). My reasons are that I found recursive functions easier to debug than loops, because the variables available at the beginning of each “iteration” are always clear. I have an example of this on my site: http://draketo.de/light/english/recursion-wins
Also it is already possible to get nice stack-traces with GCC by requiring compilation with -Og -foptimize-sibling-calls. For details see http://draketo.de/light/english/free-software/tco-debug
Anonymous — July 04, 2015
In a project we are working on, we are moving toward a more and more functional style, because it is much easier to reason about, and this "feature" made us decide to move away from Python. In Python 3 a functional programming style is even more discouraged.
I don't understand this decision. The debugging issue is a non-issue: so what if you lose some frames? It's not as if they contain any interesting information (f called with 3, f called with 2). And if you are really concerned about losing information, it could be added as an optimization option to Python.
Anonymous — July 04, 2015
It is much cleaner to work with combinators than with explicit recursion. It shows the intent of the programmer.
Lucas Malor — February 28, 2016
Well, the same happens with asyncio. You can't get a nice stack trace if you don't enable asyncio debugging mode.