Tail Call Optimisation in Common Lisp Implementations
While the Common Lisp standard does not require implementations to eliminate tail calls, some do—thereby allowing a more functional programming style. Especially in the prototyping phase, it can be useful to know where this functional style can be employed and where it's likely to cause stack overflow. Since I couldn't find a single document outlining support for tail call optimisation, I've written it.
In what follows, I distinguish between Tail Call Optimisation (TCO)—where all calls in tail position are optimised into jumps—and Tail Self-Call Optimisation, a special case of TCO where only self-calls are optimised (this is usually what people want when they refer to TCO as it allows for the recursive style). TCO is sometimes referred to as proper tail recursion.
This document is still incomplete; feedback would be appreciated (especially where claims are just outright wrong!).
- Nice to haves: [0/2]
[ ]
Demonstration of trampoline code;[ ]
Discussion of optimisation declarations and their effects on each implementation.
Table of Contents
1 Findings
The following implementations were found to provide full TCO:
- CMUCL
- SBCL
- CCL
- Allegro
- LispWorks
…only optimise self-calls in tail position:
- CLisp
- GCL
- ECL (when explicitly compiled)
…neither:
- ABCL
General points:
- Allegro's documentation is the most explicit.
- Most (if not all) explanations refer to disassembly listings for demonstration. YMMV.
2 Implementation Details
2.1 DONE Cmucl
- TCO
- yes
2.1.1 Documentation
From http://common-lisp.net/project/cmucl/doc/cmu-user/debugger.html#toc98,
Both the compiler and the interpreter are ``properly tail recursive.'' If a function call is in a tail-recursive position, the stack frame will be deallocated at the time of the call, rather than after the call returns.
"Properly tail recursive" refers to all calls in tail position as is made clear below.
2.1.2 DONE Demonstration
Using: CMU Common Lisp 20b (20B Unicode)
Let's take a look at some disassembled functions. First up, tail recursion:
* (defun foo () (foo)) FOO * (disassemble 'foo) ; Compiling LAMBDA NIL: ; Compiling Top-Level Form: 480B7900: .ENTRY "LAMBDA NIL"() ; (FUNCTION NIL *) 18: POP DWORD PTR [EBP-8] 1B: LEA ESP, [EBP-32] 1E: TEST ECX, ECX ; [:NON-LOCAL-ENTRY] 20: JNE L0 22: MOV EAX, [#x480B78F8] ; #<FDEFINITION object for FOO> ; No-arg-parsing entry point ; [:NON-LOCAL-ENTRY] 28: XOR ECX, ECX 2A: PUSH DWORD PTR [EBP-8] ;;; [3] (FOO) 2D: JMP DWORD PTR [EAX+5] ; [:CALL-SITE] 30: L0: BREAK 10 ; Error trap 32: BYTE #x02 33: BYTE #x1A ; INVALID-ARGUMENT-COUNT-ERROR 34: BYTE #x4F ; ECX
Note the JMP DWORD PTR [EAX+5]
in place of a CALL
; self-calls
are clearly optimised into jumps. Next, let's take a look at what
happens when a second function calls foo
:
* (defun bar () (foo)) BAR * (disassemble 'bar) ; Compiling LAMBDA NIL: ; Compiling Top-Level Form: 48109720: .ENTRY "LAMBDA NIL"() ; (FUNCTION NIL *) 38: POP DWORD PTR [EBP-8] 3B: LEA ESP, [EBP-32] 3E: TEST ECX, ECX ; [:NON-LOCAL-ENTRY] 40: JNE L0 42: MOV EAX, [#x48109718] ; #<FDEFINITION object for FOO> ; No-arg-parsing entry point ; [:NON-LOCAL-ENTRY] 48: XOR ECX, ECX 4A: PUSH DWORD PTR [EBP-8] ;;; [3] (FOO) 4D: JMP DWORD PTR [EAX+5] ; [:CALL-SITE] 50: L0: BREAK 10 ; Error trap 52: BYTE #x02 53: BYTE #x1A ; INVALID-ARGUMENT-COUNT-ERROR 54: BYTE #x4F ; ECX
Again, there's a JMP
in place of the call; the presence of foo
in tail-position does not create an additional frame.
Finally, a marginally more rich example; here we compare two
functions in the body of baz
:
* (defun baz () (format t "Hello") (bar)) BAZ * (disassemble 'baz) ; Compiling LAMBDA NIL: ; Compiling Top-Level Form: 48162AF0: .ENTRY "LAMBDA NIL"() ; (FUNCTION NIL *) B08: POP DWORD PTR [EBP-8] B0B: LEA ESP, [EBP-32] B0E: TEST ECX, ECX ; [:NON-LOCAL-ENTRY] B10: JNE L0 B12: MOV EBX, ESP ; No-arg-parsing entry point ; [:NON-LOCAL-ENTRY] B14: SUB ESP, 12 B17: MOV EDX, 671088679 ; T B1C: MOV EDI, [#x48162AE0] ; "Hello" B22: MOV EAX, [#x48162AE4] ; #<FDEFINITION object for FORMAT> B28: MOV ECX, 8 B2D: MOV [EBX-4], EBP B30: MOV EBP, EBX ;;; [3] (FORMAT T "Hello") B32: CALL DWORD PTR [EAX+5] ; [:CALL-SITE] B35: MOV ESP, EBX ; [:SINGLE-VALUE-RETURN] B37: MOV EAX, [#x48162AE8] ; #<FDEFINITION object for BAR> B3D: XOR ECX, ECX B3F: PUSH DWORD PTR [EBP-8] ;;; [4] (BAR) B42: JMP DWORD PTR [EAX+5] ; [:CALL-SITE] B45: NOP B46: NOP B47: NOP B48: L0: BREAK 10 ; Error trap B4A: BYTE #x02 B4B: BYTE #x1A ; INVALID-ARGUMENT-COUNT-ERROR B4C: BYTE #x4F ; ECX
Notice how the format
is called with a CALL
instruction (CALL DWORD PTR [EAX+5]
) while bar
is once again JMP
'd to.
2.2 DONE SBCL
2.2.1 Documentation
SBCL's documentation (http://www.sbcl.org/manual/index.html#Debug-Tail-Recursion) should be familiar:
The compiler is "properly tail recursive." If a function call is in a tail-recursive position, the stack frame will be deallocated at the time of the call, rather than after the call returns.
2.2.2 Discussion
A few notes:
Nikodemus Siivola on sbcl-help: http://permalink.gmane.org/gmane.lisp.steel-bank.general/3272,
Tail-merging is done if (> SPACE DEBUG) or (> SPEED DEBUG) – not by default.
and http://permalink.gmane.org/gmane.lisp.steel-bank.general/3279,
jsnell once gave me this recipe to ensure that SBCL will do TCO is:
(declaim (optimize #+sbcl (sb-c::merge-tail-calls 3) #+sbcl (sb-c::insert-debug-catch 0)))
Similarly http://c2.com/cgi/wiki?TailRecursion,
sbcl does this if you use the proper optimization declarations. I know for certain that a:
(declaim (optimize (debug 0) (safety 0) (speed 3)))
will enable the optimization of tail-calls into gotos globally.
2.2.3 Demonstration
(Using 1.0.51 on Darwin)
The following should come as no surprise given the discussion of
CMUCL above. Note that no optimisation has been requested using
declaim
in these examples.
Tail recursion:
* (defun foo () (foo)) FOO * (disassemble 'foo) ; disassembly for FOO ; 03D20574: 488B05A5FFFFFF MOV RAX, [RIP-91] ; #<FDEFINITION object for FOO> ; no-arg-parsing entry point ; 7B: 31C9 XOR ECX, ECX ; 7D: FF7508 PUSH QWORD PTR [RBP+8] ; 80: FF6009 JMP QWORD PTR [RAX+9] ; 83: 0F0B0A BREAK 10 ; error trap ; 86: 02 BYTE #X02 ; 87: 18 BYTE #X18 ; INVALID-ARG-COUNT-ERROR ; 88: 54 BYTE #X54 ; RCX NIL
Notice the JMP
instruction at 03D20580
.
Full TCO:
* (defun bar () (foo)) BAR * (disassemble 'bar) ; disassembly for BAR ; 03D9BC34: 488B05A5FFFFFF MOV RAX, [RIP-91] ; #<FDEFINITION object for FOO> ; no-arg-parsing entry point ; 3B: 31C9 XOR ECX, ECX ; 3D: FF7508 PUSH QWORD PTR [RBP+8] ; 40: FF6009 JMP QWORD PTR [RAX+9] ; 43: 0F0B0A BREAK 10 ; error trap ; 46: 02 BYTE #X02 ; 47: 18 BYTE #X18 ; INVALID-ARG-COUNT-ERROR ; 48: 54 BYTE #X54 ; RCX
Here the foo
is JMP
'd to, not a CALL
'd.
Finally, a fuller example of the above,
* (defun baz () (format t "Hello~%")) BAZ * (defun qux () (baz) (bar)) QUX * (disassemble 'qux) ; disassembly for QUX ; 03E355B4: 488D5424F0 LEA RDX, [RSP-16] ; no-arg-parsing entry point ; B9: 4883EC18 SUB RSP, 24 ; BD: 488B059CFFFFFF MOV RAX, [RIP-100] ; #<FDEFINITION object for BAZ> ; C4: 31C9 XOR ECX, ECX ; C6: 48892A MOV [RDX], RBP ; C9: 488BEA MOV RBP, RDX ; CC: FF5009 CALL QWORD PTR [RAX+9] ; CF: 480F42E3 CMOVB RSP, RBX ; D3: 488B058EFFFFFF MOV RAX, [RIP-114] ; #<FDEFINITION object for BAR> ; DA: 31C9 XOR ECX, ECX ; DC: FF7508 PUSH QWORD PTR [RBP+8] ; DF: FF6009 JMP QWORD PTR [RAX+9] ; E2: 0F0B0A BREAK 10 ; error trap ; E5: 02 BYTE #X02 ; E6: 18 BYTE #X18 ; INVALID-ARG-COUNT-ERROR ; E7: 54 BYTE #X54 ; RCX NIL
Execution of baz
is now performed with a CALL
, but bar
is
optimised into a JMP
as it sits in tail position.
2.3 TODO CLisp
- Tail self-call optimisation
- yes
- TCO
- seems not [1/2]
[X]
Find a clear mention (it's not in the docs).[ ]
Are there any available declarations that provide full TCO?- So far, it seems not.
JMPTAIL
is used for self-calls in tail position.
- So far, it seems not.
The best entry point I found is http://eli.thegreenplace.net/2007/06/26/sicp-section-121/,
Indeed, CLISP implements the tail call optimization. To see one difference between them, we can run the two factorial functions on some large input (say 4000). While factorial-rec will fail with a stack overflow, factorial-iter won’t – this is because the stack does not grow.
Note that this refers to self-call TCO, not the more general form.
2.3.1 Documentation
2.3.2 Discussion
Self-call elimination in tail position is performed by the simplify
function defined in compiler.lisp
:
http://clisp.hg.sourceforge.net/hgweb/clisp/clisp/file/6afcecc6bb2c/src/compiler.lisp#l8992
2.3.3 Demonstration
As of 2.49, we get a jump if we recurse from tail position
(actually, the special JMPTAIL
bytecode instruction),
[1]> (defun foo () (foo)) FOO [2]> (disassemble 'foo) Disassembly of function FOO 0 required arguments 0 optional arguments No rest parameter No keyword parameters 3 byte-code instructions: 0 (JMPTAIL 0 1 L4) 4 L4 4 (JMPTAIL 0 1 L4) NIL
but not when performing a tail call,
[1]> (defun bar () 5) BAR [2]> (defun baz () (bar)) BAZ [3]> (disassemble 'baz) Disassembly of function BAZ (CONST 0) = BAR 0 required arguments 0 optional arguments No rest parameter No keyword parameters 2 byte-code instructions: 0 (CALL0 0) ; BAR 2 (SKIP&RET 1) NIL
The lack of general TCO is evident when we write a 'trampoline' call:
[1]> (defun foo () (format t "Hi~%") (bar)) [2]> (defun bar () (format t "There~%") (foo)) [3]> (foo) Hi There [...] *** - Program stack overflow. RESET
The above definition of foo
disassembles as follows,
[4]> (disassemble 'foo) Disassembly of function FOO (CONST 0) = #<COMPILED-FUNCTION FOO-1> (CONST 1) = *STANDARD-OUTPUT* (CONST 2) = BAR 0 required arguments 0 optional arguments No rest parameter No keyword parameters reads special variable: *STANDARD-OUTPUT* 5 byte-code instructions: 0 (CONST&PUSH 0) ; #<COMPILED-FUNCTION FOO-1> 1 (GETVALUE&PUSH 1) ; *STANDARD-OUTPUT* 3 (CALLSR 1 21) ; FUNCALL 6 (CALL0 2) ; BAR 8 (SKIP&RET 1) NIL
The (CALL0 2)
instruction calls consts[2]
, bar
, with no
arguments before returning.
2.4 DONE Clozure Common Lisp
- TCO
- yes
2.4.1 Documentation
The CCL optimisation documentation, http://trac.clozure.com/ccl/wiki/DeclareOptimize, states:
For instance, a tail-call is optimized (the caller's stack frame is reused) if the POLICY.ALLOW-TAIL-RECURSION-ELIMINATION function in the current policy returns true.
This is a tad ambiguous. The name of this policy suggest that only self-calls are eliminated; this is not in fact the case.
2.4.2 Demonstration
Using Clozure Common Lisp Version 1.7 (DarwinX8664),
? (defun foo () (foo)) FOO ? (disassemble 'foo) ;;; (defun foo () (foo)) L0 [0] (leaq (@ (:^ L0) (% rip)) (% fn)) [7] (testl (% nargs) (% nargs)) [9] (jne L21) [11] (pushq (% rbp)) [12] (movq (% rsp) (% rbp)) ;;; (foo) L15 [15] (jmp L15) ;;; #<no source text> L21 [21] (uuo-error-wrong-number-of-args) NIL
So tail recursion elimination works as advertised: jmp L15
. How
about general tail call optimisation?
? (defun bar () (foo)) BAR ? (disassemble 'bar) ;;; (defun bar () (foo)) L0 [0] (leaq (@ (:^ L0) (% rip)) (% fn)) [7] (testl (% nargs) (% nargs)) [9] (jne L33) [11] (pushq (% rbp)) [12] (movq (% rsp) (% rbp)) ;;; (foo) [15] (xorl (% nargs) (% nargs)) [17] (movq (@ 'FOO (% fn)) (% temp0)) [24] (leaveq) [25] (jmpq (@ 10 (% temp0))) ;;; #<no source text> L33 [33] (uuo-error-wrong-number-of-args) NIL
Again, foo
is jumped to: (jmpq (@ 10 (% temp0)))
.
Fuller example: compare baz
and bar
in the following,
? (defun baz () (format t "")) BAZ ? (defun qux () (baz) (bar)) QUX ? (disassemble 'qux) ;;; (defun qux () (baz) (bar)) L0 [0] (leaq (@ (:^ L0) (% rip)) (% fn)) [7] (testl (% nargs) (% nargs)) [9] (jne L53) [11] (pushq (% rbp)) [12] (movq (% rsp) (% rbp)) ;;; (baz) [15] (xorl (% nargs) (% nargs)) [17] (movq (@ 'BAZ (% fn)) (% temp0)) [24] (nop) [26] (callq (@ 10 (% temp0))) [29] (leaq (@ (:^ L0) (% rip)) (% fn)) ;;; (bar) [36] (xorl (% nargs) (% nargs)) [38] (movq (@ 'BAR (% fn)) (% temp0)) [45] (leaveq) [46] (jmpq (@ 10 (% temp0))) ;;; #<no source text> L53 [53] (uuo-error-wrong-number-of-args) NIL
Here baz
is executed with callq
while bar
, being in tail
position, is jumped to.
2.5 DONE LispWorks
- TCO
- yes
2.5.1 Documentation
See http://www.lispworks.com/documentation/lw60/LW/html/lw-110.htm#pgfId-889221,
In 64-bit LispWorks and on x86 platforms the compiler optimizes tail calls unless
- The compiler optimize quality debug is 3, or
- There is something with dynamic scope on the stack, such as a special binding, a catch or dynamic-extent allocation (so it is not really a tail call)
On all other platforms the compiler optimizes tail calls unless 1.) or 2.) above apply, or
- The call has more than 4 arguments and this is more than the number of fixed (not &optional / &rest / &key ) parameters in the calling function.
- The call has more than 4 arguments and the calling function has &rest / &key parameters.
Consider, http://www.lispworks.com/documentation/lcl50/aug/aug-51.html:
Unless directed otherwise, the Compiler optimizes self-recursive function calls, tail calls, and self-tail calls. In particular, self-tail calls are automatically compiled as loops. While these function calls are efficient, they can be difficult to trace because they do not appear on the stack.
and
When the Compiler compiles either a tail call or a self-tail call, it reuses the calling function's stack frame rather than creating a new stack frame. This optimization is called tail merging.
2.5.2 DONE Demonstration
Self-calls in tail position,
CL-USER 1 > (defun foo () (foo)) FOO CL-USER 2 > (disassemble 'foo) 200C8C1A: 0: 89E6 move esi, esp 2: 81CEFCFF0F00 or esi, FFFFC 8: 3966D0 cmp [esi-30], esp 11: 7311 jnb L1 13: 80FD00 cmpb ch, 0 16: 750C jne L1 18: 55 push ebp 19: 89E5 move ebp, esp 21: B500 moveb ch, 0 23: C9 leave 24: FF25F8867221 jmp [217286F8] ; FOO L1: 30: E8B5580500 call 2011E4F2 ; #<Function RUNTIME:BAD-ARGS-OR-STACK 2011E4F2> 35: 90 nop 36: 90 nop 37: 90 nop NIL
…and general TCO,
CL-USER 3 > (defun bar () (format t "") (foo)) BAR CL-USER 4 > (disassemble 'bar) 21ACF682: 0: 89E6 move esi, esp 2: 81CEFCFF0F00 or esi, FFFFC 8: 3966D0 cmp [esi-30], esp 11: 7323 jnb L1 13: 80FD00 cmpb ch, 0 16: 751E jne L1 18: 55 push ebp 19: 89E5 move ebp, esp 21: 6803400020 push 20004003 ; T 26: B502 moveb ch, 2 28: B88B540A20 move eax, 200A548B ; "" 33: FF158C950320 call [2003958C] ; LW-XP:INTERNAL-FORMAT 39: B500 moveb ch, 0 41: C9 leave 42: FF25F8867221 jmp [217286F8] ; FOO L1: 48: E83BEE64FE call 2011E4F2 ; #<Function RUNTIME:BAD-ARGS-OR-STACK 2011E4F2> 53: 90 nop NIL
2.6 DONE Allegro
- TCO
- yes
2.6.1 Documentation
Tail merging, http://www.franz.com/support/documentation/current/doc/compiling.htm#tail-merge-disc-2
The two switches tail-call-self-merge-switch and tail-call-non-self-merge-switch control tail merging.
If true, compiler will perform self-tail-merging (for functions which call themselves). This switch will affect code only when true at the beginning of the function being self-called (and no other time)….
Initially true when speed is greater than 0.
If true, compiler will perform non-self-tail-merging (for functions in the tail position different than the function being executed). This switch will affect code only when true at the point of the call.
Initially true if speed is greater than 1 and debug less than 2.
2.6.2 Demonstration
Allegro 8.2 64-bit, OSX.
By default alisp
does not produce the fastest possible code in the
interpreter,
CL-USER> (defun foo () (foo)) FOO CL-USER> (disassemble 'foo) ;; disassembly of #<Function (:ANONYMOUS-LAMBDA 1) @ #x2129335a> ;; formals: ;; constant vector: 0: FOO ;; code start: #x2129330c: 0: 55 pushl ebp 1: 8b ec movl ebp,esp 3: 83 ec 38 subl esp,$56 6: 89 75 fc movl [ebp-4],esi 9: 89 5d e4 movl [ebp-28],ebx 12: 39 63 1a cmpl [ebx+26],esp 15: 76 02 jbe 19 17: cd 65 int $101 ; SYS::TRAP-STACK-OVFL 19: e3 02 jcxz 23 21: cd 61 int $97 ; SYS::TRAP-ARGERR 23: 80 7f cb 00 cmpb [edi-53],$0 ; SYS::C_INTERRUPT-PENDING 27: 74 02 jz 31 29: cd 64 int $100 ; SYS::TRAP-SIGNAL-HIT 31: 8b 5e 12 movl ebx,[esi+18] ; FOO 34: b1 00 movb cl,$0 36: ff d7 call *edi 38: c9 leave 39: 8b 75 fc movl esi,[ebp-4] 42: c3 ret 43: 90 nop ; No value
While (compile 'foo)
will optimise the above, it does not have the
desired effect across the board:
CL-USER> (defun qux () (foo)) QUX CL-USER> (disassemble 'qux) ;; disassembly of #<Function (:ANONYMOUS-LAMBDA 3) @ #x207655fa> ;; formals: ;; constant vector: 0: FOO ;; code start: #x207655ac: 0: 55 pushl ebp 1: 8b ec movl ebp,esp 3: 83 ec 38 subl esp,$56 6: 89 75 fc movl [ebp-4],esi 9: 89 5d e4 movl [ebp-28],ebx 12: 39 63 1a cmpl [ebx+26],esp 15: 76 02 jbe 19 17: cd 65 int $101 ; SYS::TRAP-STACK-OVFL 19: e3 02 jcxz 23 21: cd 61 int $97 ; SYS::TRAP-ARGERR 23: 80 7f cb 00 cmpb [edi-53],$0 ; SYS::C_INTERRUPT-PENDING 27: 74 02 jz 31 29: cd 64 int $100 ; SYS::TRAP-SIGNAL-HIT 31: 8b 5e 12 movl ebx,[esi+18] ; FOO 34: b1 00 movb cl,$0 36: ff d7 call *edi 38: c9 leave 39: 8b 75 fc movl esi,[ebp-4] 42: c3 ret 43: 90 nop ; No value CL-USER> (compile 'qux) QUX NIL NIL CL-USER> (disassemble 'qux) ;; disassembly of #<Function QUX> ;; formals: ;; constant vector: 0: FOO ;; code start: #x205d50bc: 0: 55 pushl ebp 1: 8b ec movl ebp,esp 3: 83 ec 38 subl esp,$56 6: 89 75 fc movl [ebp-4],esi 9: 89 5d e4 movl [ebp-28],ebx 12: 39 63 1a cmpl [ebx+26],esp 15: 76 02 jbe 19 17: cd 65 int $101 ; SYS::TRAP-STACK-OVFL 19: e3 02 jcxz 23 21: cd 61 int $97 ; SYS::TRAP-ARGERR 23: 80 7f cb 00 cmpb [edi-53],$0 ; SYS::C_INTERRUPT-PENDING 27: 74 02 jz 31 29: cd 64 int $100 ; SYS::TRAP-SIGNAL-HIT 31: 8b 5e 12 movl ebx,[esi+18] ; FOO 34: b1 00 movb cl,$0 36: ff d7 call *edi 38: c9 leave 39: 8b 75 fc movl esi,[ebp-4] 42: c3 ret 43: 90 nop ; No value CL-USER>
Note the call
to foo
even though that call sits in tail position.
This behaviour is a result of the default optimisation settings (as
indicated on startup),
Optimization settings: safety 1, space 1, speed 1, debug 2
In order to get the fastest possible code, the following form can be evaluated,
CL-USER> (proclaim '(optimize (speed 3) (safety 1) (space 0) (debug 0))) T
thereby yielding the expected results,
CL-USER> (defun bar () (bar)) BAR CL-USER> (disassemble 'bar) ;; disassembly of #<Function (:ANONYMOUS-LAMBDA 2) @ #x212f493a> ;; formals: ;; constant vector: 0: BAR ;; code start: #x212f4904: 0: e3 02 jcxz 4 2: cd 61 int $97 ; SYS::TRAP-ARGERR 4: 80 7f cb 00 cmpb [edi-53],$0 ; SYS::C_INTERRUPT-PENDING 8: 74 02 jz 12 10: cd 64 int $100 ; SYS::TRAP-SIGNAL-HIT 12: 8b 5e 12 movl ebx,[esi+18] ; BAR 15: b1 00 movb cl,$0 17: ff e7 jmp *edi 19: 90 nop ; No value CL-USER> (defun baz () (bar)) BAZ CL-USER> (disassemble 'baz) ;; disassembly of #<Function (:ANONYMOUS-LAMBDA 3) @ #x21318aca> ;; formals: ;; constant vector: 0: BAR ;; code start: #x21318a94: 0: e3 02 jcxz 4 2: cd 61 int $97 ; SYS::TRAP-ARGERR 4: 80 7f cb 00 cmpb [edi-53],$0 ; SYS::C_INTERRUPT-PENDING 8: 74 02 jz 12 10: cd 64 int $100 ; SYS::TRAP-SIGNAL-HIT 12: 8b 5e 12 movl ebx,[esi+18] ; BAR 15: b1 00 movb cl,$0 17: ff e7 jmp *edi 19: 90 nop ; No value CL-USER>
2.7 TODO GNU Common Lisp
- Still TODO [0/2]
[ ]
Pick apart the disassembly listings,[ ]
Clarify whether general TCO is ever performed i.e., not just self-calls.
2.7.1 Demonstration
Ubuntu 10.10, GCL (GNU Common Lisp) 2.6.7 CLtL1.
Note that this section is still rough
Self-calls in tail position are eliminated and replaced with JMP
,
>(defun foo () (foo)) FOO >(disassemble 'foo) Compiling /tmp/gazonk_608_0.lsp. End of Pass 1. ;; Note: Tail-recursive call of FOO was replaced by iteration. End of Pass 2. OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3 Finished compiling /tmp/gazonk_608_0.lsp. #include "gazonk_608_0.h" void init_code(){do_init((void *)VV);} /* function definition for FOO */ static void L1() {register object *base=vs_base; register object *sup=base+VM1; VC1 vs_check; vs_top=sup; goto TTL; TTL:; goto TTL; } #( #((system::%init . #((system::mf (lisp::quote user::foo) 0)))) ) static void L1(); #define VC1 #define VM1 0 static void * VVi[1]={ #define Cdata VV[0] (void *)(L1) }; #define VV (VVi) /tmp/gazonk_608_0.o: file format elf32-i386 Disassembly of section .text: 00000000 <init_code>: init_code(): /tmp/gazonk_608_0.c:5166 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 83 ec 18 sub $0x18,%esp 6: c7 04 24 00 00 00 00 movl $0x0,(%esp) d: e8 fc ff ff ff call e <init_code+0xe> 12: c9 leave 13: c3 ret 14: 8d b6 00 00 00 00 lea 0x0(%esi),%esi 1a: 8d bf 00 00 00 00 lea 0x0(%edi),%edi 00000020 <L1>: L1(): /tmp/gazonk_608_0.c:5170 20: 55 push %ebp 21: 89 e5 mov %esp,%ebp 23: 53 push %ebx 24: 83 ec 04 sub $0x4,%esp /tmp/gazonk_608_0.c:5172 27: a1 00 00 00 00 mov 0x0,%eax 2c: 3b 05 00 00 00 00 cmp 0x0,%eax /tmp/gazonk_608_0.c:5170 32: 8b 1d 00 00 00 00 mov 0x0,%ebx /tmp/gazonk_608_0.c:5172 38: 73 08 jae 42 <L1+0x22> /tmp/gazonk_608_0.c:5173 3a: 89 1d 00 00 00 00 mov %ebx,0x0 40: eb fe jmp 40 <L1+0x20> /tmp/gazonk_608_0.c:5172 42: e8 fc ff ff ff call 43 <L1+0x23> 47: eb f1 jmp 3a <L1+0x1a> T
Consider the following where format
is called before recursing,
>(defun baz () (format t "") (baz)) BAZ >(disassemble 'baz) Compiling /tmp/gazonk_654_0.lsp. End of Pass 1. ;; Note: Tail-recursive call of BAZ was replaced by iteration. End of Pass 2. OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3 Finished compiling /tmp/gazonk_654_0.lsp. #include "gazonk_654_0.h" void init_code(){do_init((void *)VV);} /* function definition for BAZ */ static void L1() {register object *base=vs_base; register object *sup=base+VM1; VC1 vs_check; vs_top=sup; goto TTL; TTL:; base[0]= Ct; base[1]= ((object)VV[0]); vs_top=(vs_base=base+0)+2; Lformat(); vs_top=sup; goto TTL; } #( #("" (system::%init . #((system::mf (lisp::quote user::baz) 0)))) ) static void L1(); #define VC1 #define VM1 2 static void * VVi[2]={ #define Cdata VV[1] (void *)(L1) }; #define VV (VVi) /tmp/gazonk_654_0.o: file format elf32-i386 Disassembly of section .text: 00000000 <init_code>: init_code(): /tmp/gazonk_654_0.c:5166 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 83 ec 18 sub $0x18,%esp 6: c7 04 24 00 00 00 00 movl $0x0,(%esp) d: e8 fc ff ff ff call e <init_code+0xe> 12: c9 leave 13: c3 ret 14: 8d b6 00 00 00 00 lea 0x0(%esi),%esi 1a: 8d bf 00 00 00 00 lea 0x0(%edi),%edi 00000020 <L1>: L1(): /tmp/gazonk_654_0.c:5170 20: 55 push %ebp 21: 89 e5 mov %esp,%ebp 23: 57 push %edi 24: 56 push %esi 25: 53 push %ebx 26: 83 ec 0c sub $0xc,%esp 29: 8b 1d 00 00 00 00 mov 0x0,%ebx /tmp/gazonk_654_0.c:5172 2f: a1 00 00 00 00 mov 0x0,%eax 34: 3b 05 00 00 00 00 cmp 0x0,%eax /tmp/gazonk_654_0.c:5171 3a: 8d 73 08 lea 0x8(%ebx),%esi /tmp/gazonk_654_0.c:5172 3d: 73 29 jae 68 <L1+0x48> /tmp/gazonk_654_0.c:5173 3f: 8d 7b 04 lea 0x4(%ebx),%edi 42: 8d b6 00 00 00 00 lea 0x0(%esi),%esi /tmp/gazonk_654_0.c:5177 48: a1 00 00 00 00 mov 0x0,%eax /tmp/gazonk_654_0.c:5176 4d: c7 03 00 00 00 00 movl $0x0,(%ebx) /tmp/gazonk_654_0.c:5178 53: 89 1d 00 00 00 00 mov %ebx,0x0 59: 89 35 00 00 00 00 mov %esi,0x0 /tmp/gazonk_654_0.c:5177 5f: 89 07 mov %eax,(%edi) /tmp/gazonk_654_0.c:5179 61: e8 fc ff ff ff call 62 <L1+0x42> 66: eb e0 jmp 48 <L1+0x28> /tmp/gazonk_654_0.c:5172 68: e8 fc ff ff ff call 69 <L1+0x49> 6d: 8d 76 00 lea 0x0(%esi),%esi 70: eb cd jmp 3f <L1+0x1f> T
GCL doesn't seem to provide generalised TCO though:
>(defun bar () (foo)) BAR >(disassemble 'bar) Compiling /tmp/gazonk_608_0.lsp. End of Pass 1. End of Pass 2. OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3 Finished compiling /tmp/gazonk_608_0.lsp. #include "gazonk_608_0.h" void init_code(){do_init((void *)VV);} /* function definition for BAR */ static void L1() {register object *base=vs_base; register object *sup=base+VM1; VC1 vs_check; vs_top=sup; goto TTL; TTL:; vs_base=vs_top; (void) (*Lnk0)(); return; } static void LnkT0(){ ;} /* FOO */ #( #(user::foo (system::%init . #((system::mf (lisp::quote user::bar) 0)))) ) static void L1(); #define VC1 #define VM1 0 static void * VVi[2]={ #define Cdata VV[1] (void *)(L1) }; #define VV (VVi) static void LnkT0(); static void (*Lnk0)() = LnkT0; /tmp/gazonk_608_0.o: file format elf32-i386 Disassembly of section .text: 00000000 <init_code>: init_code(): /tmp/gazonk_608_0.c:5166 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 83 ec 18 sub $0x18,%esp 6: c7 04 24 00 00 00 00 movl $0x0,(%esp) d: e8 fc ff ff ff call e <init_code+0xe> 12: c9 leave 13: c3 ret 14: 8d b6 00 00 00 00 lea 0x0(%esi),%esi 1a: 8d bf 00 00 00 00 lea 0x0(%edi),%edi 00000020 <L1>: L1(): /tmp/gazonk_608_0.c:5170 20: 55 push %ebp 21: 89 e5 mov %esp,%ebp 23: 53 push %ebx 24: 83 ec 04 sub $0x4,%esp /tmp/gazonk_608_0.c:5172 27: a1 00 00 00 00 mov 0x0,%eax 2c: 3b 05 00 00 00 00 cmp 0x0,%eax /tmp/gazonk_608_0.c:5170 32: 8b 1d 00 00 00 00 mov 0x0,%ebx /tmp/gazonk_608_0.c:5172 38: 73 1e jae 58 <L1+0x38> /tmp/gazonk_608_0.c:5177 3a: a1 08 00 00 00 mov 0x8,%eax /tmp/gazonk_608_0.c:5173 3f: 89 1d 00 00 00 00 mov %ebx,0x0 /tmp/gazonk_608_0.c:5176 45: 89 1d 00 00 00 00 mov %ebx,0x0 /tmp/gazonk_608_0.c:5179 4b: 83 c4 04 add $0x4,%esp 4e: 5b pop %ebx 4f: 5d pop %ebp /tmp/gazonk_608_0.c:5177 50: ff e0 jmp *%eax 52: 8d b6 00 00 00 00 lea 0x0(%esi),%esi /tmp/gazonk_608_0.c:5172 58: e8 fc ff ff ff call 59 <L1+0x39> 5d: eb db jmp 3a <L1+0x1a> 5f: 90 nop 00000060 <LnkT0>: LnkT0(): /tmp/gazonk_608_0.c:5180 60: 55 push %ebp 61: 89 e5 mov %esp,%ebp 63: 83 ec 18 sub $0x18,%esp 66: a1 00 00 00 00 mov 0x0,%eax 6b: c7 44 24 04 08 00 00 movl $0x8,0x4(%esp) 72: 00 73: 89 04 24 mov %eax,(%esp) 76: e8 fc ff ff ff call 77 <LnkT0+0x17> 7b: c9 leave 7c: c3 ret T
2.8 DONE Embeddable Common Lisp
- Tail self-call optimisation
- yes, when explicitly compiled
- TCO
- no
2.8.1 Documentation
Tail-call elimination in the recursion case is mentioned on http://ecls.sourceforge.net/new-manual/ch27s04.html.
2.8.2 Demonstration
Under ECL (Embeddable Common-Lisp) 10.3.1, Ubuntu 10.10, tail self-calls are not optimised into jumps by default:
> (defun foo () (foo)) FOO > (disassemble 'foo) Name: FOO Documentation: NIL Declarations: ((C-GLOBAL)) 0 NOMORE 1 CALLG 0,FOO 4 EXIT NIL
Compiling functions, however, does yield self-call elimination in tail position. Consider the following transcript,
ECL (Embeddable Common-Lisp) 10.3.1 Copyright (C) 1984 Taiichi Yuasa and Masami Hagiya Copyright (C) 1993 Giuseppe Attardi Copyright (C) 2000 Juan J. Garcia-Ripoll ECL is free software, and you are welcome to redistribute it under certain conditions; see file 'Copyright' for details. Type :h for Help. Top level in: #<process SI:TOP-LEVEL 09858fc0>. > (defun foo () (bar)) FOO > (defun bar () (foo)) BAR > (foo) C-STACK overflow at size 4227072. Stack can probably be resized. Available restarts: 1. (CONTINUE) Extend stack size 2. (RESTART-TOPLEVEL) Go back to Top-Level REPL. Broken at BAR. In: #<process SI:TOP-LEVEL 09858fc0>. >> (compile 'foo) ;;; OPTIMIZE levels: Safety=2, Space=0, Speed=3, Debug=0 ;;; ;;; End of Pass 1. ;;; Note: ;;; Invoking external command: ;;; i686-linux-gnu-gcc "-I/usr/include/" -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -g -O2 -fPIC -D_THREAD_SAFE -Dlinux -O -w -c "/tmp/ECL001QMn7Ix.c" -o "/tmp/ECL001QMn7Ix.o" ;;; ;;; Note: ;;; Invoking external command: ;;; i686-linux-gnu-gcc -o "/tmp/ECL001QMn7Ix.fas" -L"/usr/lib/" "/tmp/ECL001QMn7Ix.o" -shared -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -lecl -lgmp -lpthread -ldl -lm ;;; FOO NIL NIL >> (compile 'bar) ;;; OPTIMIZE levels: Safety=2, Space=0, Speed=3, Debug=0 ;;; ;;; End of Pass 1. ;;; Note: ;;; Invoking external command: ;;; i686-linux-gnu-gcc "-I/usr/include/" -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -g -O2 -fPIC -D_THREAD_SAFE -Dlinux -O -w -c "/tmp/ECL002Bmf0vt.c" -o "/tmp/ECL002Bmf0vt.o" ;;; ;;; Note: ;;; Invoking external command: ;;; i686-linux-gnu-gcc -o "/tmp/ECL002Bmf0vt.fas" -L"/usr/lib/" "/tmp/ECL002Bmf0vt.o" -shared -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -lecl -lgmp -lpthread -ldl -lm ;;; BAR NIL NIL >> (foo) Debugger received error: C-STACK overflow at size 4227072. Stack can probably be resized. Error flushed. >> (defun baz () (baz)) BAZ >> (baz) ;;; ;;; Stack overflow. ;;; Jumping to the outermost toplevel prompt ;;; Broken at BAR. In: #<process SI:TOP-LEVEL 09858fc0>. >> (compile 'baz) ;;; OPTIMIZE levels: Safety=2, Space=0, Speed=3, Debug=0 ;;; ;;; End of Pass 1. ;;; Note: ;;; Invoking external command: ;;; i686-linux-gnu-gcc "-I/usr/include/" -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -g -O2 -fPIC -D_THREAD_SAFE -Dlinux -O -w -c "/tmp/ECL0035PQrQP.c" -o "/tmp/ECL0035PQrQP.o" ;;; ;;; Note: ;;; Invoking external command: ;;; i686-linux-gnu-gcc -o "/tmp/ECL0035PQrQP.fas" -L"/usr/lib/" "/tmp/ECL0035PQrQP.o" -shared -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -lecl -lgmp -lpthread -ldl -lm ;;; BAZ NIL NIL >> (baz) [...runs indefinitely...]
Once compiled, baz
doesn't terminate. Note that both before and
after compilation, trampoline code blows the stack.
2.9 DONE Armed Bear Common Lisp
- TCO
- no
- Tail Self-Call Optimisation
- no
2.9.1 Discussion
- http://blog.sol42.com/yacb/20100805-armed-bear-common-lisp
This is a well known JVM limitation (you have to resort to loops and trampolines). The Common Lisp standard does not require TCO, but every other Lisp seems to do it, and I, coming from ML and Erlang, am used to programming in a very recursive style. But this is just me, orthodox lispers will tell you that you have to use loops.
2.9.2 Documentation
From the Armed Bear trac, http://trac.common-lisp.net/armedbear/wiki/SystemExecution,
As described elsewhere, the LispObject is the base component of which the Lisp world in ABCL has been built. Everything is a LispObject. LispObject has 10 methods which might get called when evaluating that "Lisp object": they're all called execute and implement the zero - nine argument forms and the array argument form (which allows for non-predetermined numbers of arguments).
When an object is evaluated one of the appropriate forms of the execute method is called. Code deeply nests calls to execute() methods, because the evaluation of a function inherently calls execute() methods to call other functions.
This system leads to large stack sizes: Java doesn't allow tail call elimination.