==========================[ Readings ]================================ Read Section 8.2 about the purpose of Exceptions in the textbook. Caveat: exceptions are not pure good---in fact, both Rust and Google's C++ style guide eschew them. However, they were a major invention of their time, and required intricate mechanisms to implement them. For this reason, it is important to understand an _even more general_ concept through which exceptions and most other control flow mechanisms can be implemented in a systematic and automatable manner. This concept is continuations. ===================[ The view from below: C's setjmp/longjump ]======= Take a look at setjmp(3) manual page on your Unix system. ("man setjmp" should do it). This page also describes the longjmp call, which is the counterpart of setjmp. Longjmp returns to the exact point where setjmp returned, with the return value passed in its second argument (setjmp always returns 0). Specifically, setjmp uses an opaque byte buffer struct "jmp_buf" to save the current context's CPU registers including the program counter (PC), the stack pointer (SP), and the frame pointer (FP---if used). Longjmp uses a pointer to this opaque struct to jump back to the saved execution point and restore the state of the CPU. Since this state includes SP and FP, the _position_ of the stack frame as pointed to by SP and FP is also restored. There is no guarantee that the contents of that memory are intact or sane---but at least the addresses of the local variables will be the same. There is one crucial intended difference. The initial setjmp returns 0. Executing lognjmp brings the PC/execution point right back to where setjmp returns---but with a different value. That value is the second argument of longjmp. This argument, though merely a wimpy integer, can communicate a lot of information if pre-gamed to do so, such as: an actual integer, an error/exception condition code, a symbol or object indexed by some pre-agreed table, and so on. These are primitives with which C programs can implement a version of exceptions (such as try/throw/raise/catch), and *any other flow control* including loops, breaking out of a loop, etc. This is a GOTO with saved and restored CPU and stack frame context. Jump back: ----------[ jmp1.c ]---------- // Example from https://web.eecs.utk.edu/~huangj/cs360/360/notes/Setjmp/lecture.html #include #include #include int main() { jmp_buf env; int i; i = setjmp(env); // initially returns 0, then longjmp's second argument printf("i = %d\n", i); if (i != 0) exit(0); longjmp(env, 2); // what if we pass 0? Or 1 or any other int? printf("Never gets printed\n"); } ------------------------------ % gcc -Wall -o jmp1 jmp1.c % ./jmp1 i = 0 i = 2 % The setjmp/longjmp is the ultimate control flow primitive. For example, we can implement a loop: ----------[ jmp2.c ]---------- #include #include #include int cnt; // global uninitialized variables will be 0s, allocated in the .bss section int main() { jmp_buf env; int i; i = setjmp(env); printf("i = %d\n", i); cnt++; if (i == 10) exit(0); longjmp(env, cnt); // cnt is now an integer, and we mean it printf("Never gets printed\n"); } % gcc -Wall -o jmp2 jmp2.c % ./jmp2 i = 0 i = 1 i = 2 i = 3 i = 4 i = 5 i = 6 i = 7 i = 8 i = 9 i = 10 % So far we operated within the same stack frame---but we can longjmp back from any frame, no matter how deep, just like an exception rescues us back to a known good state. This works so long as the original stack frame where the jmp_buf we are jumping back to was allocated is still "live" and intact. ----------[ jmp3.c ]---------- #include #include #include void foo(jmp_buf e, int x){ // we need e either as an argument, or as a global printf("x=%d &x=%p\n", x, &x); if( x == 10 ) longjmp(e, x); // this is the only way we can escape from infinite recursion foo(e, x+1); // never returns } int main() { jmp_buf env; int i, j=100; i = setjmp(env); printf("i = %d j = %d\n", i, j); if (i != 0) exit(0); foo(env, 0); printf("Never gets printed\n"); } % gcc -Wall -o jmp3 jmp3.c % ./jmp3 i = 0 j = 100 x=0 &x=0x16b76f754 x=1 &x=0x16b76f724 // 0x30 is the stack frame size. x=2 &x=0x16b76f6f4 x=3 &x=0x16b76f6c4 x=4 &x=0x16b76f694 x=5 &x=0x16b76f664 x=6 &x=0x16b76f634 x=7 &x=0x16b76f604 x=8 &x=0x16b76f5d4 x=9 &x=0x16b76f5a4 x=10 &x=0x16b76f574 i = 10 j = 100 % Note that if we change a local variable in our stack frame _after_ we save a setjmp(), its value will be as it was when last updated, not when we called setjmp(). This stands to reason if you look at setjmp()'s source or disassembly: it saved the CPU registers, not any kind of a memory snapshot. ----------[ jmp3a.c ]---------- #include #include #include void foo(jmp_buf e, int x){ printf("x=%d &x=%p\n", x, &x); if( x == 10 ) longjmp(e, x); foo(e, x+1); } int main() { jmp_buf env; int i, j=100; // printf("%d\n", sizeof(jmp_buf)); i = setjmp(env); printf("i = %d j = %d\n", i, j); if (i != 0) exit(0); j = 300; // <<---- change j after setjmp() saves SP, FP, and other registers foo(env, 0); printf("Never gets printed\n"); } % gcc -Wall -o jmp3 jmp3.c user@users-Air cs59 % ./jmp3 i = 0 j = 100 x=0 &x=0x16b6a7754 x=1 &x=0x16b6a7724 x=2 &x=0x16b6a76f4 x=3 &x=0x16b6a76c4 x=4 &x=0x16b6a7694 x=5 &x=0x16b6a7664 x=6 &x=0x16b6a7634 x=7 &x=0x16b6a7604 x=8 &x=0x16b6a75d4 x=9 &x=0x16b6a75a4 x=10 &x=0x16b6a7574 i = 10 j = 300 // <<--- note j's value We can see what exactly setjmp()/longjmp() do by looking at their disassembly: user@users-Air cs59 % lldb jmp3 (lldb) target create "jmp3" Current executable set to '/Users/user/cs59/jmp3' (arm64). (lldb) b main Breakpoint 1: where = jmp3`main, address = 0x0000000100003e44 (lldb) r Process 69041 launched: '/Users/user/cs59/jmp3' (arm64) Process 69041 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 frame #0: 0x0000000100003e44 jmp3`main jmp3`main: -> 0x100003e44 <+0>: sub sp, sp, #0xf0 ; =0xf0 0x100003e48 <+4>: stp x29, x30, [sp, #0xe0] 0x100003e4c <+8>: add x29, sp, #0xe0 ; =0xe0 0x100003e50 <+12>: adrp x8, 1 Target 0: (jmp3) stopped. (lldb) disas -n setjmp libsystem_platform.dylib`setjmp: 0x190890c74 <+0>: pacibsp 0x190890c78 <+4>: stp x21, x30, [x0] 0x190890c7c <+8>: mov x21, x0 0x190890c80 <+12>: orr w0, wzr, #0x1 0x190890c84 <+16>: mov x1, #0x0 0x190890c88 <+20>: add x2, x21, #0xb0 ; =0xb0 0x190890c8c <+24>: bl 0x1908955b4 ; symbol stub for: sigprocmask 0x190890c90 <+28>: mov x0, x21 0x190890c94 <+32>: ldp x21, x30, [x0] 0x190890c98 <+36>: autibsp 0x190890c9c <+40>: eor x16, x30, x30, lsl #1 0x190890ca0 <+44>: tbz x16, #0x3e, 0x190890b9c ; _setjmp 0x190890ca4 <+48>: brk #0xc471 (lldb) disas -n _setjmp libsystem_platform.dylib`_setjmp: 0x190890b9c <+0>: pacibsp 0x190890ba0 <+4>: mov x10, x29 0x190890ba4 <+8>: pacdb x10, sp 0x190890ba8 <+12>: mov x12, sp 0x190890bac <+16>: mov w9, #0xcbed 0x190890bb0 <+20>: pacdb x12, x9 0x190890bb4 <+24>: mrs x16, TPIDRRO_EL0 ; reads the Thread Local Storage special register 0x190890bb8 <+28>: and x16, x16, #0xfffffffffffffff8 0x190890bbc <+32>: ldr x16, [x16, #0x38] 0x190890bc0 <+36>: eor x10, x10, x16 0x190890bc4 <+40>: eor x11, x30, x16 0x190890bc8 <+44>: eor x12, x12, x16 0x190890bcc <+48>: stp x19, x20, [x0] 0x190890bd0 <+52>: stp x21, x22, [x0, #0x10] 0x190890bd4 <+56>: stp x23, x24, [x0, #0x20] 0x190890bd8 <+60>: stp x25, x26, [x0, #0x30] 0x190890bdc <+64>: stp x27, x28, [x0, #0x40] 0x190890be0 <+68>: stp x10, x11, [x0, #0x50] 0x190890be4 <+72>: str x12, [x0, #0x60] 0x190890be8 <+76>: stp d8, d9, [x0, #0x70] 0x190890bec <+80>: stp d10, d11, [x0, #0x80] 0x190890bf0 <+84>: stp d12, d13, [x0, #0x90] 0x190890bf4 <+88>: stp d14, d15, [x0, #0xa0] 0x190890bf8 <+92>: mov w0, #0x0 ; always returns 0 0x190890bfc <+96>: retab (lldb) You see a lot of registers being stored into offsets off of the value of x0, "stp xA, xB, [x0]", as expected, and also the floating point registers (d8-d15). "disas -n _longjmp" is the reverse of this, with matching ldp. But whatever on earth are pacibsp, autibsp, pacdb, retab instructions with all of the strange bit manipulations around them? They are CPU support for protecting pointers (i.e., memory addresses) from accidental corruption or malicious manipulation by using some of bits of the address to store and check its cryptographic hash. More at http://blog.ssg.aalto.fi/2019/06/protecting-against-run-time-attacks.html Warning: This CPU/OS feature (PAC) is a rabbit hole of security vs exploitation :) BTW, pasting an instruction into perplexity.ai and adding armv8 M1 for context tends to get reasonably good explanation of what the instruction does, although the details aren't always just right. Still, clearly not too much going on inside setjmp/longjmp. ========================[ CALL/CC in Scheme ]===================== Some things in C are easy, because we have explicit addresses of instructions and memory chunks, and can trivially use these addresses as labels. On the other hand, analyzing just where an address is pointing to and "closing over" that memory is hard (what is really there? What else is pointing to it? What are the scopes to consider? Lambdas in C are hard!). In Scheme, the situation is reversed: the lexical scopes are clear and explicit, making closures is easy and expected to happen all over the place. But how does one label a point in code where one S-expression's value feeds into evaluation of another and "jump" to it? CALL/CC returns evaluation to a prior point, with the prior evaluation state precisely restored, with just one difference: the new value at this point is what you provide, similar to setjmp/longjmp. Just like setjmp/longjmp, CALL/CC can implement just about any control flow invented to date---in a clean functional way! In Scheme there is no jumping to code, you only get to call functions. So CALL/CC _makes a function_ on the fly and hands it to you. By calling that special unnamed function from anywhere, you will return to the exact point and state of the original CALL/CC evaluation, with the return value you provide to that baked-to-order function, just like the second argument of longjmp. You need to catch this function somehow, which explains the somewhat awkward syntax of (call/cc (lambda (cc) ..)) The (lambda (cc) ...) will execute, with the baked-to-order unnamed function made available to you as "cc". What you do with it is up to you. If you simply do nothing with it in the body of your lambda, the value of call/cc will be whatever the lambda evaluates to, and that'll be it, call/cc will be nothing more than an extra "let" wrapper. However, if you save "cc" somehow, then you will be able to return back to the point of call/cc by calling it---from anywhere. So the awkward-looking (lambda (cc) (cc cc)) is actually the simplest thing you can do: your (call/cc (lambda (cc) (cc cc))) will return the baked-to-order function, right away, to whatever called (call/cc ..). Save this function, and you can come back to that spot from anywhere, with a different value. These are the basis of Matt Might's examples of a break and loop: ;;; mm-cont-break.scm ;; Adapted from http://matt.might.net/articles/programming-with-continuations--exceptions-backtracking-search-threads-generators-coroutines/ ;(define call/cc call-with-current-continuation) (let ((x 10)) (let ((ccc (call/cc (lambda (cc) (cc cc))))) ;; ccc will be bound to the continuation the first time, (cond ;; then to 'done ((procedure? ccc) ;; we'll run this test twice (one yes, one no) (let fooloop () ; endless loop (display x) ; (display "\n") ; (set! x (- x 1)) ; (if (< x 0) ; (ccc 'done)) ; break-out here, no other way out of the endless recursion (fooloop))) ; ((equal? ccc 'done) (display "That's all, folks!") #t) (else (error "Contract violation! %S" ccc))))) ;;; mm-cont-loop.scm ;; Adapted from http://matt.might.net/articles/programming-with-continuations--exceptions-backtracking-search-threads-generators-coroutines/ ;(define call/cc call-with-current-continuation) ;; Loop implemented via continuation (let* ((x 10) ;; Note let*. We tried a simple "let" in class. (ccc (call/cc (lambda (cc) (cc cc))))) ;; ccc will be bound to the continuation, multiple times, (cond ;; then to 'done ((procedure? ccc) ;; we'll run this test 11 times (10 yes, 1 no) (display x) (display "\n") (set! x (- x 1)) (if (> x 0) (ccc ccc) ;; pass back ccc if we want to loop, 'done to exit (ccc 'done))) ((equal? ccc 'done) (display "That's all, folks!") #t) (else (error "Contract violation! %S" ccc)))) In this example, the continuation handed by call/cc to the lambda function is saved in a global variable "frozen": ;; Adapted From OnLisp Ch. 20.1 ; (define call/cc call-with-current-continuation) ; (let whatll-happen-if-you-uncomment-this-let () (define frozen) ; this variable will play the role of a code label/jump target. It will ; be set inside call/cc (let ((sym 'a)) (display (append '(the call/cc returned) (list ;; we'll return here with 'a first, then restart from here with 'again, 'thrice, 'now (call/cc (lambda (cc) (set! frozen cc) sym)) ))) (display "\n")) (display "Hello\n") (frozen 'again) (frozen 'thrice) (foorbar (barbaz (frozen 'now))) ; observe how pending (foorbar (barbaz )) gets discarded. ; ;;) ; matches whatll-happen-.. let % scheme mm-cont-frozen.scm Chez Scheme Version 10.0.0 Copyright 1984-2024 Cisco Systems, Inc. (the call/cc returned a) Hello (the call/cc returned again) (the call/cc returned thrice) (the call/cc returned now) > Now for a really simple example: (display (+ 1 (call/cc (lambda (cc) (display "Checkpoint A.\n") ; (cc 200) ;; what happens if we uncomment this? (display "Checkpoint B.\n") 300))) ) Run it and see! =====================[ Chicken Scheme ]==================== I installed Chicken Scheme with "sudo port install chicken". This gets you the interpreter "csi", the compiler "csc", and a collection of shell scripting tools. The Chicken Scheme compiler first performs _massive_ amounts of Scheme code transformation to convert Scheme code to the Continuation Passing Style, then compiles everything to C functions that never return (like Charlie on the M.T.A.). See https://wiki.call-cc.org/chicken-compilation-process We observed the Chicken Scheme compiler at work: % csc mm-cont-loop.scm A native MacOS executable is produced: % file ./mm-cont-loop mm-cont-loop: Mach-O 64-bit arm64 executable, flags: % ./mm-cont-loop ... See progressive stages of code transformation: % csc --debug 1 mm-cont-loop.scm ... % csc -debug 1 mm-cont-loop.scm ... % csc -debug 2 mm-cont-loop.scm ... % csc -debug 3 mm-cont-loop.scm ... % csc -debug 4 mm-cont-loop.scm Capture and look at the C code. "lf[]" is the global symbol table, we looked for "done" and where it was referenced from. Once tabbed, the C code is not too bad to read: % csc -to-stdout mm-cont-loop.scm > mm-cont-loop.c We checked in class that none of the automagically produced CPS continuation _f_1xx functions have a "ret". They all end in "bl", which could've been a "b". % objdump -d mm-cont-loop | less ===== to be continued