Skip to content
 

Compiling let

Following my previous Scheme toy interpreter in Lua, I have already tried twice to write a simple Scheme compiler. I failed mostly because the complexity grew too fast and I could not see any of it working before losing interest. But I then read Ghuloum’s paper on incremental compiler construction. It’s a very nice approach, and I stopped trying to think everything in advance and instead started to work little by little. Unlike the paper I’m not compiling to assembly but to a virtual machine, tough.

I have come across one of the many design decisions a compiler writer faces. When compiling let forms, they are first translated to the closed lambda applications they are syntatic sugar for. For instance:

(let ((a "Alex")
      (e (+ (* pi 2) 1))
      (i (car (cons 1 0))))
  ((if a + -) e i))

Becomes:

((lambda (a e i) ((if a + -) e i))
 "Alex" (+ (* pi 2) 1) (car (cons 1 0)))

There is no need to emit code to create full closures when compiling closed lambda applications, because they cannot be applied multiple times. Instead, the environment is extended during compilation and the body is compiled in this new environment, but still it is considered just a piece of the enclosing body. So far so good, but what does happen during execution?

During the execution of the virtual machine, the arguments to full closures are pushed onto the stack (along with a return address, saved frame pointer and number of arguments pushed), before jumping to the closure code. Currently, when entering the body of closed lambda applications, a lightweight closure calling protocol is used: Just the frame pointer and the arguments are pushed onto the stack and are latter popped. But when accessing bindings not local to the current scope, which is very common in let forms when one access the arguments of the enclosing full closure, the compiler has to emit code for jumping through activation frames until the desired one is found, and the binding is found there. In other words, the VM instruction for accessing bindings is LOAD i j, where i is the activation frame level and j is the index of the binding inside that activation frame.

Although this works, it is not as fast as it could be. I am planning to use flat environments as described in Dybvig’s PhD thesis so all access to bindings inside closures will have constant time, not linear on the depth of the desired binding. The current compilation of closed lambda applications defeats that optimisation, although it makes bindings somewhat shallower. The alternative is to look for all the bindings introduced by closed lambda applications during compilation and always allocate space for all of them in the stack. This uses (possibly much) more memory because allocates space for bindings that may never be needed during execution, but accessing any binding is again constant time because all of them are known at compile time.

As usual, it boils down to memory versus speed. I guess I’ll leave it as is for now and worry more about speed after the compiler is able of even compiling itself.

One Comment

  1. Bilbo says:

    Thanks for the references to these papers! I’ve been thinking of building a compiler for a game project I’m working on. I figured Scheme would be a good scripting language, but I need it specialized for the platform I’m developing on.

Leave a Reply