Skip to content

Adding a garbage collector

I guess it is clear by now that I am writing a toy Scheme compiler and virtual machine. Primarily for learning the techniques, I just touch it from time to time. The last addition to the virtual machine was a Cheney-style copy garbage collector. I wanted to implement a very simple algorithm to get a functioning system quicker. The simplest algorithms I know of are the Cheney one and the Deutsch-Schorr-Waite mark-and-sweep garbage collector. I chose the copying one because I believe generational garbage collection is the way to go for Scheme virtual machines, given that the rate of allocation is very high (allocation in a copy garbage collector is just moving a pointer) and most data objects die young (how many times did you use reverse after a loop?).

I was surprised by how easy it was to implement the collector. It is a very simple one (stop-the-world, two semi-spaces etc.) but nevertheless I always thought garbage collectors were somewhat magical, a bunch of elves that worked out of sight to keep your process working perfectly. This is exactly why I am writing such a system, even if it never gets free onto the world it will fulfill its purpose. Of course, if I am satisfied with the result, I will release the beast.

My voter registration

The voter registration period for the Scheme Steering Comittee is over. I thought I could then post my statement of interest:

I have been using Scheme for almost four years now. I use it mostly for studying programming languages and programming in general, prototyping and testing new ideas, but I have also developed and sold applications written in Scheme to customers. I believe Scheme is nowadays a maximum in the design space of programming languages and is the language of choice of some of the best hackers of the world. I thus deeply wish that Scheme gets getter with each new standard revision.

Compiling let

Following my previous Scheme toy interpreter in Lua, I have already tried twice to write a simple Scheme compiler. I failed mostly because the complexity grew too fast and I could not see any of it working before losing interest. But I then read Ghuloum’s paper on incremental compiler construction. It’s a very nice approach, and I stopped trying to think everything in advance and instead started to work little by little. Unlike the paper I’m not compiling to assembly but to a virtual machine, tough.

I have come across one of the many design decisions a compiler writer faces. When compiling let forms, they are first translated to the closed lambda applications they are syntatic sugar for. For instance:

(let ((a "Alex")
      (e (+ (* pi 2) 1))
      (i (car (cons 1 0))))
  ((if a + -) e i))

Becomes:

((lambda (a e i) ((if a + -) e i))
 "Alex" (+ (* pi 2) 1) (car (cons 1 0)))

There is no need to emit code to create full closures when compiling closed lambda applications, because they cannot be applied multiple times. Instead, the environment is extended during compilation and the body is compiled in this new environment, but still it is considered just a piece of the enclosing body. So far so good, but what does happen during execution?

During the execution of the virtual machine, the arguments to full closures are pushed onto the stack (along with a return address, saved frame pointer and number of arguments pushed), before jumping to the closure code. Currently, when entering the body of closed lambda applications, a lightweight closure calling protocol is used: Just the frame pointer and the arguments are pushed onto the stack and are latter popped. But when accessing bindings not local to the current scope, which is very common in let forms when one access the arguments of the enclosing full closure, the compiler has to emit code for jumping through activation frames until the desired one is found, and the binding is found there. In other words, the VM instruction for accessing bindings is LOAD i j, where i is the activation frame level and j is the index of the binding inside that activation frame.

Although this works, it is not as fast as it could be. I am planning to use flat environments as described in Dybvig’s PhD thesis so all access to bindings inside closures will have constant time, not linear on the depth of the desired binding. The current compilation of closed lambda applications defeats that optimisation, although it makes bindings somewhat shallower. The alternative is to look for all the bindings introduced by closed lambda applications during compilation and always allocate space for all of them in the stack. This uses (possibly much) more memory because allocates space for bindings that may never be needed during execution, but accessing any binding is again constant time because all of them are known at compile time.

As usual, it boils down to memory versus speed. I guess I’ll leave it as is for now and worry more about speed after the compiler is able of even compiling itself.

Blog under siege

I know it is a pain for who is commenting, but I had to add the reCAPTCHA WordPress plug-in to this blog. The spam count was already in the 300s, and increasing fast. Trackback spam is still high.

Lua Programming Gems out!

Hot from the presses, the book is finally here. The table of contents, the front matter and the free chapter 2 are in the book homepage.

Will code for money

Thanks to the Financial World Crisis, my employer wants to relocate me to a place where it will spend less money paying me and with less infra-structure. As I do not intend to relocate, I therefore offer my good coding skills for money. Remember that Brazillian paying rates are lower than European and American ones! :-) The link to my resume is at the side.

Equivalent of and-let*?

By this point I believe it’s clear that I am a big fan of Scheme. Lately I have seen a sharp rise of interest in Ruby in some virtual places I attend (#lisp-br, twitter, IM with coworkers etc.). I am sure Ruby is a fine language, even in spite of several complaints I hear from some of its users. After all, it has symbols, metaprogramming, blocks (a sort of closures), lambdas and more! I’d really take the time to learn it, if it wasn’t for Matz himself:

Some may say Ruby is a bad rip-off of Lisp or Smalltalk, and I admit that. But it is nicer to ordinary people.

Now that is a demotivator. But Ruby must be evolving, and that quote may be outdated. So I have thought of a question to the Ruby-lovers, that may help me make up my mind to take the time to learn it. One of the best Scheme macros is, in my opinion, and-let*, from the SRFI-2. How would one write the following in Ruby, a procedure that I wrote a couple of nights ago?

;; Tries to get session from current request
(define (get-session request)
  (and-let* ((cookies (request-cookies request))
             (p (assoc "session_id" cookies))
             (sid-str (cdr p))
             (sid (string->number sid-str))
             ((integer? sid))
             ((exact? sid))
             (sp (assq sid *sessions*)))
    (cdr sp)))

In the and-let* macro, the clauses are evaluated and the variables are bound sequentially, creating new scopes for the next expressions. If any of the forms evaluates to #f (false in Scheme), the computation is aborted, no other expressions are evaluated, and the whole form evaluates to #f. How would I write this in concise Ruby?

Paul Graham’s strategy

I have disappeared for a while, but my excuse is that I have been coding furiously. I intend to write a lot more in the not-so-far-away future. In the meanwhile, a quick post.

From time to time I read another Paul Graham essay, usually when they are cited in some Reddit or Slashdot thread. He really writes well, and although I do not agree with everything he says, I must say I agree with most of it. “Beating the averages” was very influential on some important decisions I have made, and for a lot of other people as well.

Today I found an essay called “Six Principles for Making New Things“, in which he describes his modus operandi:

Here it is: I like to find (a) simple solutions (b) to overlooked problems (c) that actually need to be solved, and (d) deliver them as informally as possible, (e) starting with a very crude version 1, then (f) iterating rapidly.

I agree 100% with him here. There are a lot of overlooked problems out there, in search of solutions, that just seem “simple” enough. But they need to be solved and, when someone does it, everybody else wish they had thought of it. But they were there, all the time. Moreover, the initial solution does not need to be perfect, as long as it solves an immediate need well enough and the solution can be iterated rapidly, which is a very, very important point. When someone creates a market, there is soon afterwards a rush of new competitors. Coming up with a simple solution which does not evolve quickly is the recipe to lose to another faster, more dynamic, more focused team.

I already had something like Graham’s Six Principles in the back of my mind for a while. It was good to see them written by someone who has made millions of dollars selling (Lisp!) software.

Lisp in Latinoware 2008

There will be presentations about Lisp in the V Latin America Conference on Free Software, via call-with-hopeless-continuation.

Meroon for Gambit-C

Meroon is a Scheme object system by Christian Queinnec. It is portable, fast, CLOS-like and reflexive. The Meroon home page cites versions for several popular Scheme systems, including Gambit-C. But the links to the Gambit-C version are broken. Doing a bit of research, that actually involved some guessing, I found the Meroon Gambit-C port in Bradley Lucier’s home page. Instructions to build this Meroon port can be found in this thread in the Gambit-C mailing list.

I am not a heavy user of object orientation, but I can certainly see its value where it is worth. In the LiSP book, for instance, a beautiful interpreter and a compiler are written in OO-style, in chapter 3 and 10, respectively. Another example is writing a scene graph for a Computer Graphics application, it almost begs to be written object-oriented.