I have found a very interesting thread on the Gambit-C Scheme mailing list. Marc Feeley tries to dispel the myth, widely taken as common wisdom, that floating-point calculations are inherently faster in C/C++ than in Scheme. He cites the work of Bradley Lucier as an example of very fast numerical Scheme code.
Bradley himself replies. He explains that his code runs roughly half as fast as it would if written in C, but that memory bandwidth is the real bottleneck and the difference does not matter anyway. An interesting part of his email is this:
I didn’t mind rewriting the level-1 BLAS code in Scheme, however, as it didn’t seem worth the trouble to get a small speedup in the entire system just to use a dot-product or saxpy written in C and called from an FFI. I have enough difficulty in getting students to admit to themselves that, yes, this system is fast (just about as fast as any expert could have written it, and probably much faster than your average graduate student could have written it) and it’s flexible (it’s only at the end of the course that some students reluctantly admit that they could not have finished their semester project in their favorite language, whether C or C++), and that the speed doesn’t arise from level-1 BLAS written in C (because they’re written in Scheme). One point that most students seem to take away from the class is that multigrid is one hell of a lot faster than conjugate gradient, and many of them have been using conjugate gradient in their own projects simply because it’s too hard to program multigrid in C or C++ (at least the first time you try it). So one gets a lot of speedup simply by being able to program more sophisticated algorithms.
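To make the terms in the quote concrete: level-1 BLAS routines are simple vector-vector operations such as the dot product and "axpy" (y <- a*x + y). Below is a minimal sketch of both in Gambit Scheme. It is not Bradley's code; the names `dot` and `axpy!` are my own illustrative choices, and it assumes Gambit's built-in `f64vector` type (SRFI 4 homogeneous vectors). Strictly speaking, the "s" in saxpy means single precision, so working on double-precision vectors makes this the "d" variant. The `declare` form tells the compiler that primitives such as `+` and `f64vector-ref` have not been redefined, which lets it inline them.

```scheme
;; Assume the standard and Gambit-specific primitives (+, *, <,
;; f64vector-ref, ...) keep their usual meaning, so they can be inlined.
(declare (standard-bindings) (extended-bindings))

;; dot: sum of x[i] * y[i] over two f64vectors of the same length.
(define (dot x y)
  (let ((n (f64vector-length x)))
    (let loop ((i 0) (sum 0.0))
      (if (< i n)
          (loop (+ i 1)
                (+ sum (* (f64vector-ref x i) (f64vector-ref y i))))
          sum))))

;; axpy!: y <- a*x + y, overwriting y in place and returning it.
;; Double precision here, so strictly a daxpy rather than a saxpy.
(define (axpy! a x y)
  (let ((n (f64vector-length x)))
    (do ((i 0 (+ i 1)))
        ((= i n) y)
      (f64vector-set! y i
                      (+ (* a (f64vector-ref x i))
                         (f64vector-ref y i))))))
```

For instance, with x = `#f64(1. 2. 3.)` and y = `#f64(4. 5. 6.)`, `(dot x y)` gives 32., and `(axpy! 2. x y)` then updates y to `#f64(6. 9. 12.)`.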
There is also a toy ray-tracer written in Gambit-C, Schemeray, which was further optimised by Marc Feeley and then even more by (again) Bradley Lucier. The resulting times are impressive, thanks to the clever tricks Gambit-C uses when generating C code, such as inlining, partial evaluation, and generating one C function per Scheme module (so calls between procedures in the same module avoid the cost of real C function calls).
The code generated by Gambit-C is fast enough for the vast majority of applications out there. And this is with a strongly, dynamically typed language, with first-class closures and continuations, arbitrary-precision numbers, and the powerful macros of the Lisp family of languages. There is hardly any need to use C or C++, even in the most demanding applications. The same cannot be said of Perl, Python or Ruby, whose programs usually run much slower than equivalent ones compiled with Gambit-C. And if the need really arises, interfacing Gambit-C with C code is extremely easy.
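As an illustration of that last point, here is a minimal sketch of Gambit's C interface, using `c-declare` and `c-lambda` to wrap the C library's `hypot` function. The Scheme name `c-hypot` is a hypothetical choice for this example. `c-lambda` only works in compiled code, so the file has to go through `gsc` (for example `gsc -exe hypot.scm`) rather than being loaded into the `gsi` interpreter; depending on your platform you may also need extra linker options for the math library.

```scheme
;; Make the declarations from <math.h> visible to the generated C code.
(c-declare "#include <math.h>")

;; Wrap the C function hypot(double, double) as a Scheme procedure.
;; Since "hypot" is a plain C identifier, Gambit emits a call to it,
;; converting the Scheme flonum arguments and result to and from C doubles.
(define c-hypot
  (c-lambda (double double) double "hypot"))

(display (c-hypot 3.0 4.0))
(newline)
```

No hand-written glue code or separate wrapper library is needed: the type conversions are generated from the argument and result types listed in the `c-lambda` form.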