|
Last week, Scott Petersen from Adobe gave a talk at Mozilla on a
toolchain he’s been creating—soon to be open-sourced—that allows C
code to be targeted to the Tamarin virtual machine. Besides being a
really interesting piece of technology, it has implications for the
web that I found pretty impressive.
Before reading this post, readers who aren’t familiar with
Tamarin may want to read Frank Hecker’s excellent
Adobe, Mozilla, and Tamarin post from 2006 for some
background on its goals and why it’s relevant to Mozilla and the
open-source community in general.
If I followed his presentation right, Petersen’s toolchain works
something like this:
- A special version of the GNU C Compiler—possibly llvm-gcc—compiles C code into instructions
for the Low Level Virtual Machine.
- The LLVM instructions are converted into opcodes for a custom
Virtual Machine that runs in ActionScript, a variant of ECMAScript and
sibling of JavaScript.
- The ActionScript is automatically compiled into Tamarin
bytecode by Adobe Flash, which may be further compiled into native
machine language by Tamarin’s Just-in-Time (JIT) compiler.
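To make the second step a little more concrete, here is a minimal sketch, in Python standing in for ActionScript, of a virtual machine written in a dynamic language that executes a stream of opcodes. The opcode names and the stack-based design are invented for illustration; none of this is taken from Petersen’s actual opcode set or VM.

```python
from typing import List, Tuple

# Hypothetical opcode names, invented for this sketch; not the real opcode set.
PUSH, ADD, MUL, HALT = "push", "add", "mul", "halt"

Instruction = Tuple[str, int]


def run(program: List[Instruction]) -> int:
    """Interpret a list of (opcode, operand) pairs on a small stack machine."""
    stack: List[int] = []
    pc = 0  # program counter: index of the next instruction to execute
    while True:
        op, arg = program[pc]
        pc += 1
        if op == PUSH:
            stack.append(arg)
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op!r}")


# Roughly what a compiler back end might emit for the C expression (2 + 3) * 4.
program = [(PUSH, 2), (PUSH, 3), (ADD, 0), (PUSH, 4), (MUL, 0), (HALT, 0)]
result = run(program)
```

The interpreter loop itself is ordinary dynamic-language code, which is why a JIT compiler underneath it matters so much for speed.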
The toolchain includes lots of other details, such as a custom
POSIX system call API and a C multimedia library that provides
access to Flash. There are also some things that Petersen had to add
to Tamarin, such as a native byte array that maps directly to RAM,
thereby allowing the VM’s “emulation” of memory to have only a
minor overhead over the real thing.
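The byte-array idea can be sketched as follows, again in Python rather than ActionScript. The `Memory` class and its method names are hypothetical; the point is only that every C pointer becomes an integer offset into one contiguous buffer, so a load or store is a single indexed access.

```python
import struct


class Memory:
    """Flat, byte-addressable memory backed by one contiguous bytearray."""

    def __init__(self, size: int) -> None:
        self.ram = bytearray(size)

    def store32(self, addr: int, value: int) -> None:
        # Write a little-endian signed 32-bit integer at the given address.
        struct.pack_into("<i", self.ram, addr, value)

    def load32(self, addr: int) -> int:
        # Read a little-endian signed 32-bit integer back from the address.
        return struct.unpack_from("<i", self.ram, addr)[0]

    def load8(self, addr: int) -> int:
        # Read a single unsigned byte.
        return self.ram[addr]


mem = Memory(64)
mem.store32(8, 0x12345678)
```

Because every access goes through the buffer, an out-of-range address raises an error instead of touching the host process, which is where the sandboxing comes from.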
The end result is the ability to run a wide variety of existing
C code in Flash at acceptable speeds. Petersen demonstrated a
version of Quake running in a Flash app, as well as a C-based
Nintendo emulator running Zelda; both were eminently playable, and
included sound effects and music.
So, once Petersen’s modifications to Tamarin make their way into
the next version of Adobe Flash, we can expect to see older
commercial games running in the browser. Even more impressive,
though, is the sheer volume of existing code that can be made to
run inside the browser: Petersen showed us the C implementations
of Lua, Ruby, Perl, and Python, compiled with his toolchain, all
running on the web in secure Flash sandboxes.
What this means for Python
The potential implications this has for Python are particularly
interesting to me. The ability to run Python on the web is
exciting, to say the least; also interesting is the fact that by
sandboxing CPython in a virtual machine, we solve a lot of the
security issues that currently face the language when it comes to
running untrusted code.
Petersen’s work also resonates with a few goals of another
project called PyPy. I’m going to try to explain the idea
behind PyPy in a later post; for the time being, the
slides from my April 2007 ChiPy presentation on PyPy
may serve as a passable introduction.
In a nutshell, the difference in mindset between PyPy and
Petersen’s work is that the former is radically innovative in scope
and mission, while the latter is pragmatic. PyPy’s goal is
essentially to move the canonical implementation of Python from C
to Python itself, and then use a pluggable toolchain to translate
the Python interpreter to any platform with a configurable set of
language and implementation features. In one fell swoop, this
modularizes the Python interpreter in such a way that creating and
maintaining ports and variants of Python such as IronPython, Jython,
and Stackless no longer requires either rewriting the entire
interpreter in a different language or branching the CPython source
code and making pervasive changes to it.
Rather than focusing on innovation, Petersen’s work focuses on
code reuse. Instead of moving a canonical interpreter
implementation from C to a dynamic language, his strategy is to
simply compile the existing C code to run in a virtual machine
that’s implemented in a dynamic language. Both approaches aim to
obviate the necessity of “ports” of interpreters to different
platforms, and as such their purposes intersect at a common subset
of functionality. But Petersen’s work does little to help the
Python language and its implementation evolve, while PyPy offers few
or no tools to reuse existing non-Python code. Perhaps it’s possible
to combine the best of both worlds by taking PyPy’s generated C
interpreter and using Petersen’s toolchain to make it usable on the
web and other places that Tamarin runs.
What this means for the Open Web
To be honest, I’m not quite sure where the dividing line falls
between the parts of Petersen’s work that are Flash-specific and the
parts that can be reused to benefit the Open Web. Since ActionScript
is a sibling language to JavaScript, it’s possible that the custom VM
he created can be run in a browser with relatively few modifications,
albeit much more slowly in Firefox for the time being, since
SpiderMonkey-Tamarin integration is not yet complete. Once that’s
further along, though, I imagine it should be possible to create C
“libraries” that can be used in the toolchain to allow sandboxed C
code to interact with web pages rather than Flash apps. Should this
be feasible, I think it could be the most capable yet in a
relatively recent string of next-generation JavaScript virtual
machines that allow existing code to run safely in browsers.
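As a rough illustration of what such a “library” might look like from the host’s side, here is a small Python sketch in which sandboxed code can reach the page only through a fixed table of host calls. Everything here is hypothetical: the `page` dictionary stands in for a real web page, and `set_title` and `host_call` are names made up for this example, not part of Petersen’s toolchain.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the web page the guest program may modify.
page: Dict[str, str] = {"title": "untitled"}


def host_set_title(args: List[str]) -> int:
    # Update the page title and return 0, C-style, to signal success.
    page["title"] = args[0]
    return 0


# The only entry points the sandboxed program is allowed to reach.
HOST_CALLS: Dict[str, Callable[[List[str]], int]] = {
    "set_title": host_set_title,
}


def host_call(name: str, args: List[str]) -> int:
    """Dispatch a guest request, refusing anything not in the table."""
    handler = HOST_CALLS.get(name)
    if handler is None:
        raise PermissionError(f"host call not permitted: {name}")
    return handler(args)


status = host_call("set_title", ["Hello from C"])
```

The guest never gets a direct reference to the page; it can only name an operation, and the host decides whether that operation exists.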
Also, in the context of the web, download size is a significant
concern because applications are essentially streamed to clients.
While Petersen’s toolchain means that it’s possible to instantly
inherit most of CPython’s benefits on the web, it also means that
we get all of its flaws along with it, such as the fact that the
standard CPython distribution is a few megabytes in size. But there
are ways to get around this.
In any case, I’m really excited to see how both Petersen’s work
and PyPy proceed. I just hope I haven’t misrepresented either one
of them here due to a lack of understanding; I’ll try to correct
this blog post as I become aware of my mistakes.
|