Hybrid Threads

Threading systems let multiple code paths run at the same time. Why would anyone want that? Simple: impatience. It's no fun waiting for the computer to finish one thing when you want it to be doing something else.

So what are "hybrid" threads and why does Parrot need them? Well, there are two common schools of thought in building threading systems for high-level language runtimes. The Java people call the two approaches "green" threads and "native" threads. As with any design tradeoff, the right answer is to cheat and just take all the good properties of both options.

The difference between "green" threads and "native" threads comes down to what you don't want to wait on the computer for.

Some people don't want to wait for user input, or for blocking operations in general. They want to be able to write network servers without using the select or poll system calls (at least not directly). They want a thread for every little thing, and they want to be able to rely on threads using hardly any system resources. They don't really care about using multiple CPUs, since "the machine spends most of its time waiting on the user anyway". For these people, green threads (virtual software threads that are scheduled on one processor by their language runtime) are exactly what they need, and native threads are "a bit slow". Without green threads they have to resort to event-driven programming, which can be ugly in cases where threads would be elegant.
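To make the green-thread model concrete, here is a toy sketch in Python (not Parrot code) that uses generators as stand-ins for green threads. Each `yield` is a point where a task voluntarily gives up the processor, and a round-robin loop running on a single native thread decides who runs next. The names `run_green_threads` and `worker` are made up for illustration.

```python
from collections import deque


def run_green_threads(tasks):
    """Toy round-robin scheduler: runs generator-based 'green threads'
    on the current (single) native thread. Each yield is a switch point."""
    ready = deque(tasks)  # run queue of green threads
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))  # run the task until its next yield
        except StopIteration:
            continue  # task finished; drop it from the run queue
        ready.append(task)  # back to the end of the queue for its next turn
    return trace


def worker(name, steps):
    # A hypothetical green thread that does a little work per time slice.
    for i in range(steps):
        yield f"{name}:{i}"


if __name__ == "__main__":
    print(run_green_threads([worker("a", 3), worker("b", 2)]))
```

Because switching is just a function call inside one process, spawning thousands of these costs little, but nothing here ever uses a second CPU.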

Other people don't want to wait for long-running computations to finish. Either they want other things to happen at the same time, or they want their long-running computations to run in parallel on multiple processors and actually finish faster. For these people, native threads (real operating system threads that the OS can schedule across multiple processors) are great, and green threads are completely useless. Without native threads these programmers have to resort to separate operating system processes, which do the job but can be ugly in cases where threads would be elegant.

Now there's a school of thought that says threads shouldn't exist at all and we should do everything with processes and event-driven programming. They could be right, but even so, people would still want threads in Parrot.

So what do existing language runtimes do?

  • Ruby 1.8 uses green threads.
  • The Sun JVM and Microsoft .NET runtime both use native threads. The standard libraries make heavy use of event-driven patterns.
  • Perl 5 uses interpreter threads, which are like native threads except that they are especially slow to spawn, so you can't use them like green threads at all.
  • CPython uses a wacky system of native threads with a global lock that prevents them from running in parallel. This has the advantage of simplicity, but otherwise seems like a pretty bad compromise.
  • Ruby 1.9 dropped green threads and adopted a design similar to CPython.

So how do we cheat and get all the benefits of both designs? We can have a hybrid system where we spawn a whole bunch of lightweight green threads, but then have the interpreter schedule these green threads onto a smaller number of native threads to take advantage of multiple processors. This is what Erlang and Haskell (GHC) do, and what I plan to get Parrot to do over the summer.
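A toy sketch of this hybrid (M:N) idea in Python, with no connection to Parrot's actual implementation: generator-based green threads sit in one shared run queue, and a small pool of native threads (`threading.Thread`) each pull a green thread off the queue, run it to its next `yield`, and put it back. A green thread can therefore resume on a different native thread than the one it last ran on. The names `run_hybrid` and `worker` are hypothetical. Note that under CPython's global lock these native threads take turns instead of truly running in parallel, so this shows the scheduling structure, not the speedup.

```python
import threading
from collections import deque


def run_hybrid(tasks, num_native_threads=4):
    """Toy M:N scheduler: many generator-based green threads share one
    run queue that a small pool of native threads services."""
    ready = deque(tasks)          # shared run queue of green threads
    live = len(ready)             # green threads that have not finished yet
    cond = threading.Condition()  # guards ready, live and results
    results = []

    def scheduler_loop():
        nonlocal live
        while True:
            with cond:
                # Sleep while every unfinished green thread is held by
                # another native thread.
                while not ready and live > 0:
                    cond.wait()
                if live == 0:
                    return
                task = ready.popleft()
            # Run one time slice outside the lock, so the other native
            # threads are free to pick up other green threads meanwhile.
            try:
                value = next(task)
                finished = False
            except StopIteration:
                finished = True
            with cond:
                if finished:
                    live -= 1
                    cond.notify_all()  # let idle threads notice live == 0
                else:
                    results.append(value)
                    ready.append(task)  # may resume on another native thread
                    cond.notify()

    pool = [threading.Thread(target=scheduler_loop) for _ in range(num_native_threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return results


def worker(name, steps):
    # A hypothetical green thread that does a little work per time slice.
    for i in range(steps):
        yield f"{name}:{i}"


if __name__ == "__main__":
    tasks = [worker(f"g{n}", 3) for n in range(1000)]
    print(len(run_hybrid(tasks, num_native_threads=4)))
```

The single shared queue keeps the sketch short; production M:N schedulers typically spread work across per-thread queues with some form of load balancing, which this sketch does not attempt.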

native code

This sounds fantastic.

One complication to keep in mind: when interfacing with multithreaded native code, it's useful to have at least the illusion of one VM thread per native-code thread. For example, it's really handy that Java uses native threads, because calling into multithreaded C++ code is no big deal. It would be great if Parrot invisibly spawned new native threads whenever green threads call into native code.
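A rough sketch of that suggestion, in Python with made-up names (`NativeCall`, `run_with_native_calls`, `blocking_native`): a green thread that needs to call blocking native code yields a `NativeCall` request, and the single-threaded green scheduler runs that call on a freshly spawned native thread while it keeps running the other green threads. For simplicity it spawns a new native thread per call; giving each green thread a stable one-to-one native thread, as described above, would mean keeping one dedicated native thread per green thread instead.

```python
import queue
import threading
import time
from collections import deque


class NativeCall:
    """Hypothetical request: run fn(*args) on a native thread, send me the result."""
    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


def run_with_native_calls(tasks):
    ready = deque((t, None) for t in tasks)  # (green thread, value to send in)
    done = queue.Queue()  # native threads report (task, result) here
    parked = 0            # green threads currently waiting on native code
    trace = []

    def start_native(task, req):
        # Assumes req.fn does not raise; otherwise the task stays parked forever.
        def call():
            done.put((task, req.fn(*req.args)))
        threading.Thread(target=call).start()

    while ready or parked:
        # Move green threads whose native call finished back onto the run
        # queue. If nothing else is runnable, block until one completes.
        while parked:
            try:
                task, result = done.get(block=not ready)
            except queue.Empty:
                break
            parked -= 1
            ready.append((task, result))
        task, send_value = ready.popleft()
        try:
            request = task.send(send_value)  # run until the next yield
        except StopIteration:
            continue  # green thread finished
        if isinstance(request, NativeCall):
            parked += 1
            start_native(task, request)
        else:
            trace.append(request)
            ready.append((task, None))
    return trace


def blocking_native(x):
    time.sleep(0.05)  # stand-in for a slow call into C code
    return x * 2


def uses_native():
    result = yield NativeCall(blocking_native, 21)
    yield f"native:{result}"


def chatty(n):
    for i in range(n):
        yield f"chatty:{i}"


if __name__ == "__main__":
    print(run_with_native_calls([uses_native(), chatty(3)]))
```

While `blocking_native` sleeps on its own native thread, the `chatty` green thread keeps getting scheduled, which is the point of the suggestion.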