Hybrid Threads: GSoC Project Results

I proposed a pretty ambitious Google Summer of Code project this year. Although I didn't manage to do everything I hoped, I did manage to get a useful subset of threading functionality working in the gsoc_threads branch. In this blog post I will describe what I have working and what more needs to be done.

What I have working now is a pre-emptively scheduled green threads system for Parrot that allows programs to be written in a concurrent style. Individual green threads can do basic blocking file input without stopping other threads from running. These logical threads are accessed using the Task API that I described a couple weeks ago.

This functionality makes Parrot similarly powerful at threading as the standard version of Ruby or Python: Threads can do pretty much everything except run at the same time.

Merging Logical Threads

Although it's mostly done, the gsoc_threads branch needs a bit more polishing before I would consider trying to get it merged. Specifically, I want to get the following done:

  • Support for blocking operations other than file input (what?)
  • Proper exception behavior. Exceptions should probably propagate through Task#join.
  • The ability to cleanly interrupt tasks. Task#kill should probably cause an exception to be ("immediately") thrown in the target task.
  • Some minor cleanup, like taking advantage of thread local storage to let the worker threads remember their index into the thread table and trying to make locking order clearer.
  • Make sure things at least don't blow up if you build without threads or without POSIX.
  • Get more comments on the basic Task API to make sure that the
    user facing part of this story makes sense.

Going the rest of the way

In the longer term, Parrot still needs to be able to run in parallel on multiple physical CPUs. That's the feature that everyone really wants.

The basic approach I've been woking towards for that is to split Parrot Interpreter structure into two parts: One Parrot global part, and one part that's local to each concurrently executing native thread. Users will expect the semantics of shared global variables (including things like classes), and that requires at least some globally shared interpreter state.

I've got a wiki page up with my ideas for how to acutally do the interpreter split.

Actually doing this will require some pretty significant refactoring in order to be able to make clear the proper data access dicipline for each shared item. Especially for things like classes, locking the global table on every access isn't good enough, tricks like concurrent hash tables with guaranteed atomic updates or stop-the-world-to-write will be required.

Hash tables

Good work!

Synchronized hash tables and the like should be the domain of the language implementations, not Parrot itself, in my opinion. Instead, Parrot should perhaps offer a biased lock primitive like Java. That primitive can be a no-op for threads sharing a single interpreter.

I've been reading about the new Rust programming language which does not permit sharing mutables between threads. I would like Parrot to have the ability to host a language like that efficiently by not imposing unnecessary locking behavior.