Now that I have the basic green threads (Task) API more or less working, it's time to start mixing in OS threads. I plan to do that in two steps. First, I'm going to use OS threads to solve the key issue with the green threads implementation - blocking IO. The result will be roughly equivalent to CPython threads-with-GIL. The second step - real parallel thread execution - will have to wait until after GSoC pencils down.
One major problem with the current implementation of green threads in my branch is blocking IO. Right now, if you call a blocking read in any task, all of Parrot blocks, leaving every other task stuck as well. This rather misses one of the key points of a threading system, so it needs to be fixed.
Conceptually, this is an easy problem to fix with threads. Whenever you want to perform a blocking IO operation, just spawn a thread. Now you have two threads - one thread can block on the IO and the other thread can keep doing work. When a thread finishes blocking for IO, keep it around so you don't have to spawn another one the next time IO happens.
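To make the idea concrete, here is a minimal sketch in Python rather than Parrot's C, using the standard library's `ThreadPoolExecutor` as a stand-in for a pool of reusable IO threads. The names `io_pool`, `blocking_read`, `read_in_background` and `demo`, and the pool size of 4, are all made up for illustration; none of this is Parrot code.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pool size. Worker threads are created on demand and then
# kept around, so later IO calls do not have to spawn a new thread.
io_pool = ThreadPoolExecutor(max_workers=4)


def blocking_read(fd, nbytes):
    """Runs on a pool thread; os.read blocks until data is available."""
    return os.read(fd, nbytes)


def read_in_background(fd, nbytes):
    """Hand the blocking call to the pool and return a Future right away."""
    return io_pool.submit(blocking_read, fd, nbytes)


def demo():
    read_fd, write_fd = os.pipe()
    try:
        future = read_in_background(read_fd, 5)  # a pool thread blocks here
        work_done = sum(range(1000))             # this thread keeps working
        os.write(write_fd, b"hello")             # data arrives, read unblocks
        return work_done, future.result(timeout=5)
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

Calling `demo()` starts a read on an empty pipe, does some unrelated work on the calling thread, and only then supplies the data the read was waiting for. The calling thread is never stuck behind the blocked read, which is the whole point.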
In practice, there are complications. For example, every time the Parrot GC runs it scans the stack for references. With multiple threads, all of them need to be scanned. Not only does that mean I need to figure out how the GC does stack scanning, it also means that the start of the stack for every active thread needs to be kept somewhere. Yay!
In unrelated news, GSoC 2010 is almost over. The hard pencils down date is the 16th, less than two weeks away. I'm going to try to get blocking IO working properly with multiple threads by then - thus giving Parrot similar thread functionality to CPython, with possibly lighter weight threads. That won't be a terrible outcome if I can make it happen.
IO threads as external library
Perhaps it would help to look at the "dedicated IO-blocking threads" as an external library. In this case, the thread calling the possibly blocking IO method would wait for a signal of some sort and would hold any request structure describing the IO operation (so all Parrot-relevant information is visible to the garbage collector).
The IO thread pool would use the normal C heap (outside of the garbage collector) and would only have to make the actual library call or syscall and, upon completion, send a signal.
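A rough sketch of that split, in Python for brevity rather than C: the caller owns a request object and waits on it, while a dedicated IO thread does nothing except make the blocking call and signal completion. `IORequest`, `IOService` and `demo` are hypothetical names, and a `threading.Event` stands in for "a signal of some sort".

```python
import os
import queue
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IORequest:
    """Owned by the calling thread, so everything it refers to stays visible there."""
    fd: int
    nbytes: int
    result: bytes = b""
    error: Optional[OSError] = None
    done: threading.Event = field(default_factory=threading.Event)


class IOService:
    """A single dedicated IO thread that only makes the blocking call and signals."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            req = self._requests.get()
            if req is None:  # sentinel: shut down
                return
            try:
                req.result = os.read(req.fd, req.nbytes)  # the blocking syscall
            except OSError as exc:
                req.error = exc
            req.done.set()  # completion signal to the caller

    def submit(self, req):
        self._requests.put(req)

    def stop(self):
        self._requests.put(None)
        self._thread.join(timeout=5)


def demo():
    read_fd, write_fd = os.pipe()
    service = IOService()
    try:
        req = IORequest(fd=read_fd, nbytes=4)
        service.submit(req)
        os.write(write_fd, b"ping")
        completed = req.done.wait(timeout=5)  # caller waits for the signal
        return completed, req.result, req.error
    finally:
        service.stop()
        os.close(read_fd)
        os.close(write_fd)
```

In this sketch the IO thread never allocates or touches anything other than the request it was handed, which mirrors the idea of keeping the IO side outside the garbage collector's view.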
Of course, the biggest problem with this approach is that you only want to call the blocking function itself in the thread, and the call to that function is deep in platform-specific code. Converting this code would probably be very intrusive and a potential maintenance nightmare.