How do threads fit into parrot?

(This post is to show my thinking on my GSoC project. Sorry in advance if it's not especially coherent or interesting out of context.)

Over the past week I've been examining the Parrot code to figure out how hybrid threads are going to fit in. First, if I'm going to make Parrot do stuff concurrently, I have to figure out what Parrot is actually doing at all.

To answer that question, here's a GDB stack trace of Parrot running some arbitrary code (a "say" command):

#0  0x0012d422 in __kernel_vsyscall ()
#1  0x004e0eb3 in __write_nocancel () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x0022290b in Parrot_io_write_unix (interp=0x8052040, 
    filehandle=0x80b3078, s=0x8080094) at src/io/unix.c:567
#3  0x002217d8 in Parrot_io_write_buffer (interp=0x8052040, 
    filehandle=0x80b3078, s=0x8080094) at src/io/buffer.c:611
#4  0x0021fa89 in Parrot_io_putps (interp=0x8052040, pmc=0x80b3078, 
    s=0x8080094) at src/io/api.c:634
#5  0x00172191 in Parrot_say_sc (cur_opcode=0x8100940, interp=0x8052040)
    at src/ops/io.ops:243
#6  0x002086ae in runops_slow_core (interp=0x8052040, 
    runcore_unused=0x80df1b8, pc=0x8100940) at src/runcore/cores.c:647
#7  0x002073df in runops_int (interp=0x8052040, offset=0)
    at src/runcore/main.c:237
#8  0x001ca8bf in runops (interp=0x8052040, offs=0) at src/call/ops.c:127
#9  0x001c4c75 in Parrot_pcc_invoke_from_sig_object (interp=0x8052040, 
    sub_obj=0x80b3158, call_object=0x80b3218) at src/call/pcc.c:359
#10 0x001c43fe in Parrot_pcc_invoke_sub_from_c_args (interp=0x8052040, 
    sub_obj=0x80b3158, sig=0x30e0be "P->") at src/call/pcc.c:87
#11 0x001ac4b2 in Parrot_runcode (interp=0x8052040, argc=1, argv=0xbffff4c8)
    at src/embed.c:808
#12 0x002e6cd9 in imcc_run_pbc (interp=0x8052040, obj_file=0, output_file=0x0, 
    argc=1, argv=0xbffff4c8) at compilers/imcc/main.c:480
#13 0x002e783f in imcc_run (interp=0x8052040, 
    sourcefile=0xbffff656 "/tmp/xx.pir", argc=1, argv=0xbffff4c8)
    at compilers/imcc/main.c:746
#14 0x0804900f in main (argc=1, argv=0xbffff4c8) at src/main.c:141

The first interesting thing here is Parrot_say_sc, which is the instruction that is currently being executed. This is the thing that Parrot does - it runs bytecode ops.

The next interesting thing is Parrot_pcc_invoke_sub_from_c_args. This apparently takes a properly set up (...?) interpreter structure and actually uses it to execute some code. This looks like the entry point for a "thread".

So far, I'm visualizing the following:

  • Parrot wants to be executing concurrently schedulable "Tasks".
  • The first Task runs the sub marked :main.
  • Other Tasks are created by some existing task.
  • Threads are actually started by calling Parrot_pcc_invoke_sub_from_c_args on the appropriate sub.

This seems to be what the existing thread code does (or tries to do), except the thread for "main" is a special case.

So far, this sounds pretty straightforward to implement. Except that it isn't, because of a number of complications.

Complication #1: Pre-empting running tasks.

In my proposal, I wrote that after running for 10s a task would be pre-empted by the scheduler so that another task can be run. But how will this actually happen?

I could set an alarm and then switch tasks in a signal handler. This can pre-empt pretty much anything. On the other hand, switching tasks is either non-trivial if I try to do it myself or heavier than I'd like (i.e. megs of RAM for stacks) if I punt to swapcontext and friends. The other complication is that mixing threads and signals is somewhere between "not portable" and "not even defined behavior".

The other obvious option is to have the runcore check for an expired quantum after each op and handle the task switch when nessisary. This doesn't rely on signals and guarantees that tasks are swapped at a convienient time, but risks tasks not being pre-empted if they call some long-running op (e.g. calling a dlfunc). Also, it adds at least one branch to the middle of the runcore hot loop.

Optimally, I'll discover that Posix (and win32, and every other platform) has a set_timer_for_current_thread call that I haven't found in hours of searching and that Parrot is already built to freeze itself in the middle of anything and magically handles even interrupted system calls. But if anyone wants to bet, my money's on the side of neither of those things being true.

Complication #2: Garbage Collection

From what I've heard, Parrot has a stop-the-world mark-and-sweep garbage collector. That means that in order for any Task to do garbage collection, everyone has to first stop what they're doing and then wait while the garbage collector runs.

Optimally, I would want the garbage collector to work some other way, but that's well out of scope for my GSoC project. So the complication is just how to make everyone sync up.

My first idea on how to do this is just to make it part of rescheduling. When we reschedule, we check to see if anyone wants a GC run. If so, we sync up and wait for that to happen.

If we're pre-empting with signals then we can signal everyone to reschedule (and GC) right now. If not, the process may just have to sit there and wait for everyone else to reschedule at the end of the quantum.

Complication #3: I still don't really know what Parrot is doing.

Complication #3 was supposed to be "Shared Data" and talk about how some data needs locking, but really I can't say anything useful about that yet.

So... I need to go read the code some more.