End of GSoC, but not Time to Stop...

I've spent the last few days cleaning up my branch: adding documentation, checking that it passes the code standard tests, and trying compiling Rakudo nom. Don't get too excited, we can't compile nom directly to bytecode. Heck, it can't compile squaak directly yet. But I wanted to make sure that all my tinkering hasn't broken the original PIR generation path.

And, well, I had. But a few quick commits to fix stupid typos fixed the obvious errors. All that was left was the fact that it seemed to hang instead of compiling. A lot of digging turned up nothing. Turns out the answer was lack of patience. nom takes 25 minutes to compile on master and 40 on my branch. This is... Less Than Awesome. I expected some slowdown when converting hand-written PIR to NQP, but taking twice as long seems somewhat excessive.

NQP is a heavy-weight language. Variables are never stored in registers, needing to be looked up via find_lex before every use. Numbers and strings are stored as PMCs, getting boxed and unboxed on a regular basis. This causes far more overhead than I expected.

While valuable, I'm not sure the nqp_pct branch should really be merged to master. This kind of slowdown is unlikely to be welcomed with open arms.

I have some thoughts about what to do about it...

  1. NQP could shed some weight. Subs do not have to search for lexical variables when they're declared in the same block. It could use 6model to gain native int, num, and string variables. Perhaps other optimizations could be found. This would be valuable as not only would it speed up compiling Rakudo, but the results might be able to optimize the code Rakudo generates as well. The downside is that this is fairly difficult.
  2. Operations NQP uses frequently could be faster. A lot of things in Parrot are done by looking up strings in a table. When searching for $node or POST::Op 12 times in a single function, it would be useful to avoid having to do string comparisons every time. There was discussion of a find_lex_fast opcode that would return some reference to the storage location of the lexical instead of its current value
  3. PCT could be moved to Winxed. When I was writing the proposal, Winxed was just starting to make the rounds in #parrot, but it didn't quite seem ready to me. I found out recently that that feeling was right. Winxed still doesn't have the ability to do the class-based multi subs that both the PAST and POST compilers use left and right. But now that Winxed is included with Parrot and is gaining features rapidly, it's a viable candidate. Winxed is a nice low-level language and should generate code not terribly different than the original PIR. This also has the minor bonus that Winxed doesn't use PCT, so there would be less need to do bootstrapping.

One of the things that's struck me about PCT is the way it seems to have been grown instead of designed. The entire newPOST library hangs off the side of POST in a somewhat ugly fashion. Information that would be useful to the bytecode generator gets converted to PIR at a high level. The PAST to POST compiler spends a lot of time worrying about register allocation. This is not necessarily a bad thing. PCT is very useful for what it does.

But it seems difficult to add to. I've started percolating an idea in the back of my head. Right now I'm calling it PACT - Parrot's Alternate Compiler Toolkit. This idea may go nowhere. It may just feed into a set of refactors on PCT. But it keeps bouncing around the ol' brain.

I'll try and compile a post just on this idea later, but here's some bits of what I'm thinking about:

  • Compiler stages should be objects, not methods. Adding stages requires subclassing and attempting not to collide with the pile of methods already on PCT::HLLCompiler.
  • Compilers should be initialized and run as objects. Both PAST::Compiler and POST::Compiler use globals and the contextual variables (which I'm starting to view as akin to scoped globals) regularly. They are called more or less as static methods on the class instead of methods on an object. I think the state of the compiler could be far more clear by using object attributes.
  • Stages should be visitors. Having a standard visit(visitor, node) method could allow the various stages to chain and delegate between each other easier. Design patterns FTW.
  • Tree-optimization should be built in. It's very useful and should be used! Having a standard library of optimizations like constant folding and dead code eliminations could help everyone.
  • Symbol tables should be different from blocks.
  • Bytecode generation should be considered from the very beginning. Build from the VM up.
  • Debug information should be simple to add/extract. soh_cah_toa's thoughts on the debug segment are very valuable. It should be baked deeply into PACT.
  • We need a round-trip assembly format. PIR has a lot of magic in it. We also have no way to generate PIR from a PBC file. Being able to disassemble an object file can be VERY useful when attempting to debug a compiler.
  • PACT should support system-level languages. PCT currently does things like automatically return the last value from a sub, and turn every for loop and if statement into blocks and therefore subs. This is all fairly heavy-weight. Loops can be done with labels, variables can be stored in registers. Let's do it.
  • Make it possible to do things like Static Single Assignment form. Parrot being a register machine means we can leverage many years of optimizing knowledge. Let's put it to use!
  • Tests, tests, tests. PCT currently has little to no testing of it's actual functioning. We make sure that trees can be printed properly, that attributes can be get and set, but do no checking that a for loop works properly. This is poor.

That list ended up being a little bit longer than I had intended. But this has all been rattling around in my head for weeks now, and I think it's time to start pulling it out and playing with it. I didn't want to get into this during GSoC because I didn't want to distract from what I said I'd do.

Now, I don't want to completely reinvent the wheel. I want to continue with most of PCT's features, and make a transition between PCT and PACT fairly painless. In fact, I'd be thrilled if I could just drop large chunks of code from PCT into PACT. I also don't want this to just be my baby. I'd love to get feedback from anyone who's used PCT, thought about using PCT, or decided not to use PCT.

I think Parrot's strength is the easy entrance for people wanting to design a new programming language, which is something I think every programmer should do at least once. So I want to give us a first class compiler toolkit to enable users to walk up and get going. I want a toolkit that gives the ability to debug, profile, analyze, and optimize programs. I want Parrot to rock.