I want to talk about what I need to do over the next couple weeks, and in order to do that, I'm going to start with a description of the system I'm working on. This was originally a much longer post, but I accidentally overwrote my first draft with something else. This is probably good since the first one read more like a reference manual.
The Parrot Compiler Toolkit (PCT) is a library that does all the heavy lifting for the back end of a compiler. The goal is that you decide how to parse your language (roll your own parser, use nqp-rx, or use the upcoming LALRskate) and PCT will do the rest.
PCT contains three different interrelated libraries: the PCT core, Parrot Abstract Syntax Tree (PAST), and Parrot Opcode Syntax Tree (POST). PCT defines some basic utilities like a Compiler class, a basic Grammar with useful rules, and a root Node class. PAST is a pre-made Abstract Syntax Tree to represent your program. POST is a format far closer to Parrot bytecode.
The workflow, and who is responsible for each step, is as follows:
- Parse your language (You)
- Build a PAST tree (You)
- Convert to a POST tree (PAST::Compiler.to_post)
- Convert to PIR (POST::Compiler.to_pir)
- Convert to bytecode (IMCC)
The goal of my project is to change those last two steps to this:
- Convert to bytecode (POST::Compiler.to_pbc)
This has a variety of advantages, including speed (since we don't waste time generating text just to parse it again) and taking IMCC, the PIR compiler, which has a variety of issues, out of the loop.
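To make the difference between the two pipelines concrete, here is a toy model in Python. Nothing in it is the real PCT or IMCC API: `PostOp`, `to_pir`, `parse_pir`, and `encode` are hypothetical stand-ins, and the "bytecode" is just a list of tuples. It only illustrates that the old path renders text and then parses it again, while the new path skips that round trip and should produce the same result.

```python
# Toy model only: none of these names are the real PCT or IMCC API.
from dataclasses import dataclass


@dataclass
class PostOp:
    """Hypothetical stand-in for a POST op node: an opcode name plus operands."""
    name: str
    args: tuple = ()


def to_pir(ops):
    """Old path, step 1: render the ops as PIR-like text."""
    lines = []
    for op in ops:
        lines.append((op.name + " " + ", ".join(op.args)).strip())
    return "\n".join(lines)


def parse_pir(text):
    """Old path, step 2: parse the text back into ops (IMCC's role in Parrot)."""
    ops = []
    for line in text.splitlines():
        name, _, rest = line.partition(" ")
        args = tuple(a.strip() for a in rest.split(",")) if rest else ()
        ops.append(PostOp(name, args))
    return ops


def encode(ops):
    """Final step in both paths: emit a made-up 'bytecode' as a list of tuples."""
    return [(op.name, *op.args) for op in ops]


def compile_via_text(ops):
    """POST -> PIR text -> parse again -> bytecode."""
    return encode(parse_pir(to_pir(ops)))


def compile_direct(ops):
    """POST -> bytecode, with no text in between."""
    return encode(ops)


program = [PostOp("set", ("$I0", "42")), PostOp("say", ("$I0",)), PostOp("end")]
assert compile_via_text(program) == compile_direct(program)
```

The two functions agree on the output; the direct one simply does less work and has one less component that can go wrong.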
This direct conversion to bytecode is aided by what bacek called newPOST. This is a rewrite of the POST layer to be more fully featured. It was driven by the desire to write a complete PIR compiler on top of POST. newPOST is even closer to PBC than the old POST is. It adds more intelligence to existing POST classes and adds new classes to handle more things automatically.
Unchanged: Op, Ops
New functionality:
- Label - Now stores the label position and whether it was declared
- Sub - Probably the biggest change. It now:
  - Has more PIR flag attributes and test methods (is_init, main, anon, etc)
  - Maintains a symbol table
  - Maintains a label lookup table
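As a rough picture of what this extra bookkeeping might look like, here is a small Python sketch. It is not newPOST's actual class layout: the field names, `declare_label`, and `undeclared_labels` are all assumptions made for illustration. The idea it shows is that a label can be referenced before it is declared, and the sub's lookup table keeps track of which labels still lack a position.

```python
# Illustrative sketch; class and method names are assumptions, not newPOST's API.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Label:
    name: str
    position: Optional[int] = None  # offset within the sub, unknown until declared
    declared: bool = False


@dataclass
class Sub:
    name: str
    flags: set = field(default_factory=set)       # e.g. {"main", "anon"}
    symtable: dict = field(default_factory=dict)  # symbol name -> value node
    labels: dict = field(default_factory=dict)    # label name -> Label

    def is_main(self):
        """Test method for one of the PIR flags."""
        return "main" in self.flags

    def label(self, name):
        """Look up a label, creating an undeclared placeholder for forward references."""
        if name not in self.labels:
            self.labels[name] = Label(name)
        return self.labels[name]

    def declare_label(self, name, position):
        """Record where a label actually lives; declaring it twice is an error."""
        lab = self.label(name)
        if lab.declared:
            raise ValueError(f"label {name!r} declared twice")
        lab.position = position
        lab.declared = True
        return lab

    def undeclared_labels(self):
        """Labels that were referenced but never declared."""
        return sorted(n for n, lab in self.labels.items() if not lab.declared)


sub = Sub("main", flags={"main"})
sub.label("loop_end")           # forward reference from a branch
sub.declare_label("loop_start", 0)
print(sub.undeclared_labels())  # loop_end is still waiting for a position
```

Knowing the position of every label is what lets a bytecode generator turn a branch into a numeric offset instead of leaving a name for a later pass to resolve.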
New classes:
- Call - Represents calls, tailcalls, return, and yield. This allows it to store more detailed information about the types of the parameters and return values.
- File - The top level of a POST tree, stores global constants and subs.
- Value - Represents some kind of typed value, has subclasses:
  - Constant - Like a PIR constant
  - Key - Builds a Key PMC containing its children
  - Register - So that it can be referenced in a symbol table
  - String - A string constant with a given charset & encoding
- VanillaAllocator - Allocates registers to values in the symtable
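To give a feel for what a "vanilla" allocator could do, here is a minimal Python sketch. It assumes the simplest possible strategy, which may not match what VanillaAllocator really does: every symbol gets its own register, numbered sequentially within its type, with no reuse. The function name and the shape of the symbol table are made up for the example; the four register types (I, N, S, P for integer, number, string, and PMC) are Parrot's.

```python
# Minimal sketch of a naive allocator; not the actual VanillaAllocator code.
from collections import defaultdict

REGISTER_TYPES = {"I", "N", "S", "P"}  # integer, number, string, PMC


def allocate_registers(symtable):
    """Map each symbol to (type, number), counting up separately per register type.

    `symtable` is assumed to map symbol name -> register type letter.
    """
    next_free = defaultdict(int)
    allocation = {}
    for name, regtype in symtable.items():
        if regtype not in REGISTER_TYPES:
            raise ValueError(f"unknown register type {regtype!r} for {name!r}")
        allocation[name] = (regtype, next_free[regtype])
        next_free[regtype] += 1
    return allocation


symbols = {"x": "I", "y": "I", "msg": "S", "obj": "P"}
print(allocate_registers(symbols))
```

A smarter allocator would reuse registers once a value is dead, but a one-register-per-symbol scheme is easy to get right and is enough to produce working bytecode.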
I think from here on out I'm going to use the concepts of Test-Driven Development. Instead of just plotting out what PAST structure maps to what newPOST structure, I'm going to create a set of PAST trees (probably with the aid of languages like Squaak and PAST) and see if newPOST can compile them to PBC. Whenever I hit an issue, I'll go in and fix PAST::Compiler to ensure it generates a newPOST tree that works for both PIR and PBC generation.
Wish me luck!
RE: GSOC 7: What is newPOST?
No. Way. Are you saying that you can put an end to the madness that we call IMCC? :O
If newPOST could accurately represent line numbers, you would make me a very happy man.