Parrot-GMP: Yo dawg, heard you like generating code...

Step 1 of my project was reading gmp.h and generating an NCI definition file from it. I've tweaked a few settings in the unimaginatively named "gmph2ncidef.pl" script that does this and refactored some common parts out into YAML configuration files. From this we get a PIR source file that gives us access to the GMP library functions.
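gmph2ncidef.pl itself isn't shown here, so what follows is only a rough Python sketch of the idea (the real script is Perl), not its actual logic. It assumes simplified declarations modelled on gmp.h's `#define mpz_add __gmpz_add` aliases, each followed by a `__GMP_DECLSPEC` prototype, and it emits one line per function in a hypothetical `<return letter> <symbol> <argument letters>` layout. The `NCI_LETTER` table and that output layout are assumptions for illustration, not taken from Parrot's NCI documentation.

```python
import re

# Hypothetical C-type -> NCI signature letter table. The letters are
# illustrative only; check them against Parrot's NCI documentation.
NCI_LETTER = {
    "void": "v",
    "int": "i",
    "long": "l",
    "unsigned long int": "l",
    "double": "d",
    "mpz_ptr": "p",
    "mpz_srcptr": "p",
}

# gmp.h maps each public name to an exported symbol, e.g.
#   #define mpz_add __gmpz_add
ALIAS = re.compile(r"#define\s+(\w+)\s+(__g\w+)")

# Simplified prototype shape: __GMP_DECLSPEC <ret> <name> (<args>) <attrs>;
PROTO = re.compile(
    r"__GMP_DECLSPEC\s+(?P<ret>[\w\s]+?)\s+(?P<name>\w+)\s*"
    r"\((?P<args>[^)]*)\)[^;]*;"
)


def squash(text):
    """Collapse whitespace runs so 'unsigned  long\nint' becomes 'unsigned long int'."""
    return " ".join(text.split())


def ncidef_lines(header_text):
    """Return one hypothetical '<ret> <symbol> <args>' line per aliased prototype."""
    aliases = dict(ALIAS.findall(header_text))  # public name -> exported symbol
    lines = []
    for m in PROTO.finditer(header_text):
        symbol = aliases.get(m.group("name"))
        if symbol is None:
            continue  # no #define seen for this name, so skip it
        # Unmapped types raise KeyError, a cue to extend NCI_LETTER.
        ret = NCI_LETTER[squash(m.group("ret"))]
        args = "".join(NCI_LETTER[squash(a)] for a in m.group("args").split(","))
        lines.append(f"{ret} {symbol} {args}")
    return lines


# Hand-written sample in the style of gmp.h (not copied from a real release).
HEADER = """
#define mpz_add __gmpz_add
__GMP_DECLSPEC void mpz_add (mpz_ptr, mpz_srcptr, mpz_srcptr);
#define mpz_get_ui __gmpz_get_ui
__GMP_DECLSPEC unsigned long int mpz_get_ui (mpz_srcptr) __GMP_NOTHROW;
"""

if __name__ == "__main__":
    for line in ncidef_lines(HEADER):
        print(line)
```

A real gmp.h also contains preprocessor conditionals, inline definitions, and attribute macros that a regex this simple will miss or misread, so treat this as an outline of the approach rather than a drop-in tool.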

Step 2 is to write a Winxed (http://winxed.org/) convenience class that makes working with GMP easier and comes with documentation. Winxed is a higher-level language implemented on Parrot; it hides much of the nastiness of the Parrot object system and has a much nicer syntax, while still generating efficient PIR code. Whiteknight++ and NotFound++ (as well as other Parrot devs) are working on getting a snapshot of Winxed into the Parrot repository for devs to use (see the with_winxed branch at https://github.com/parrot/parrot/tree/with_winxed for more details).

Previously I had a small, hand-crafted Winxed class that handled initialization, destruction, setting, getting, and adding. This was enough for a quick sanity check: I initialized a GMP Integer, set it to 30, added 2 to it, and then got the number back and printed it out. Bursting forth on my console was the glorious number 32 and not some silly stack trace.

So my method was sound, but the Winxed class itself was far from complete. Filling it out by hand would mean writing a Winxed function for every single GMP Integer function, each with much the same boilerplate code. So again, I automated the process. See http://github.com/bubaflub/parrot-gmp/commit/71142e018bd for the script and http://github.com/bubaflub/parrot-gmp/commit/3642a51f7f5 for the generated Winxed class.

The GMP documentation comes in Texinfo format (for the GNU Info program). I didn't find any CPAN modules that could read or handle Texinfo natively, but Texinfo can be transformed into a number of formats, including HTML and XML. I chose to output a single HTML file and then use Web::Scraper to extract the information I need from it. The first iteration of my scraping code isn't pretty, but it works.
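Web::Scraper is a Perl module, so as a stand-in here is a minimal Python sketch using the standard library's `html.parser` instead. It assumes each function in the HTML is a `<dt>` whose text reads like `Function: void mpz_add (mpz_t rop, ...)`, followed by a `<dd>` holding the description. That markup, and the `SAMPLE_HTML` snippet below, are assumptions: makeinfo's HTML output differs between Texinfo versions, so the selectors would need checking against the real file.

```python
import re
from html.parser import HTMLParser


class DefinitionParser(HTMLParser):
    """Collect the text of each <dt> (signature) and the <dd> (description) after it."""

    def __init__(self):
        super().__init__()
        self.entries = []  # list of [signature_text, description_text]
        self._slot = None  # 0 while inside <dt>, 1 while inside <dd>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self.entries.append(["", ""])
            self._slot = 0
        elif tag == "dd" and self.entries:
            self._slot = 1

    def handle_endtag(self, tag):
        if tag in ("dt", "dd"):
            self._slot = None

    def handle_data(self, data):
        if self._slot is not None:
            self.entries[-1][self._slot] += data


# Assumed signature text, e.g. "Function: void mpz_add (mpz_t rop, ...)".
SIGNATURE = re.compile(r"Function:\s+(?P<ret>.+?)\s+(?P<name>\w+)\s*\((?P<args>.*)\)")


def squash(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def scrape(html):
    """Return name, return type, argument list, and description for each function."""
    parser = DefinitionParser()
    parser.feed(html)
    parser.close()
    functions = []
    for dt, dd in parser.entries:
        m = SIGNATURE.search(squash(dt))
        if m is None:
            continue  # not a function definition (e.g. a macro or variable)
        functions.append({
            "name": m.group("name"),
            "returns": m.group("ret"),
            "args": [a.strip() for a in m.group("args").split(",") if a.strip()],
            "doc": squash(dd),
        })
    return functions


# Hand-written sample; real makeinfo output varies between Texinfo versions.
SAMPLE_HTML = """
<dl>
<dt>Function: void <b>mpz_add</b> <i>(mpz_t <var>rop</var>, const mpz_t <var>op1</var>, const mpz_t <var>op2</var>)</i></dt>
<dd><p>Set <var>rop</var> to <var>op1</var> + <var>op2</var>.</p>
</dd>
<dt>Function: unsigned long int <b>mpz_get_ui</b> <i>(const mpz_t <var>op</var>)</i></dt>
<dd><p>Return the value of <var>op</var> as an <code>unsigned long</code>.</p>
</dd>
</dl>
"""

if __name__ == "__main__":
    for fn in scrape(SAMPLE_HTML):
        print(fn["returns"], fn["name"], fn["args"])
```

Each dictionary is roughly the shape of data a code generator needs: a name to emit, parameters to mirror, and documentation text to copy. Nested definition lists would need more careful state tracking than the single `_slot` flag used here.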

The script as it stands now reads the GMP HTML documentation and outputs roughly 4,000 lines of Winxed code and documentation. This saves me an enormous amount of time: I don't have to write each of those functions by hand, the documentation comes straight from GMP's own excellent and precise manual, and because my Winxed function names and parameters parallel the GMP ones, there are no surprises.

Just as the parsing could use some improvement, so could the Winxed code generation. First, at this point I only want to output functions that match a certain pattern. Second, I want to skip a blacklist of functions, since some are too complicated to generate code for automatically. Third, I need to handle pointers and double pointers better (or at all!). The problem is that "char* a", "char * a", and "char *a" are all valid ways to declare the same parameter, and although the source is largely consistent about which form it uses, it isn't 100% consistent.
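The three fixes listed above can be sketched in a few lines of Python (again, not the project's Perl). The `WANTED` pattern and `BLACKLIST` entries are placeholders chosen for illustration. `normalize_param` rewrites every spacing variant of a pointer parameter into one canonical form, so the rest of a generator would only have to handle a single case, and counting the stars gives the pointer depth needed for double pointers.

```python
import re

# Placeholder selection rules: the pattern and blacklist are illustrative,
# not the project's actual configuration.
WANTED = re.compile(r"^mpz_")
BLACKLIST = {"mpz_inp_raw", "mpz_out_raw"}

# Base type, then pointer stars with any spacing, then the parameter name.
PARAM = re.compile(r"\s*(?P<base>.*?)\s*(?P<stars>(?:\*\s*)*)(?P<name>\w+)\s*")


def select(names):
    """Keep names that match WANTED and are not blacklisted, in input order."""
    return [n for n in names if WANTED.match(n) and n not in BLACKLIST]


def normalize_param(decl):
    """Canonicalize pointer spacing: 'char* a', 'char * a', 'char *a' -> 'char *a'."""
    m = PARAM.fullmatch(decl)
    if m is None:
        raise ValueError(f"unrecognised parameter: {decl!r}")
    base = " ".join(m.group("base").split())
    depth = m.group("stars").count("*")  # 1 for a pointer, 2 for a double pointer
    return f"{base} {'*' * depth}{m.group('name')}".strip()


if __name__ == "__main__":
    for decl in ("char* a", "char * a", "char *a", "const char ** s"):
        print(repr(decl), "->", repr(normalize_param(decl)))
    print(select(["mpz_add", "mpq_add", "mpz_inp_raw", "mpz_set_ui"]))
```

Normalizing before any type lookup means the occasional inconsistency in the source no longer matters; the blacklist stays a plain set so functions can be added to it without touching the generator logic.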

In summary, I have a buggy and ugly script that generates Winxed code and documentation from the HTML GMP documentation. My next task is to remove the "buggy" adjective and then start porting an existing test suite.

P.S. All these hoops I'm jumping through to generate code are only part of the initial development process. Running or tweaking these scripts is not something an end-user, or maybe even a developer, will have to do; they just get things off the ground. The project will include all the generated PIR, so there are no external dependencies beyond the core Parrot VM itself (and, of course, the GMP library).