GSoC: LALR Parsing: A revised schedule

I'm far behind on my project, unfortunately, and, upon dukeleto's urging, I've written up a new timeline that I think I will be able to finish before Summer of Code ends.

First, I'll summarize what I have done:

  • An object representation for grammars
  • An object representation for deterministic push-down automata(DPDAs)
  • Generation of DPDAs for LR(0) grammars
  • The beginnings of an interpreter for DPDAs

And what I haven't done:

  • Actually producing a parse tree while parsing
  • Finishing the DPDA interpreter
  • Generation of parsing tables for any of simple LR(K), look-ahead LR(k) grammars, or general LR(k) grammars
  • Code generation from a DPDA
  • A pretty DSL for grammar specification like that in Yacc

I've listed these unfinished or unstarted tasks in ordering of descending priority. Without the ability to fully interpret DPDAs and produce a parse tree, there's little point in generating DPDAs for more grammars since we won't have a way to actually use the DPDAs. After that, expanding the set of grammars we can handle is more important than code generation because we can at least interpret the DPDAs to parse things. Finally, a pretty DSL is very low-priority, not because it isn't useful, but because it is either trivial (if you only allow representation of the grammar, and require separate semantic actions) or quite complicated (if you allow semantic actions embedded in the grammar specification that are written in some programming language).

With that in mind, here are two revised schedules: one optimistic, one pessimistic:

The optimisitic schedule:

  • August 9th-August 10th: Work out the details of how to handle building a parse tree while parsing (there is a chapter on this in "Practical Translators for LR(k) Languages" that I haven't fully read and which should be helpful).
  • August 10th-August 11th: Implement building a parse tree.
  • August 11th-August 12th: (This should overlap with the previous item.) Finish implementation of the DPDA interpreter. At this point, we should be able to actually parse LR(0) grammars, so write some tests for this.
  • August 13 - August 14: Implement follow set and simple-k-look-ahead set generation and implement interpretation of look-ahead transitions. This should give us the capability of parsing SLR(k) grammars.
  • August 15-August 16: Write more tests for LR(0) and SLR(k) grammar parsing, as well as more documentation in general.
  • August 17-22: Implement LALR(k) and, if time permits, GLR(k) DPDA generation. I need to reread some papers to be able to itemize the exact tasks involved here.

The pessimistic schedule:

  • August 9th-August 10th: Work out the details of how to handle building a parse tree while parsing (there is a chapter on this in "Practical Translators for LR(k) Languages" that I haven't fully read and which should be helpful).
  • August 11th-August 14th: Implement building a parse tree.
  • August 14th-August 16th: (This should overlap with the previous item.) Finish implementation of the DPDA interpreter. At this point, we should be able to actually parse LR(0) grammars, so write some tests for this.
  • August 17th-August 19th: Implement follow set and simple-k-look-ahead set generation and implement interpretation of look-ahead transitions. This should give us the capability of parsing SLR(k) grammars.
  • August 20th-August 22nd: More tests and documentation.

I believe that the pessimistic schedule is likely to be further from what I can achieve than the optimistic one, but it is the absolute minimum of what I will do.