In between packing (I'm moving across town) I'm doing a bit of work for Tapestry 5.1, TAP5-79: Improve Tapestry's property expression language to include OGNL-like features. People really miss being able to do a few cool things in OGNL, such as create lists and maps on the fly ... this is not uncommon when creating a page activation context.
The 5.0 code was based on regular expressions and hand parsing because it only supported a very limited number of options. For 5.1, the grammar will grow considerably, adding options for list and map creation, method invocation (with parameters), and perhaps property projection and list filtering. Hand-tooled parsers aren't going to keep up, so it was time to switch to a more complete solution.
I ended up choosing ANTLR because it seems well supported, has a book and good online documentation, and a set of supporting tools. ANTLR is used elsewhere as well, for example by Hibernate to parse HQL.
There is even decent support for ANTLR with Maven (while Tapestry still builds with Maven, something I hope to address soon). Because of this, I only check my grammar files into SVN, not the generated files; on the continuous integration server, the ANTLR plugin generates the lexer and parser code fresh for each build.
The only real down-side is the runtime dependency ... about 113K and problematic if Tapestry is ever combined with some other tool that has a dependency on a different and incompatible version. Hibernate (for better or worse) uses the ANTLR2 runtime library, which uses different package names.
My first step was to re-create Tapestry 5.0's behavior on top of ANTLR. Because of some complexity in the lexical part of the grammar (that darn ".." operator!) it took quite a bit of head bashing. I did eventually figure it out, and did what any self-respecting coder should do ... leave a simple, useful, documented example for the next poor slob.
Now I'm back into the side of code generation; Tapestry's property expression grammar is converted directly into bytecode; the intermediate language is Javassist, which is a significant subset of Java. So I parse the property expressions into a AST (abstract syntax tree), then generate what looks like Java code from that, which gets compiled in-process and turned directly into instantiable classes.
How would you test something like that? At one time, I would try to unit test that the generated code was correct. Eventually I hit some bugs where my tests passed, but the generated code was incorrect.
With code generation, there is no such thing as a unit test, it's always an integration test. You can try and limit the scope, but there's too many moving parts for a unit test to useful or credible.
Instead, I test my parsing and code generation logic by testing the generated objects' behavior. So I feed in a large number of expressions and objects to have expressions evaluated upon, and check that the results I get by reading and setting property expressions is correct. If I get the right results, I know the generated code is good.