I have always loved LaTeX for generating documents. It’s such an elegant way for a developer to “develop” documents!

Then I wanted to produce posts from the LaTeX source I had. This is text manipulation, so I picked Ruby and started writing a parser. Then I stopped.

How many times have I written a parser? Probably half a dozen, each for a different reason, but it always started with text and ended with nodes.

This time I looked around and found Treetop, which lets you write grammars to parse languages into trees and then associate operations with nodes (like converting a node to HTML).
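To make the "text in, nodes out, operations on nodes" idea concrete, here is a minimal sketch in plain Ruby. It is not Treetop and not the grammar I committed: a small hand-written `StringScanner` loop stands in for the parser that Treetop would generate, and the names `TextNode`, `CommandNode`, `LATEX_TO_HTML_TAG`, `parse_nodes` and `latex_to_html`, as well as the command-to-tag mapping, are made up for illustration.

```ruby
require "cgi"
require "strscan"

# Assumed mapping from a few LaTeX commands to HTML tags (illustrative only).
LATEX_TO_HTML_TAG = { "emph" => "em", "textbf" => "strong", "section" => "h2" }.freeze

# Leaf node: plain text, HTML-escaped on output.
TextNode = Struct.new(:text) do
  def to_html
    CGI.escapeHTML(text)
  end
end

# Inner node: a command such as \emph{...} with child nodes.
CommandNode = Struct.new(:name, :children) do
  def to_html
    tag = LATEX_TO_HTML_TAG.fetch(name, "span") # unknown commands fall back to <span>
    "<#{tag}>#{children.map(&:to_html).join}</#{tag}>"
  end
end

# Recursive descent: reads nodes until end of input or a closing brace.
def parse_nodes(scanner)
  nodes = []
  until scanner.eos? || scanner.check(/\}/)
    if scanner.scan(/\\([a-zA-Z]+)\{/)
      name = scanner[1]
      children = parse_nodes(scanner)
      scanner.scan(/\}/) or raise ArgumentError, "missing closing brace for command #{name}"
      nodes << CommandNode.new(name, children)
    else
      # Plain text up to the next backslash or brace; a lone backslash is kept as text.
      nodes << TextNode.new(scanner.scan(/[^\\}]+/) || scanner.getch)
    end
  end
  nodes
end

def latex_to_html(source)
  scanner = StringScanner.new(source)
  nodes = parse_nodes(scanner)
  raise ArgumentError, "unexpected closing brace at offset #{scanner.pos}" unless scanner.eos?
  nodes.map(&:to_html).join
end

puts latex_to_html('I \emph{really \textbf{love}} LaTeX')
# prints: I <em>really <strong>love</strong></em> LaTeX
```

The point of a tool like Treetop is that the scanning half above is derived from a declarative grammar, so only the `to_html`-style methods on the nodes are left to write by hand.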

Of course I could have looked around for a LaTeX-to-HTML converter or an existing Treetop LaTeX grammar, but then I wouldn’t have learnt Treetop itself.

I just committed three things: the grammar, the Ruby script that generates HTML from the nodes, and the Ruby script I used while developing it, which reads a .tex file and converts it to HTML.


The grammar is very limited, basically just what I need for my own paper, but, who knows, someone might enjoy playing with it.


  1. Just noticed that WordPress “pre” and “code” tags don’t properly encode anything beyond a small set of characters like &lt; and &gt;. For now I have fixed it by hand.

    I guess I’ll update my Treetop grammar and Ruby HTML builder to do the encoding for WordPress. But some errors might still appear. As usual ;)
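A rough sketch of what that encoding step could look like, assuming plain HTML entity escaping is all WordPress needs inside “pre” and “code”. The helper name `code_block_html` is hypothetical and is not part of the committed scripts; `CGI.escapeHTML` is from Ruby's standard library.

```ruby
require "cgi"

# Hypothetical helper: escape a verbatim snippet before wrapping it,
# so that &, <, >, " and ' survive as text instead of being read as markup.
def code_block_html(snippet)
  "<pre><code>#{CGI.escapeHTML(snippet)}</code></pre>"
end

puts code_block_html('if a < b && s == "x"')
# prints: <pre><code>if a &lt; b &amp;&amp; s == &quot;x&quot;</code></pre>
```

Escaping should happen exactly once, at the point where text nodes are turned into HTML; running it over already-escaped output would turn `&lt;` into `&amp;lt;`.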

