Converting Textile to Markdown... with sed

Posted July 2016

Some of the less beautiful aspects of converting from one format to another, when there isn’t enough content to bother doing it the right way…

An ode to Markdown

It’s everywhere, right? Yes - ever since GitHub and Stack Overflow pushed it, it’s won the war. It’s the best, right? Well - it’s the easiest, no doubt. reStructuredText still has a lot of fans for more technical / precisely structured contexts, but I find it too awkward to write in now.

The problem

Either way, I wanted the new blog to be Markdown-driven. Pandoc supports Textile and could have handled the conversion, but given that the content was being extracted from a MySQL dump file anyway… well, I couldn’t resist some low-down dirty sed hacking.

The hack agile solution

Split into multiple sed scripts, for “readability”:

$ find . -type f -print0 | \
  xargs -0 \
  sed -i -r -e 's/\\n//g; s#"([^"]+)":([a-zA-Z0-9.:/%#\-]+)(\W)#[\1](\2)\3#g' \
            -e 's/@([^@]+)@/`\1`/g' \
            -e 's/!([^(]+)\(([^)]+)\)?!:([a-zA-Z0-9:/.%?@-]+)/[![\2](\1)](\3)/g' \
            -e 's/h3\. /### /g; s/h4\. /#### /g; s/^(p|bc)\.//g'

wat?

In a bit more detail then… this finds all candidate files, and for each:

  • Remove literal \ns
  • Convert links in Textile (e.g. "foo":www.foo.com) to Markdown’s [foo](www.foo.com)
  • Convert inline code markup, e.g. from @Integer.compare@ to `Integer.compare`
  • Also try to convert linked images, e.g. !/media/images/ql-screenshot-album-small.jpg(QL screenshot)!:http://code.google.com/p/quodlibet/wiki/Screenshots, though this was troublesome at best
  • Various other tidy-ups: convert headings (h3. -> ###, h4. -> ####) and strip the p. and bc. block declarations (the results need fixing by hand, or some much cleverer sed scripts)
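
To see a few of these steps on a single line of input, here is a cut-down sketch rather than the exact command above. The function name textile_to_md and the sample text are made up for illustration, and the link pattern is simplified further (it drops the trailing \W group and uses ~ as the delimiter):

```shell
# Hypothetical helper: applies a simplified subset of the substitutions to stdin.
textile_to_md() {
  sed -E \
    -e 's/\\n//g' \
    -e 's~"([^"]+)":([a-zA-Z0-9.:/%#-]+)~[\1](\2)~g' \
    -e 's/@([^@]+)@/`\1`/g' \
    -e 's/^h3\. /### /; s/^h4\. /#### /'
}

# Two sample lines, each ending in a literal backslash-n as in the dump file.
printf '%s\n' \
  'h3. Links\n' \
  'See "foo":www.foo.com or call @Integer.compare@ directly\n' |
  textile_to_md
# Prints:
# ### Links
# See [foo](www.foo.com) or call `Integer.compare` directly
```

Reading from stdin rather than using -i makes it easy to try patterns out before letting them loose on real files.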

But no, don’t try this at home

Now, obviously this is the wrong way to do it, and it’s very far from infallible (note the simplified URL matching), but it was… good enough for these purposes.
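
For comparison, the right way would probably be to let Pandoc do the parsing, since it reads Textile and writes Markdown. A minimal sketch, assuming Pandoc is installed and the Textile sources live in files ending in .textile (the extension, the function name and the output naming are my assumptions here, not part of the original migration):

```shell
# Hypothetical helper: convert one Textile file to Markdown with Pandoc,
# writing post.md next to post.textile (the naming scheme is an assumption).
textile_file_to_md() {
  pandoc -f textile -t markdown -o "${1%.textile}.md" "$1"
}

# Usage, for every .textile file below the current directory:
#   find . -type f -name '*.textile' | while IFS= read -r f; do
#     textile_file_to_md "$f"
#   done
```

The output would likely still want a read-through by hand, but at least a real parser handles the URLs.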