Converting Textile to Markdown... with sed
Some of the less beautiful points of converting from one format to another, when there isn’t enough content to bother doing it the right way…
An ode to Markdown
It’s everywhere, right? Yes: ever since GitHub and Stack Overflow pushed it, it’s won the war. It’s the best, right? Well, it’s the easiest, no doubt. reStructuredText still has a lot of fans for more technical / precisely structured contexts, but, well, I find it too awkward to write in now.
The problem
Either way, I wanted the new blog to be Markdown-driven, even though Pandoc supports Textile. And given that the extraction was from a MySQL dump file anyway… well, I couldn’t resist some low-down dirty sed hacking.
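For comparison, since Pandoc reads Textile, the by-the-book route would presumably look something like the sketch below. It assumes `pandoc` is installed and on the `PATH`. The helper names and the `.textile` file extension are hypothetical, not something the original export used.

```shell
# Hypothetical helper: convert one Textile file to Markdown with Pandoc.
# Writes foo.md next to foo.textile and leaves the original untouched.
textile_file_to_md() {
    pandoc --from=textile --to=markdown --output="${1%.textile}.md" "$1"
}

# Hypothetical helper: convert every *.textile file under a directory.
# (Assumes file names contain no newlines.)
convert_tree() {
    find "$1" -type f -name '*.textile' | while IFS= read -r f; do
        textile_file_to_md "$f"
    done
}
```

Calling `convert_tree posts` (with `posts` standing in for wherever the extracted files live) would then leave a `.md` file beside each source file.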
The ~~hack~~ agile solution
Split into multiple sed expressions, for “readability”:
$ find . -type f -print0 | \
    xargs -0 \
    sed -i -r -e 's/\\n//g; s#"([^"]+)":([a-zA-Z0-9.:/%\#-]+)(\W)#[\1](\2)\3#g' \
    -e 's/@([^@]+)@/`\1`/g' \
    -e 's/!([^(]+)\(([^)]+)\)?!:([a-zA-Z0-9:\/.%?@-]+)/[![\2](\1)](\3)/g' \
    -e 's/h3\. /### /g; s/h4\. /#### /g; s/^(p|bc)\.//g'
wat?
In a bit more detail then… this finds all candidate files, and for each:
- Remove literal `\n`s
- Convert links in Textile (e.g. `"foo":www.foo.com`) to Markdown’s `[foo](www.foo.com)`
- Convert inline syntax markup e.g. from `@Integer.compare@` to `` `Integer.compare` ``
- Also try to convert image links e.g. `!/media/images/ql-screenshot-album-small.jpg(QL screenshot)!:http://code.google.com/p/quodlibet/wiki/Screenshots`, but this was troublesome at best
- Various other utilities like `h3.` -> `###`, strip block declarations (need fixing by hand, or some much cleverer sed scripts)
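To see what those rules do, here is a minimal sketch that runs most of the same expressions over two sample lines on stdin, rather than editing files in place. It assumes GNU sed. `textile_to_md` is a made-up helper name and the sample text is invented. The `#` in the URL character class is backslash-escaped because `#` is also the delimiter of that expression, the dots after `h3`, `h4`, `p` and `bc` are escaped so they only match a literal dot, and the image rule is left out.

```shell
# Hypothetical helper: read Textile on stdin, write rough Markdown to stdout.
# Assumes GNU sed (-r turns on extended regular expressions, \W is a GNU extension).
textile_to_md() {
    sed -r \
        -e 's/\\n//g' \
        -e 's#"([^"]+)":([a-zA-Z0-9.:/%\#-]+)(\W)#[\1](\2)\3#g' \
        -e 's/@([^@]+)@/`\1`/g' \
        -e 's/h3\. /### /g; s/h4\. /#### /g' \
        -e 's/^(p|bc)\.//'
}

printf '%s\n' \
    'h3. Comparing integers' \
    'Use @Integer.compare@ as described at "foo":www.foo.com today.' \
  | textile_to_md
# Prints:
# ### Comparing integers
# Use `Integer.compare` as described at [foo](www.foo.com) today.
```

Piping through a function like this makes it easy to eyeball the output before letting `sed -i` loose on real files.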
But no, don’t try this at home
Now obviously this is the wrong way to do this, and very far from infallible (note the simplified URL matching), but it was… good enough for these purposes.
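To make the "simplified URL matching" caveat concrete, here is a small sketch using just the link rule, again assuming GNU sed. The `link_rule` variable name and the example.com URLs are invented for illustration, and the `#` inside the character class is escaped because `#` is also the delimiter.

```shell
# The link rule on its own (assumes GNU sed for -r and \W).
link_rule='s#"([^"]+)":([a-zA-Z0-9.:/%\#-]+)(\W)#[\1](\2)\3#g'

# A plain URL followed by a space converts cleanly.
printf '%s\n' '"foo":http://example.com/page ok' | sed -r -e "$link_rule"
# [foo](http://example.com/page) ok

# "?" and "=" are not in the character class, so the URL is cut short
# and the query string ends up outside the link.
printf '%s\n' '"foo":http://example.com/search?q=1 ok' | sed -r -e "$link_rule"
# [foo](http://example.com/search)?q=1 ok

# The rule needs a non-word character after the URL, so a link at the very
# end of a line backtracks to the last "." and splits the host name.
printf '%s\n' '"foo":http://example.com' | sed -r -e "$link_rule"
# [foo](http://example).com
```

Cases like these are the sort of thing that still needs fixing by hand afterwards.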