How to migrate from Wordpress to Phenomic

I recently decided to move my WordPress Blog to a static site generator. You can read about the reasons here. I'm using phenomic as my static site generator which takes markdown posts and converts them to HTML and embeds them in a web page that is fully customizable with React components. However, my old posts in WordPress are still written in HTML and use WordPress Plugins, and I had to migrate them from WordPress to phenomic. Here's how I did it.

#Convert WordPress posts to Markdown

#Export Posts from WordPress

You don't want to do this by hand and luckily you don't have to. The first step is to export your WordPress posts so they can be processed by other programs. You can do that by going to your Wordpress Dashboard and selecting Tools - Export - Posts. Download the xml file.

WordPress Export Posts

#Convert Posts to Markdown

The .xml file you just downloaded contains all your posts in HTML and there are several HTML to Markdown converters. The one I used (which was the only one that worked) is Exitwp. You 'll need Python 2 to run it, so if you haven't installed it, go ahead and do so. After that, you can follow the installation instructions for Exitwp (pip install --upgrade -r pip_requirements.txt). Then put the wordpress.xml file you downloaded into the wordpress-xml folder.
Running python exitwp.py should now convert all your posts to Markdown and create files in build/jekyll/*domain*/_posts.

#Adjusting the Parser

The python parser creates jekyll compliant Markdown posts which are similar to phenomic's post types. However, you might still want to adjust some settings to cater to your specific posts' HTML which might depend on the WordPress plugins you used. You can define custom RegExp to apply in exitwp.py - Mine looked like this:

body_replace: {
  #'<pre.*?lang:(\w*?)\W.*?>': '```\1\n',
  '<pre.*?>': '```javascript\n',
  '</pre>': '\n```\n',
  '<code.*?>': '`',
  '</code>': '`',
  '\[latex\]': '\(',
  '\[/latex\]': '\)',
  '\[caption[\s\S]*?\]': '',
  '\[/caption\]': '',
}

You can also adjust the frontMatter of your posts by reading through exitwp.py and editing the corresponding lines. For instance, I added phenomic's route parameter so no links are broken by phenomic, and also the disqus_identifier of the standard Disqus WordPress plugin so no comments will be lost.

yaml_header = {
    'title': i['title'],
    'route': '/' + i['link'][len('http://cmichel.io/'):],
    'author': i['author'],
    'date': datetime.strptime(
        i['date'], '%Y-%m-%d %H:%M:%S').replace(tzinfo=UTC()),
    'slug': i['slug'],
    'disqus_identifier': i['wp_id'] + ' http://cmichel.io/?p=' + i['wp_id'],
}

In the end, you should have automated most parts of the posts, so the only thing you need to fix manually is some bad formatting. Also remember that you can always have HTML tags in your Markdown which will then simply pass through your parser and end up unchanged in your final HTML file.

#Replace Image URLs

WordPress stores its images in wp-content/uploads/*year*/*month* and your <img src='...' /> elements in your converted posts will still point to this path. If you don't mind the chaos, you can of course leave the images there, but I wanted to link the images in a clean way to point to phenomic's assets folder. So log into your FTP account, download wp-content/uploads and put its contents into the content/assets folder of phenomic. To link the src attribute of the img elements to this new folder I found it easiest to just search all files for content/uploads and replace the string with assets. This can easily be done in a single click with a good text editor like Sublime Text, Atom, or my current favourite Visual Studio Code.

Run npm start now to check if the sites are parsed correctly by your static site generator.