Migrating from WP-Syntax to PrismJS

WP-Syntax was a great syntax-highlighting plugin for WordPress. However, development had ceased and it was not updated for a very long time. While it is not broken per se, it didn’t work with Jetpack’s markdown support and so I stopped using it on the site and started using a different plugin. With the introduction of the Gutenberg editor, I started looking again for a plugin that will allow me to easily highlight fenced code blocks (this feature worked in the old editor with SyntaxHighlighter Evolved but isn’t supported in Gutenberg). Realizing that I don’t want three syntax-highlighting plugins enabled simultaneously, and not wanting to have an abandoned plugin enabled, I decided to migrate all the posts from WP-Syntax to a new solution.

The new solution I chose was PrismJS. I decided to use it directly (without a plugin) as it highlights by default all the <pre><code class="language-..."> constructs (which is what markdown produces as well) and I didn’t want (yet again) to use plugin specific shortcodes like before which will require migration when the plugin eventually stops working.

WP-Syntax used <pre lang="...">code goes here</pre> construct. Furthermore, it took care of html escaping everything inside the <pre> tag. So the migration solution would be to rewrite the <pre> tags to <pre><code> constructs, html escape the code inside the pre tag and finally remove any leading newlines. I wrote it to work on dumped SQL tables as it seemed easiest. The flow is

$ mysqldump --add-drop-table -u USER -p blog wp_comments > wp_posts.sql
$ python3 < wp_posts.sql > wp_posts_updated.sql
$ mysql --user=USER --password blog < /tmp/wp_posts_updated.sql
#!/usr/bin/python3

import re
import html
import sys

def convert(fin, fout):
    for line in fin:
        # Each post is in a single line
        # <pre><code> doesn't ignore the first newline like <pre>
        replaced = re.sub(r'<pre lang=\\"(.*?)\\">(?:\\r)?(?:\\n)?(.*?)</pre>', replace_and_escape, line)
        print(replaced)
        if line != replaced:
            print(line, replaced, sep="\n============>\n", file=sys.stderr)


def replace_and_escape(matchobj):
    language = matchobj.group(1)
    # We don't escape quotes because it's unnecessary and it would mess up the
    # SQL escaping
    content = html.escape(matchobj.group(2), quote=False)
    return r'<pre><code class=\"language-{}\">{}</code></pre>'.format(language, content)


if __name__=='__main__':
    convert(sys.stdin, sys.stdout)

Leave a Reply

Your email address will not be published. Required fields are marked *

 

This site uses Akismet to reduce spam. Learn how your comment data is processed.