Migrating from WP-Syntax to PrismJS

WP-Syntax was a great syntax-highlighting plugin for WordPress. However, development has ceased, and it has not been updated for a very long time. While it is not broken per se, it doesn’t work with Jetpack’s Markdown support, so I stopped using it on this site and switched to a different plugin. With the introduction of the Gutenberg editor, I started looking again for a plugin that would let me easily highlight fenced code blocks (this worked in the old editor with SyntaxHighlighter Evolved but isn’t supported in Gutenberg). Since I didn’t want three syntax-highlighting plugins enabled simultaneously, one of them abandoned, I decided to migrate all the posts from WP-Syntax to a new solution.

The new solution I chose was PrismJS. I decided to use it directly (without a plugin), because by default it highlights every <pre><code class="language-..."> construct (which is also what Markdown produces). I also didn’t want, yet again, to depend on plugin-specific shortcodes, which would require another migration when the plugin eventually stops working.

WP-Syntax used the <pre lang="...">code goes here</pre> construct and took care of HTML-escaping everything inside the <pre> tag. The migration therefore has to rewrite the <pre> tags into <pre><code> constructs, HTML-escape the code inside them, and remove any leading newline (unlike <pre>, <pre><code> does not ignore it). I wrote the script to work on dumped SQL tables, as that seemed easiest. The flow is:

$ mysqldump --add-drop-table -u USER -p blog wp_comments > wp_posts.sql
$ python3 SCRIPT.py < wp_posts.sql > wp_posts_updated.sql
$ mysql --user=USER --password blog < wp_posts_updated.sql

#!/usr/bin/python3

import re
import html
import sys

def convert(fin, fout):
    for line in fin:
        # Each post is in a single line
        # <pre><code> doesn't ignore the first newline like <pre>
        # In the dump, quotes and newlines are backslash-escaped (\" and \r\n),
        # so the pattern matches the backslashes literally
        replaced = re.sub(r'<pre lang=\\"(.*?)\\">(?:\\r)?(?:\\n)?(.*?)</pre>', replace_and_escape, line)
        # The line already ends with a newline, so write it unchanged
        fout.write(replaced)
        if line != replaced:
            # Log a before/after pair for every modified line
            print(line, replaced, sep="\n============>\n", file=sys.stderr)


def replace_and_escape(matchobj):
    language = matchobj.group(1)
    # We don't escape quotes because it's unnecessary and it would mess up the
    # SQL escaping
    content = html.escape(matchobj.group(2), quote=False)
    return r'<pre><code class=\"language-{}\">{}</code></pre>'.format(language, content)


if __name__=='__main__':
    convert(sys.stdin, sys.stdout)
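
To make the transformation concrete, here is a minimal sketch that applies the same idea to a single made-up fragment. It assumes the dump escapes double quotes and newlines with backslashes (\" and \r\n), as mysqldump normally does; the sample string and the names PATTERN and to_prism are illustrative only and not part of the script above.

```python
import html
import re

# Assumption: inside the dump, quotes appear as \" and newlines as \r\n,
# so the pattern has to match the backslashes literally.
PATTERN = re.compile(r'<pre lang=\\"(.*?)\\">(?:\\r)?(?:\\n)?(.*?)</pre>')


def to_prism(matchobj):
    # Escape &, < and > only; quotes stay as they are so that the SQL
    # escaping of the surrounding string is left untouched.
    content = html.escape(matchobj.group(2), quote=False)
    return r'<pre><code class=\"language-{}\">{}</code></pre>'.format(
        matchobj.group(1), content)


# Made-up fragment in the shape it would have inside a dumped row.
sample = r'<pre lang=\"c\">\r\nif (a < b && c > d) puts(\"x\");</pre>'
converted = PATTERN.sub(to_prism, sample)
print(converted)
```

The leading \r\n is dropped, the operators become HTML entities, and the backslash-escaped quotes pass through unchanged.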

One thought on “Migrating from WP-Syntax to PrismJS”

Nice approach, thanks. Saved me a lot of time.

Two possible improvements:

1. Your mysqldump command dumps wp_comments. I think you mean wp_posts instead.

2. wp-syntax also offered a “none” language. Those are <pre lang="none"> blocks without syntax highlighting. So the script could have a fallback for those.

def replace_and_escape(matchobj):
    language = matchobj.group(1)
    if language == 'none':
        language = 'bash'
    …
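
If more than one legacy language name needs a fallback, the same idea could be expressed as a small lookup table. This is only a sketch: LANGUAGE_FALLBACKS and normalize_language are hypothetical names, and mapping "none" to "bash" simply follows the suggestion above rather than anything required by PrismJS.

```python
# Hypothetical table of WP-Syntax language names that need a replacement.
# The "none" -> "bash" entry mirrors the suggestion above; adjust as needed.
LANGUAGE_FALLBACKS = {"none": "bash"}


def normalize_language(language):
    # Return the replacement if one is listed, otherwise keep the name as-is.
    return LANGUAGE_FALLBACKS.get(language, language)
```

Inside replace_and_escape, the first line would then become language = normalize_language(matchobj.group(1)).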
