HTML and remark

remark is a markdown compiler. It’s concerned with HTML in two ways:

  1. markdown is often turned into HTML
  2. markdown sometimes has embedded HTML

When dealing with HTML and markdown, we will use both remark and rehype. This article shows some examples of how to do that.

How to turn markdown into HTML

remark handles markdown: it can parse and serialize it. But it does not handle HTML. That is the job of rehype, which parses and serializes HTML.

To turn markdown into HTML, we need remark-parse, remark-rehype, and rehype-stringify:

var unified = require('unified')
var remarkParse = require('remark-parse')
var remarkRehype = require('remark-rehype')
var rehypeStringify = require('rehype-stringify')

unified()
  .use(remarkParse) // Parse markdown content to a syntax tree
  .use(remarkRehype) // Turn markdown syntax tree to HTML syntax tree, ignoring embedded HTML
  .use(rehypeStringify) // Serialize HTML syntax tree
  .process('*emphasis* and **strong**')
  .then((file) => console.log(String(file)))
  .catch((error) => {
    throw error
  })

This turns *emphasis* and **strong** into <em>emphasis</em> and <strong>strong</strong>, but it does not support HTML embedded inside markdown (such as *emphasis* and <strong>strong</strong>).

This solution is safe: content you don’t trust cannot cause an XSS vulnerability.

How to turn HTML into markdown

We can also do the inverse. To turn HTML into markdown, we need rehype-parse, rehype-remark, and remark-stringify:

var unified = require('unified')
var rehypeParse = require('rehype-parse')
var rehypeRemark = require('rehype-remark')
var remarkStringify = require('remark-stringify')

unified()
  .use(rehypeParse) // Parse HTML to a syntax tree
  .use(rehypeRemark) // Turn HTML syntax tree to markdown syntax tree
  .use(remarkStringify) // Serialize markdown syntax tree
  .process('<em>emphasis</em> and <strong>strong</strong>')
  .then((file) => console.log(String(file)))
  .catch((error) => {
    throw error
  })

This turns <em>emphasis</em> and <strong>strong</strong> into *emphasis* and **strong**.

How to allow HTML embedded in markdown

Markdown is a content format that’s great for basic things: it’s nicer to write *emphasis* than <em>emphasis</em>. But it’s limited: its terse syntax supports only a few constructs. Luckily, for more complex things, markdown allows HTML inside it. A common example is including a <details> element.

To allow HTML embedded in markdown when going from markdown to HTML, configure both remark-rehype and rehype-stringify:

var unified = require('unified')
var remarkParse = require('remark-parse')
var remarkRehype = require('remark-rehype')
var rehypeStringify = require('rehype-stringify')

unified()
  .use(remarkParse)
  .use(remarkRehype, {allowDangerousHtml: true}) // Pass raw HTML strings through.
  .use(rehypeStringify, {allowDangerousHtml: true}) // Serialize the raw HTML strings
  .process('*emphasis* and <strong>strong</strong>')
  .then((file) => console.log(String(file)))
  .catch((error) => {
    throw error
  })

This solution is not safe: content you don’t trust can cause XSS vulnerabilities.

How to properly support HTML inside markdown

To properly support HTML embedded inside markdown, we need another plugin: rehype-raw. This plugin will take the strings of HTML embedded in markdown and parse them with an actual HTML parser.

var unified = require('unified')
var remarkParse = require('remark-parse')
var remarkRehype = require('remark-rehype')
var rehypeRaw = require('rehype-raw')
var rehypeStringify = require('rehype-stringify')

unified()
  .use(remarkParse)
  .use(remarkRehype, {allowDangerousHtml: true})
  .use(rehypeRaw) // *Parse* the raw HTML strings embedded in the tree
  .use(rehypeStringify)
  .process('*emphasis* and <strong>strong</strong>')
  .then((file) => console.log(String(file)))
  .catch((error) => {
    throw error
  })

This solution is not safe: content you don’t trust can cause XSS vulnerabilities.

But because we now have a complete HTML syntax tree, we can sanitize that tree. For a safe solution, add rehype-sanitize right before rehype-stringify.