HTML and remark
remark is a markdown compiler. It’s concerned with HTML in two ways:
- markdown is often turned into HTML
- markdown sometimes has embedded HTML
When dealing with HTML and markdown, we will use both remark and rehype. This article shows some examples of how to do that.
Contents
- How to turn markdown into HTML
- How to turn HTML into markdown
- How to allow HTML embedded in markdown
- How to properly support HTML inside markdown
How to turn markdown into HTML
remark handles markdown: it can parse and serialize it. It doesn’t handle HTML; that is the job of rehype, which exists to parse and serialize HTML.
To turn markdown into HTML, we need `remark-parse`, `remark-rehype`, and `rehype-stringify`:
```js
import {unified} from 'unified'
import remarkParse from 'remark-parse'
import remarkRehype from 'remark-rehype'
import rehypeStringify from 'rehype-stringify'

unified()
  .use(remarkParse) // Parse markdown content to a syntax tree
  .use(remarkRehype) // Turn markdown syntax tree to HTML syntax tree, ignoring embedded HTML
  .use(rehypeStringify) // Serialize HTML syntax tree
  .process('*emphasis* and **strong**')
  .then((file) => console.log(String(file)))
  .catch((error) => {
    throw error
  })
```
This turns `*emphasis* and **strong**` into `<em>emphasis</em> and <strong>strong</strong>`, but it does not support HTML embedded inside markdown (such as `*emphasis* and <strong>strong</strong>`).
This solution is safe: content you don’t trust cannot cause an XSS vulnerability.
How to turn HTML into markdown
We can also do the inverse. To turn HTML into markdown, we need `rehype-parse`, `rehype-remark`, and `remark-stringify`:
```js
import {unified} from 'unified'
import rehypeParse from 'rehype-parse'
import rehypeRemark from 'rehype-remark'
import remarkStringify from 'remark-stringify'

unified()
  .use(rehypeParse) // Parse HTML to a syntax tree
  .use(rehypeRemark) // Turn HTML syntax tree to markdown syntax tree
  .use(remarkStringify) // Serialize markdown syntax tree
  .process('<em>emphasis</em> and <strong>strong</strong>')
  .then((file) => console.log(String(file)))
  .catch((error) => {
    throw error
  })
```
This turns `<em>emphasis</em> and <strong>strong</strong>` into `*emphasis* and **strong**`.
How to allow HTML embedded in markdown
Markdown is a content format that’s great for the more basic things: it’s nicer to write `*emphasis*` than `<em>emphasis</em>`. But it’s limited: its terse syntax supports only a few things. Luckily, for more complex things, markdown allows HTML inside it. A common example is including a `<details>` element.
HTML embedded in markdown can be allowed when going from markdown to HTML by configuring `remark-rehype` and `rehype-stringify`:
```js
import {unified} from 'unified'
import remarkParse from 'remark-parse'
import remarkRehype from 'remark-rehype'
import rehypeStringify from 'rehype-stringify'

unified()
  .use(remarkParse)
  .use(remarkRehype, {allowDangerousHtml: true}) // Pass raw HTML strings through
  .use(rehypeStringify, {allowDangerousHtml: true}) // Serialize the raw HTML strings
  .process('*emphasis* and <strong>strong</strong>')
  .then((file) => console.log(String(file)))
  .catch((error) => {
    throw error
  })
```
This solution is not safe: content you don’t trust can cause XSS vulnerabilities.
How to properly support HTML inside markdown
To properly support HTML embedded inside markdown, we need another plugin: `rehype-raw`. This plugin takes the strings of HTML embedded in markdown and parses them with an actual HTML parser.
```js
import {unified} from 'unified'
import remarkParse from 'remark-parse'
import remarkRehype from 'remark-rehype'
import rehypeRaw from 'rehype-raw'
import rehypeStringify from 'rehype-stringify'

unified()
  .use(remarkParse)
  .use(remarkRehype, {allowDangerousHtml: true})
  .use(rehypeRaw) // *Parse* the raw HTML strings embedded in the tree
  .use(rehypeStringify)
  .process('*emphasis* and <strong>strong</strong>')
  .then((file) => console.log(String(file)))
  .catch((error) => {
    throw error
  })
```
This solution is not safe: content you don’t trust can cause XSS vulnerabilities.
But because we now have a complete HTML syntax tree, we can sanitize that tree. For a safe solution, add `rehype-sanitize` right before `rehype-stringify`.