Using unified

This guide shows how unified can be used to transform a markdown file to HTML. It also shows how to generate a table of contents, and it takes a short detour into checking prose.

Stuck? Have an idea for another guide? See support.md.

Tree transformations

For this example we start out with markdown content and then turn it into HTML. We need something to parse markdown and something to compile (stringify) HTML for that. The relevant projects are respectively remark-parse and rehype-stringify. To transform between the two syntaxes we use remark-rehype. Finally, we use unified itself to glue these together.
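To make the three stages concrete, here is a hand-written sketch of the trees involved for the one-line input # Pluto. The node shapes follow the mdast and hast formats (a heading node with a depth, an element node with a tagName). The two conversion functions, toyMdastToHast and toyHastToHtml, are hypothetical stand-ins that only understand headings and text. They are not how remark-rehype or rehype-stringify are implemented.

```javascript
// mdast (markdown syntax tree) for the input "# Pluto", written by hand.
// remark-parse would also record `position` info on each node; omitted here.
const mdast = {
  type: 'root',
  children: [
    {type: 'heading', depth: 1, children: [{type: 'text', value: 'Pluto'}]}
  ]
}

// Hypothetical toy bridge: turns the few mdast node types above into hast.
// Only handles root, heading, and text; remark-rehype covers far more.
function toyMdastToHast(node) {
  switch (node.type) {
    case 'root':
      return {type: 'root', children: node.children.map(toyMdastToHast)}
    case 'heading':
      return {
        type: 'element',
        tagName: `h${node.depth}`,
        properties: {},
        children: node.children.map(toyMdastToHast)
      }
    case 'text':
      return {type: 'text', value: node.value}
    default:
      throw new Error(`Toy bridge does not handle \`${node.type}\` nodes`)
  }
}

// Hypothetical toy compiler: serializes the tiny hast subset above to HTML.
// Ignores properties and escapes only `&` and `<` in text.
function toyHastToHtml(node) {
  switch (node.type) {
    case 'root':
      return node.children.map(toyHastToHtml).join('\n')
    case 'element': {
      const inner = node.children.map(toyHastToHtml).join('')
      return `<${node.tagName}>${inner}</${node.tagName}>`
    }
    case 'text':
      return node.value.replace(/&/g, '&amp;').replace(/</g, '&lt;')
    default:
      throw new Error(`Toy compiler does not handle \`${node.type}\` nodes`)
  }
}

// Parse (done by hand above), transform, then stringify.
const hast = toyMdastToHast(mdast)
console.log(toyHastToHtml(hast)) // <h1>Pluto</h1>
```

In the real pipeline, remark-parse produces the mdast, remark-rehype performs the mdast to hast step, and rehype-stringify does the serializing.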

First set up a project. Create a folder example, enter it, and initialize a new package:

mkdir example
cd example
npm init -y

Then make sure the project is a module, so that import and export work, by adding "type": "module" to package.json:

--- a/package.json
+++ b/package.json
@@ -1,6 +1,7 @@
 {
   "name": "example",
   "version": "1.0.0",
+  "type": "module",
   "main": "index.js",
   "scripts": {
     "test": "echo \"Error: no test specified\" && exit 1"

Now let’s install the needed dependencies with npm, which comes bundled with Node.js.

npm install rehype-stringify remark-parse remark-rehype unified

Now create a markdown file, example.md, that we’re going to transform.

# Pluto

Pluto is an dwarf planet in the Kuiper belt.

## Contents

## History

### Discovery

In the 1840s, Urbain Le Verrier used Newtonian mechanics to predict the
position of…

### Name and symbol

The name Pluto is for the Roman god of the underworld, from a Greek epithet for
Hades…

### Planet X disproved

Once Pluto was found, its faintness and lack of a viewable disc cast doubt…

## Orbit

Pluto’s orbital period is about 248 years…

Then create index.js as well. It transforms markdown to HTML:

import fs from 'node:fs/promises'
import rehypeStringify from 'rehype-stringify'
import remarkParse from 'remark-parse'
import remarkRehype from 'remark-rehype'
import {unified} from 'unified'

const document = await fs.readFile('example.md', 'utf8')

const file = await unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(rehypeStringify)
  .process(document)

console.log(String(file))

Now, running our module with Node:

node index.js

…prints the following HTML:

<h1>Pluto</h1>
<p>Pluto is an dwarf planet in the Kuiper belt.</p>
<h2>Contents</h2>
<h2>History</h2>
<h3>Discovery</h3>
<p>In the 1840s, Urbain Le Verrier used Newtonian mechanics to predict the
position of…</p>
<h3>Name and symbol</h3>
<p>The name Pluto is for the Roman god of the underworld, from a Greek epithet for
Hades…</p>
<h3>Planet X disproved</h3>
<p>Once Pluto was found, its faintness and lack of a viewable disc cast doubt…</p>
<h2>Orbit</h2>
<p>Pluto’s orbital period is about 248 years…</p>

👉 Note that remark-rehype doesn’t deal with HTML inside the markdown. See HTML and remark for more info.

🎉 Nifty! It doesn’t do much yet. We’ll get there. In the next section, we make this more useful by introducing plugins.

Plugins

We’re still missing some things, notably a table of contents and a proper HTML document structure.

We can use rehype-slug and remark-toc for the former and rehype-document for the latter task.
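remark-toc works on the markdown tree: it looks for a heading such as Contents and fills in a nested list of the headings that follow it. The sketch below shows one way such nesting can be computed from heading depths, using the headings of example.md. The buildToc function and its {depth, text} input format are hypothetical simplifications; the real plugin works on mdast nodes and supports more options.

```javascript
// Headings of example.md in document order, reduced to depth and text
// (a simplification of mdast `heading` nodes).
const exampleHeadings = [
  {depth: 1, text: 'Pluto'},
  {depth: 2, text: 'Contents'},
  {depth: 2, text: 'History'},
  {depth: 3, text: 'Discovery'},
  {depth: 3, text: 'Name and symbol'},
  {depth: 3, text: 'Planet X disproved'},
  {depth: 2, text: 'Orbit'}
]

// Hypothetical sketch: nest every heading after the first one matching `marker`
// under the nearest preceding heading with a smaller depth.
function buildToc(headings, marker = /^contents$/i) {
  const start = headings.findIndex((heading) => marker.test(heading.text))
  if (start === -1) return []

  const root = []
  // Sentinel with depth 0, so it is never popped (real depths are 1 to 6).
  const stack = [{depth: 0, children: root}]

  for (const heading of headings.slice(start + 1)) {
    // Close entries that are as deep as, or deeper than, this heading.
    while (stack[stack.length - 1].depth >= heading.depth) stack.pop()
    const item = {text: heading.text, depth: heading.depth, children: []}
    stack[stack.length - 1].children.push(item)
    stack.push(item)
  }

  return root
}

const toc = buildToc(exampleHeadings)
console.log(JSON.stringify(toc.map((item) => [item.text, item.children.map((child) => child.text)])))
// [["History",["Discovery","Name and symbol","Planet X disproved"]],["Orbit",[]]]
```

The shape matches the nested list that shows up in the generated HTML further down: History with its three subsections, then Orbit.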

npm install rehype-document rehype-slug remark-toc

Let’s now use those three as well, by modifying our index.js file:

--- a/index.js
+++ b/index.js
@@ -1,14 +1,20 @@
 import fs from 'node:fs/promises'
+import rehypeDocument from 'rehype-document'
+import rehypeSlug from 'rehype-slug'
 import rehypeStringify from 'rehype-stringify'
 import remarkParse from 'remark-parse'
 import remarkRehype from 'remark-rehype'
+import remarkToc from 'remark-toc'
 import {unified} from 'unified'

 const document = await fs.readFile('example.md', 'utf8')

 const file = await unified()
   .use(remarkParse)
+  .use(remarkToc)
   .use(remarkRehype)
+  .use(rehypeSlug)
+  .use(rehypeDocument, {title: 'Pluto'})
   .use(rehypeStringify)
   .process(document)

We pass options to rehype-document. In this case, we use that to make sure we get a proper <title> element in our <head>, as required by the HTML specification. More options are accepted by rehype-document, such as which language tag to use. These are described in detail in its readme.md. Many other plugins accept options as well, so make sure to read through their docs to learn more.
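Under the hood, a unified plugin is a function that receives the options given to .use and returns a transformer, which is later called with the tree. The sketch below shows that shape with a hypothetical rehypeHeadingClass plugin that adds a class to every heading element in a hast tree. The plugin, its className option, its default value, and the direct calls at the bottom (standing in for .use(rehypeHeadingClass, {className: 'title'})) are assumptions made for illustration.

```javascript
// Hypothetical plugin. In unified, the attacher is called with the options passed to `.use`.
function rehypeHeadingClass(options) {
  // Fall back to a default class when no option is given (an assumption of this sketch).
  const className = (options && options.className) || 'heading'

  // Depth-first walk that tags h1 to h6 elements with the configured class.
  function visit(node) {
    if (node.type === 'element' && /^h[1-6]$/.test(node.tagName)) {
      node.properties = node.properties || {}
      const existing = node.properties.className || []
      node.properties.className = [...existing, className]
    }
    for (const child of node.children || []) visit(child)
  }

  // The transformer receives the hast tree (unified also passes the file, unused here).
  return function (tree) {
    visit(tree)
  }
}

// A hand-written hast tree standing in for the output of remark-rehype.
const tree = {
  type: 'root',
  children: [
    {type: 'element', tagName: 'h1', properties: {}, children: [{type: 'text', value: 'Pluto'}]},
    {type: 'element', tagName: 'p', properties: {}, children: [{type: 'text', value: 'Pluto is a dwarf planet.'}]}
  ]
}

// Roughly what `.use(rehypeHeadingClass, {className: 'title'})` would set up.
const transformer = rehypeHeadingClass({className: 'title'})
transformer(tree)
console.log(tree.children[0].properties.className) // [ 'title' ]
```

Keeping the options in the attacher's closure means the transformer can run on many files with the same configuration.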

👉 Note that remark plugins work on a markdown tree. rehype plugins work on an HTML tree. It’s important that you place your .use calls in the correct places: plugins are order sensitive!
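One way to see why order matters: code that looks for hast element nodes finds nothing if it runs while the tree is still mdast. The sketch below runs a hypothetical countHeadingElements function over the same heading in both formats; the function and the hand-written trees are illustrative assumptions, not part of any plugin.

```javascript
// Hypothetical rehype-style logic: counts h1 to h6 elements in a hast tree.
function countHeadingElements(node) {
  const self = node.type === 'element' && /^h[1-6]$/.test(node.tagName) ? 1 : 0
  const children = node.children || []
  return children.reduce((sum, child) => sum + countHeadingElements(child), self)
}

// The same heading as mdast (before remark-rehype) and as hast (after it).
const asMdast = {
  type: 'root',
  children: [{type: 'heading', depth: 1, children: [{type: 'text', value: 'Pluto'}]}]
}
const asHast = {
  type: 'root',
  children: [
    {type: 'element', tagName: 'h1', properties: {}, children: [{type: 'text', value: 'Pluto'}]}
  ]
}

console.log(countHeadingElements(asMdast)) // 0: mdast has `heading` nodes, not `element` nodes
console.log(countHeadingElements(asHast)) // 1
```

This is why rehypeSlug sits after remarkRehype in index.js: before that point there would be no heading elements for it to act on.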

When running our module like before, we now get the following output:

<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Pluto</title>
<meta content="width=device-width, initial-scale=1" name="viewport">
</head>
<body>
<h1 id="pluto">Pluto</h1>
<p>Pluto is an dwarf planet in the Kuiper belt.</p>
<h2 id="contents">Contents</h2>
<ul>
<li><a href="#history">History</a>
<ul>
<li><a href="#discovery">Discovery</a></li>
<li><a href="#name-and-symbol">Name and symbol</a></li>
<li><a href="#planet-x-disproved">Planet X disproved</a></li>
</ul>
</li>
<li><a href="#orbit">Orbit</a></li>
</ul>
<h2 id="history">History</h2>
<h3 id="discovery">Discovery</h3>
<p>In the 1840s, Urbain Le Verrier used Newtonian mechanics to predict the
position of…</p>
<h3 id="name-and-symbol">Name and symbol</h3>
<p>The name Pluto is for the Roman god of the underworld, from a Greek epithet for
Hades…</p>
<h3 id="planet-x-disproved">Planet X disproved</h3>
<p>Once Pluto was found, its faintness and lack of a viewable disc cast doubt…</p>
<h2 id="orbit">Orbit</h2>
<p>Pluto’s orbital period is about 248 years…</p>
</body>
</html>
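The id attributes in this output come from rehype-slug, which derives them from each heading’s text in the style of GitHub (it uses github-slugger). Below is a simplified sketch of such a slugger: lowercase the text, drop punctuation, turn whitespace into hyphens, and add a numeric suffix when a slug repeats. The createSlugger helper is hypothetical, and its character rules are an approximation rather than the exact ones github-slugger applies.

```javascript
// Hypothetical, simplified GitHub-style slugger (an approximation of the real rules).
function createSlugger() {
  // Maps each slug handed out so far to the last numeric suffix used for it.
  const occurrences = new Map()

  return function slug(text) {
    const base = text
      .toLowerCase()
      .trim()
      // Drop everything except letters, numbers, whitespace, and hyphens.
      .replace(/[^\p{L}\p{N}\s\-]/gu, '')
      // Each whitespace character becomes a hyphen.
      .replace(/\s/g, '-')

    if (!occurrences.has(base)) {
      occurrences.set(base, 0)
      return base
    }

    // Repeated slug: try base-1, base-2, and so on until an unused one is found.
    let count = occurrences.get(base)
    let candidate
    do {
      count++
      candidate = `${base}-${count}`
    } while (occurrences.has(candidate))

    occurrences.set(base, count)
    occurrences.set(candidate, 0)
    return candidate
  }
}

const slug = createSlugger()
console.log(['Pluto', 'Name and symbol', 'Planet X disproved', 'History', 'History'].map((text) => slug(text)))
// [ 'pluto', 'name-and-symbol', 'planet-x-disproved', 'history', 'history-1' ]
```

One slugger is shared across the whole document, which is what makes the duplicate suffixes possible.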

👉 Note that the document isn’t formatted nicely. There’s a plugin for that, though: feel free to add rehype-format to the plugins, right after rehypeDocument!

💯 You’re acing it! This is getting pretty useful, right?

In the next section, we lay the groundwork for creating a report.

Reporting

Before we check some prose, let’s first switch up our index.js file to print a pretty report.

We can use to-vfile to read virtual files from the file system and write them back to it. Then we can use vfile-reporter to report messages relating to those files. Let’s install those.

npm install to-vfile vfile-reporter

…and then use them in our example instead, like so:

--- a/index.js
+++ b/index.js
@@ -1,21 +1,24 @@
-import fs from 'node:fs/promises'
 import rehypeDocument from 'rehype-document'
 import rehypeSlug from 'rehype-slug'
 import rehypeStringify from 'rehype-stringify'
 import remarkParse from 'remark-parse'
 import remarkRehype from 'remark-rehype'
 import remarkToc from 'remark-toc'
+import {read, write} from 'to-vfile'
 import {unified} from 'unified'
+import {reporter} from 'vfile-reporter'

-const document = await fs.readFile('example.md', 'utf8')
+const file = await read('example.md')

-const file = await unified()
+await unified()
   .use(remarkParse)
   .use(remarkToc)
   .use(remarkRehype)
   .use(rehypeSlug)
   .use(rehypeDocument, {title: 'Pluto'})
   .use(rehypeStringify)
-  .process(document)
+  .process(file)

-console.log(String(file))
+console.error(reporter(file))
+file.extname = '.html'
+await write(file)

If we now run our module, we get a report showing that everything’s fine:

$ node index.js
example.md: no issues found
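The report is built from messages stored on the virtual file, and the file.extname = '.html' line works because a vfile derives its extension from its path, so setting it rewrites the path. The sketch below mimics just those two ideas with a hypothetical MiniFile class and miniReport function. They are deliberately tiny stand-ins, not the real vfile or vfile-reporter APIs, and the report layout only loosely imitates the real one.

```javascript
// Hypothetical stand-in for a virtual file: a path, a value, and a list of messages.
class MiniFile {
  constructor(path, value) {
    this.path = path
    this.value = value
    this.messages = []
  }

  // Extension of the last path segment, such as `.md` (empty when there is none).
  get extname() {
    const match = /\.[^./]*$/.exec(this.path)
    return match ? match[0] : ''
  }

  // Swap the extension by rewriting the path.
  set extname(extname) {
    this.path = this.path.slice(0, this.path.length - this.extname.length) + extname
  }

  // Record a warning at a 1-based line and column.
  message(reason, line, column) {
    const message = {reason, line, column}
    this.messages.push(message)
    return message
  }
}

// Hypothetical reporter, loosely modeled on the output shown above.
function miniReport(file) {
  const count = file.messages.length
  if (count === 0) return `${file.path}: no issues found`
  const rows = file.messages.map(
    (message) => `${message.line}:${message.column} warning ${message.reason}`
  )
  return [file.path, ...rows, '', `${count} warning${count === 1 ? '' : 's'}`].join('\n')
}

const file = new MiniFile('example.md', '# Pluto')
console.error(miniReport(file)) // example.md: no issues found
file.extname = '.html'
console.log(file.path) // example.html
```

As in index.js, the report is printed before the extension changes, so it still names example.md.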

But not everything is fine: there’s a typo in the markdown! The next section shows how to detect prose errors by adding retext.

Checking prose

I did notice a typo in there, so let’s check some prose to prevent that from happening in the future. We can use retext and its ecosystem for our natural language parsing. As we’re writing in English, we use retext-english specifically to parse English natural language. The problem in our example.md file is that it has “an dwarf planet” instead of “a dwarf planet”, which is conveniently checked for by retext-indefinite-article. To bridge from markup to prose we use remark-retext. Let’s install these dependencies as well.

npm install remark-retext retext-english retext-indefinite-article

…and change our index.js like so:

--- a/index.js
+++ b/index.js
@@ -3,7 +3,10 @@ import rehypeSlug from 'rehype-slug'
 import rehypeStringify from 'rehype-stringify'
 import remarkParse from 'remark-parse'
 import remarkRehype from 'remark-rehype'
+import remarkRetext from 'remark-retext'
 import remarkToc from 'remark-toc'
+import retextEnglish from 'retext-english'
+import retextIndefiniteArticle from 'retext-indefinite-article'
 import {read, write} from 'to-vfile'
 import {unified} from 'unified'
 import {reporter} from 'vfile-reporter'
@@ -12,6 +15,8 @@ const file = await read('example.md')

 await unified()
   .use(remarkParse)
+  // @ts-expect-error: fine.
+  .use(remarkRetext, unified().use(retextEnglish).use(retextIndefiniteArticle))
   .use(remarkToc)
   .use(remarkRehype)
   .use(rehypeSlug)

As the code shows, remark-retext receives another unified pipeline: a natural language one. The plugin transforms the origin syntax (markdown) into a natural language syntax tree with the parser defined on the given pipeline, and then runs the plugins attached to that pipeline on that tree.
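Conceptually, the bridge lets prose checks see the text of the document without the markdown syntax around it. The sketch below imitates that with hypothetical helpers: toText flattens mdast nodes to plain text, and toyRetextBridge runs plain-text checkers on each paragraph while leaving the markdown tree untouched. The repeatedWords checker is also hypothetical. The real remark-retext builds a natural language tree (nlcst) with the given parser instead of plain strings, so its messages point back to positions in the markdown file.

```javascript
// Hypothetical helper: flatten an mdast node to its plain text.
function toText(node) {
  if (typeof node.value === 'string') return node.value
  if (Array.isArray(node.children)) return node.children.map(toText).join('')
  return ''
}

// Hypothetical bridge: returns a transformer that hands each paragraph's text to the
// given checkers and collects their messages on the file; the tree is not modified.
function toyRetextBridge(checkers) {
  return function (tree, file) {
    for (const node of tree.children) {
      if (node.type !== 'paragraph') continue
      const text = toText(node)
      for (const check of checkers) file.messages.push(...check(text))
    }
  }
}

// Hypothetical checker: flags a word that is immediately repeated, such as "the the".
function repeatedWords(text) {
  const messages = []
  for (const match of text.matchAll(/\b(\w+)\s+\1\b/gi)) {
    messages.push(`Repeated word \`${match[1]}\``)
  }
  return messages
}

// Emphasis splits the text across nodes, but the checker only sees
// "Pluto is the the dwarf planet."
const tree = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        {type: 'text', value: 'Pluto is the '},
        {type: 'emphasis', children: [{type: 'text', value: 'the'}]},
        {type: 'text', value: ' dwarf planet.'}
      ]
    }
  ]
}

const file = {messages: []}
toyRetextBridge([repeatedWords])(tree, file)
console.log(file.messages) // [ 'Repeated word `the`' ]
```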

Now when running our module one final time:

$ node index.js
example.md
3:10-3:12 warning Unexpected article `an` before `dwarf`, expected `a` retext-indefinite-article retext-indefinite-article

⚠ 1 warning

…we get a useful message.
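The reported range 3:10-3:12 points at “an” on line 3 of example.md: columns are 1-based, and the end points just past the word. A deliberately naive version of the check is sketched below: expect “an” before a word starting with a vowel letter and “a” otherwise. The checkArticles function is hypothetical. It looks at spelling rather than pronunciation, so it gets cases such as “an hour” or “a unicorn” wrong; the real retext-indefinite-article works on the natural language tree and is more thorough.

```javascript
// Hypothetical, naive article check: looks only at the first letter of the next word.
function checkArticles(text) {
  const messages = []
  const lines = text.split('\n')

  lines.forEach((line, index) => {
    for (const match of line.matchAll(/\b(an?)\s+(\w+)/gi)) {
      const [, article, word] = match
      const expected = /^[aeiou]/i.test(word) ? 'an' : 'a'
      if (article.toLowerCase() === expected) continue

      const column = match.index + 1 // 1-based start column
      messages.push({
        line: index + 1,
        column,
        endColumn: column + article.length, // points just past the article
        reason: `Unexpected article \`${article}\` before \`${word}\`, expected \`${expected}\``
      })
    }
  })

  return messages
}

// The first lines of example.md.
const markdown = '# Pluto\n\nPluto is an dwarf planet in the Kuiper belt.\n'
for (const m of checkArticles(markdown)) {
  console.log(`${m.line}:${m.column}-${m.line}:${m.endColumn} ${m.reason}`)
}
// 3:10-3:12 Unexpected article `an` before `dwarf`, expected `a`
```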

💃 You’ve got a really cool system set up already. Nicely done! That’s a wrap, though. Check out the next section for further exercises and resources.

Further exercises

Finally, check out the lists of available plugins for retext, remark, and rehype, and try some of them out.

If you haven’t already, check out the other articles in the learn section!