Introduction to syntax trees

unified uses abstract syntax trees (ASTs) that plugins can work on. This guide introduces what ASTs are and how to work with them.

What is an AST?

An abstract syntax tree (AST) is a tree representation of the syntax of a language, usually a programming language. For us, that is typically a markup language.

As a JavaScript developer, you may already know things that are like ASTs: the DOM and React’s virtual DOM. Or you may have heard of Babel, ESLint, PostCSS, Prettier, or TypeScript. They all use ASTs to inspect and transform code.

In unified, we support several ASTs. The reason for different ASTs is that each markup language has aspects that do not translate 1-to-1 to other markup languages. Take markdown and HTML as an example. In some cases, markdown has more info than HTML: markdown has several ways to add a link (“autolinks”: <https://url>, resource links: [label](url), and reference links with definitions: [label][id] and [id]: url). In other cases, HTML has more info than markdown: it has many tags that add meaning (semantics) that isn’t available in markdown. If there were only one AST, it would be quite hard to do the tasks that several remark and rehype plugins now do.
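
To make that difference concrete, here is a minimal sketch of how one reference link might look as plain data in mdast (the markdown AST) and in hast (the HTML AST). The node shapes follow the mdast and hast specifications; the URL and the label text are made up for illustration.

```javascript
// A markdown reference link, `[Pluto][pluto]`, keeps its identifier in mdast.
const mdastReference = {
  type: 'linkReference',
  identifier: 'pluto',
  label: 'pluto',
  referenceType: 'full',
  children: [{type: 'text', value: 'Pluto'}]
}

// The matching definition, `[pluto]: https://example.com/pluto`, is its own node.
// (The URL is an illustrative placeholder.)
const mdastDefinition = {
  type: 'definition',
  identifier: 'pluto',
  label: 'pluto',
  url: 'https://example.com/pluto',
  title: null
}

// In hast, only the resolved link remains: an `<a>` element with an `href`.
const hastLink = {
  type: 'element',
  tagName: 'a',
  properties: {href: 'https://example.com/pluto'},
  children: [{type: 'text', value: 'Pluto'}]
}

console.log(mdastReference.type, mdastDefinition.type, hastLink.tagName)
```

The markdown tree remembers that the link was written as a reference with a separate definition. The HTML tree has no place for that detail, which is one reason a single shared AST would lose information.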

See “How to build a syntax tree” for more info on how to make a tree. See “Syntax trees in TypeScript” on how to work with ASTs in TypeScript.

What is unist?

All our ASTs have things in common, though. The shared part is called unist. Because of this shared interface, we can also share tools that work on all ASTs. In practice, that means you can use, for example, unist-util-visit to visit nodes in any supported AST.
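
As a rough sketch of why a shared interface helps, the following hand-written `walk` function relies only on the `type` and `children` fields, so it works on both a small markdown tree and a small HTML tree. `walk` and `collectTypes` are hypothetical helpers written for this example. They are a simplified stand-in, not how unist-util-visit is implemented.

```javascript
// Hypothetical helper: visits `node` and then its descendants, in preorder.
function walk(node, visitor) {
  visitor(node)
  // Parents have a `children` array; literals such as text nodes do not.
  if (Array.isArray(node.children)) {
    for (const child of node.children) {
      walk(child, visitor)
    }
  }
}

// Hypothetical helper: lists the `type` of every node in a tree.
function collectTypes(tree) {
  const types = []
  walk(tree, function (node) {
    types.push(node.type)
  })
  return types
}

// A tiny mdast (markdown) tree.
const markdownTree = {
  type: 'root',
  children: [
    {type: 'heading', depth: 1, children: [{type: 'text', value: 'Pluto'}]}
  ]
}

// A tiny hast (HTML) tree.
const htmlTree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'h1',
      properties: {},
      children: [{type: 'text', value: 'Pluto'}]
    }
  ]
}

console.log(collectTypes(markdownTree)) //=> ['root', 'heading', 'text']
console.log(collectTypes(htmlTree)) //=> ['root', 'element', 'text']
```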

See “Tree traversal” for more info on unist-util-visit.

unist is different from the ASTs used in other tools. The most noticeable difference is that it uses a particular set of names for things: type, children, position. Perhaps harder to see is that it’s compatible with JSON: it’s all objects, arrays, strings, and numbers. Where other tools use instances with methods, we use plain data. Years ago, in retext, we started out with instances and methods too. But we found that we preferred to be able to read and write a tree from/to a JSON file, to treat ASTs as data, and to use more functional utilities.
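
Because a tree is plain data, it can survive a round trip through JSON unchanged. The sketch below builds a small tree by hand (the content and positions are illustrative), serializes it, and parses it back.

```javascript
// A hand-written tree for the markdown `# Pluto`.
const tree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1,
      children: [{type: 'text', value: 'Pluto'}],
      // Where the node was found in the source: lines and columns are 1-indexed.
      position: {
        start: {line: 1, column: 1, offset: 0},
        end: {line: 1, column: 8, offset: 7}
      }
    }
  ]
}

// Serialize to a string (which could be written to a file) and read it back.
const serialized = JSON.stringify(tree)
const restored = JSON.parse(serialized)

console.log(restored.children[0].type) //=> 'heading'
console.log(JSON.stringify(restored) === serialized) //=> true
```

Nothing is lost in the round trip because no node carries methods or class instances.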

When to use an AST?

You can use an AST when you want to inspect or transform content.

Say you wanted to count the number of headings in a markdown file. You could also do that with a regex:

const value = `# Pluto

Pluto is a dwarf planet in the Kuiper belt.

## History

### Discovery

In the 1840s, Urbain Le Verrier used Newtonian mechanics to predict the
position of…`

const expression = /^#+[^\r\n]+/gm
const headings = [...value.matchAll(expression)].length

console.log(headings) //=> 3

But what if the headings were in a code block? Or if Setext headings were used instead of ATX headings? The grammar of markdown is more complex than a regex can handle. That’s where an AST can help.
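
To see the problem, here is a small sketch that runs the same regex on a made-up document with one ATX heading, one Setext heading, and a fenced code block holding two shell comments. The input is assembled from an array of lines only to keep the example easy to read.

```javascript
const value = [
  '# Pluto',
  '',
  'History',
  '-------',
  '',
  '~~~sh',
  '# Install dependencies',
  '# Run the build',
  '~~~'
].join('\n')

const expression = /^#+[^\r\n]+/gm

// Collect the matched text instead of only counting, to see what was found.
const matches = [...value.matchAll(expression)].map(function (match) {
  return match[0]
})

console.log(matches)
//=> ['# Pluto', '# Install dependencies', '# Run the build']
```

The document has two headings (“Pluto” and “History”), but the regex reports three matches: it misses the Setext heading and counts both shell comments inside the code block.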

import {fromMarkdown} from 'mdast-util-from-markdown'
import {visit} from 'unist-util-visit'

const value = `# Pluto

Pluto is a dwarf planet in the Kuiper belt.

## History

### Discovery

In the 1840s, Urbain Le Verrier used Newtonian mechanics to predict the
position of…`

const tree = fromMarkdown(value)

let headings = 0

visit(tree, 'heading', function () {
  headings++
})

console.log(headings) //=> 3

See “Tree traversal” for more info on unist-util-visit.