Introduction to syntax trees
unified uses abstract syntax trees (ASTs) that plugins can work on. This guide introduces what ASTs are and how to work with them.
What is an AST?
An abstract syntax tree (AST) is a tree representation of the syntax of a programming language. For us, that language is typically a markup language.
As a JavaScript developer you may already know things that are like ASTs: the DOM and React’s virtual DOM. Or you may have heard of Babel, ESLint, PostCSS, Prettier, or TypeScript. They all use ASTs to inspect and transform code.
In unified, we support several ASTs. The reason for different ASTs is that each markup language has several aspects that do not translate 1-to-1 to other markup languages. Taking markdown and HTML as an example, in some cases markdown has more info than HTML: markdown has several ways to add a link (“autolinks”: <https://url>, resource links: [label](url), and reference links with definitions: [label][id] and [id]: url). In other cases, HTML has more info than markdown. It has many tags, which add new meaning (semantics), that aren’t available in markdown. If there was one AST, it would be quite hard to do the tasks that several remark and rehype plugins now do.
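To make the difference concrete, here is a hand-written sketch of how the link forms above might look as nodes. The shapes follow the mdast and hast node definitions, but they are abbreviated (no position info), and the URL and label are placeholders chosen for illustration.

```javascript
// Abbreviated, hand-written nodes (no `position`).
// `https://example.com` and `label` are placeholders.

// mdast: `[label](https://example.com)` is a `link` node.
const resourceLink = {
  type: 'link',
  url: 'https://example.com',
  title: null,
  children: [{type: 'text', value: 'label'}]
}

// mdast: `[label][id]` is a `linkReference`, which points by identifier
// at a separate `definition` node (`[id]: https://example.com`).
const referenceLink = {
  type: 'linkReference',
  identifier: 'id',
  label: 'id',
  referenceType: 'full',
  children: [{type: 'text', value: 'label'}]
}
const definition = {
  type: 'definition',
  identifier: 'id',
  label: 'id',
  url: 'https://example.com',
  title: null
}

// hast: once turned into HTML, both forms are typically the same `<a>` element.
const anchor = {
  type: 'element',
  tagName: 'a',
  properties: {href: 'https://example.com'},
  children: [{type: 'text', value: 'label'}]
}

console.log(resourceLink.type, referenceLink.type, definition.type, anchor.tagName)
```

The markdown tree keeps the reference and its definition apart, which is information the HTML tree no longer has.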
See “How to build a syntax tree” for more info on how to make a tree. See “Syntax trees in TypeScript” on how to work with ASTs in TypeScript.
What is unist?
But all our ASTs have things in common. The bit in common is called unist. By having a shared interface, we can also share tools that work on all ASTs. In practice, that means you can use for example unist-util-visit to visit nodes in any supported AST.
See “Tree traversal” for more info on unist-util-visit.
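As a rough illustration of why a shared interface helps, the sketch below uses one function on both a markdown tree and an HTML tree. The helper countType is hypothetical and written only for this example (it is not part of unist-util-visit), and both trees are small and hand-written.

```javascript
// Hypothetical helper (not part of unist-util-visit): counts nodes of a type
// using only the shared unist fields `type` and `children`.
function countType(node, type) {
  let count = node.type === type ? 1 : 0
  for (const child of node.children || []) {
    count += countType(child, type)
  }
  return count
}

// A small mdast (markdown) tree, written by hand.
const mdast = {
  type: 'root',
  children: [
    {type: 'heading', depth: 1, children: [{type: 'text', value: 'Pluto'}]},
    {type: 'paragraph', children: [{type: 'text', value: 'A dwarf planet.'}]}
  ]
}

// A small hast (HTML) tree, written by hand.
const hast = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'h1',
      properties: {},
      children: [{type: 'text', value: 'Pluto'}]
    },
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [{type: 'text', value: 'A dwarf planet.'}]
    }
  ]
}

console.log(countType(mdast, 'text')) // 2
console.log(countType(hast, 'text')) // 2
console.log(countType(hast, 'element')) // 2
```

The function never looks at markdown-specific or HTML-specific fields, which is why it can be reused across both trees.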
unist is different from the ASTs used in other tools. Most noticeably, it uses a particular set of names for things: type, children, position. But perhaps harder to see is that it’s compatible with JSON. It’s all objects and arrays, strings and numbers. Where other tools use instances with methods, we use plain data. Years ago in retext we started out with instances and methods too. But we found that we preferred to be able to read and write a tree from/to a JSON file, to treat ASTs as data, and use more functional utilities.
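A minimal sketch of what “compatible with JSON” means in practice: the hand-written tree below (for the markdown “# Pluto”) is serialized and parsed back without losing anything. The tree is made up for illustration and is not the output of a parser.

```javascript
// A hand-written unist-style tree: only objects, arrays, strings, and numbers.
const tree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1,
      children: [{type: 'text', value: 'Pluto'}],
      position: {
        start: {line: 1, column: 1, offset: 0},
        end: {line: 1, column: 8, offset: 7}
      }
    }
  ]
}

// Serialize to JSON (for example to write to a file) and read it back.
const json = JSON.stringify(tree, null, 2)
const copy = JSON.parse(json)

// Nothing is lost: the copy serializes to the same JSON...
console.log(JSON.stringify(copy, null, 2) === json) // true
// ...and it is a new object, not a reference to the original.
console.log(copy !== tree) // true
console.log(copy.children[0].children[0].value) // Pluto
```

A tree made of class instances with methods would not survive this round trip, because JSON only carries data.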
When to use an AST?
You can use an AST when you want to inspect or transform content.
Say you wanted to count the number of headings in a markdown file. You could also do that with a regex:
const value = `# Pluto

Pluto is a dwarf planet in the Kuiper belt.

## History

### Discovery

In the 1840s, Urbain Le Verrier used Newtonian mechanics to predict the
position of…`
const expression = /^#+[^\r\n]+/gm
const headings = [...value.matchAll(expression)].length
console.log(headings) //=> 3
But what if the headings were in a code block? Or if Setext headings were used instead of ATX headings? The grammar of markdown is more complex than a regex can handle. That’s where an AST can help.
import {fromMarkdown} from 'mdast-util-from-markdown'
import {visit} from 'unist-util-visit'
const value = `# Pluto

Pluto is a dwarf planet in the Kuiper belt.

## History

### Discovery

In the 1840s, Urbain Le Verrier used Newtonian mechanics to predict the
position of…`
const tree = fromMarkdown(value)
let headings = 0
visit(tree, 'heading', function () {
  headings++
})
console.log(headings) //=> 3
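To see where the regex approach breaks down, here is a small sketch that runs the same regex from the first example on a trickier input. The input is made up for illustration: it has one Setext heading and a code block (fenced with tildes) that contains shell comments.

```javascript
// Markdown with a Setext heading and a code block (fenced with `~~~`)
// that contains shell comments. Made-up input for illustration.
const tricky = [
  'Pluto',
  '=====',
  '',
  '~~~sh',
  '# install',
  '# run',
  '~~~'
].join('\n')

// The same regex as in the first example.
const expression = /^#+[^\r\n]+/gm
const matches = [...tricky.matchAll(expression)].map((match) => match[0])

console.log(matches) // [ '# install', '# run' ]
console.log(matches.length) // 2
```

The regex reports two headings, both of which are really comments inside the code block, and it misses the one actual heading (“Pluto”). A markdown parser treats the code block as code and recognizes the Setext heading, so counting heading nodes in the tree would not make either mistake.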
See “Tree traversal” for more info on unist-util-visit.