How to work with syntax trees in TypeScript

This guide will introduce you to using unist and unified with TypeScript.

Basics

All unified syntax trees are based on unist. The core types are available in a types-only package: @types/unist. The main type is Node, and every other node type extends it. Literal and Parent are more specific types that also extend Node.

The types provided by unist are abstract interfaces. Often, you will instead use more practical interfaces depending on what language you’re working with. Each language supported by unified, like markdown, HTML, and XML, has its own syntax tree standard which extends unist.

Let’s take a look at these.

unist

Node

Node is the syntactic unit of syntax trees. Each node extends Node (sometimes through Literal or Parent) and sets type to a string literal. The type field tells us what kind of content the node is, and each value uniquely identifies one kind of content. In TypeScript, a union of node types that can be told apart this way is called a discriminated union. For example, in markdown (mdast), Node is extended to make things such as Heading and Link, which use a type field of 'heading' and 'link' respectively.
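
As a rough sketch of how a discriminated union narrows, the snippet below switches on the type field. The Heading and Link interfaces here are simplified local stand-ins written for this illustration (the real mdast types have more fields), and summarize is a hypothetical helper name.

```typescript
// Simplified stand-ins for illustration only; they are NOT the real mdast types.
interface Heading {
  type: 'heading'
  depth: 1 | 2 | 3 | 4 | 5 | 6
}

interface Link {
  type: 'link'
  url: string
}

// A union of node types, discriminated by the `type` field.
type Content = Heading | Link

// Hypothetical helper: checking `type` narrows `node` to one member of the union.
function summarize(node: Content): string {
  switch (node.type) {
    case 'heading':
      // Here `node` is a `Heading`, so `depth` is available.
      return `heading (depth ${node.depth})`
    case 'link':
      // Here `node` is a `Link`, so `url` is available.
      return `link to ${node.url}`
  }
}

const heading: Content = {type: 'heading', depth: 2}
const link: Content = {type: 'link', url: 'https://example.com'}

console.log(summarize(heading)) // heading (depth 2)
console.log(summarize(link)) // link to https://example.com
```

Because type is a string literal on each interface, TypeScript can tell which fields exist inside each case without any casts.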

A node can optionally include a Data interface at the data field. This is an object (dictionary) that stores extra metadata which is not standard to the node but defined by the ecosystem (utilities and plugins).

When a syntax tree is parsed from a file, it includes positional information: a Position interface at the position field. This describes where the node occurred in the source file.

/**
 * Info associated with nodes by the ecosystem.
 *
 * This space is guaranteed to never be specified by unist or specifications
 * implementing unist.
 * But you can use it in utilities and plugins to store data.
 *
 * This type can be augmented to register custom data.
 * For example:
 *
 * ```ts
 * declare module 'unist' {
 *   interface Data {
 *     // `someNode.data.myId` is typed as `number | undefined`
 *     myId?: number | undefined
 *   }
 * }
 * ```
 */
interface Data {}

/**
 * One place in a source file.
 */
interface Point {
  /**
   * Line in a source file (1-indexed integer).
   */
  line: number

  /**
   * Column in a source file (1-indexed integer).
   */
  column: number

  /**
   * Character in a source file (0-indexed integer).
   */
  offset?: number | undefined
}

/**
 * Position of a node in a source document.
 *
 * A position is a range between two points.
 */
interface Position {
  /**
   * Place of the first character of the parsed source region.
   */
  start: Point

  /**
   * Place of the first character after the parsed source region.
   */
  end: Point
}

/**
 * Abstract unist node.
 *
 * The syntactic unit in unist syntax trees are called nodes.
 *
 * This interface is supposed to be extended.
 * If you can use {@link Literal} or {@link Parent}, you should.
 * But for example in markdown, a `thematicBreak` (`***`), is neither literal
 * nor parent, but still a node.
 */
interface Node {
  /**
   * Node type.
   */
  type: string

  /**
   * Info from the ecosystem.
   */
  data?: Data | undefined

  /**
   * Position of a node in a source document.
   *
   * Nodes that are generated (not in the original source document) must not
   * have a position.
   */
  position?: Position | undefined
}
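
To illustrate how the optional position field might be read, here is a minimal sketch. The interfaces are local copies of the shapes above, prefixed with Unist only so they do not clash with the DOM's global Node type, and formatPosition is a hypothetical helper, not part of any unist package.

```typescript
// Local copies of the unist shapes, renamed to avoid the global DOM `Node`.
interface UnistPoint {
  line: number
  column: number
  offset?: number | undefined
}

interface UnistPosition {
  start: UnistPoint
  end: UnistPoint
}

interface UnistNode {
  type: string
  position?: UnistPosition | undefined
}

// Hypothetical helper: render a node's source range as `line:column-line:column`.
function formatPosition(node: UnistNode): string {
  const position = node.position
  // Generated nodes have no position, so handle `undefined` explicitly.
  if (!position) return '(generated)'
  const {start, end} = position
  return `${start.line}:${start.column}-${end.line}:${end.column}`
}

// A node as it might look when parsed from `***` on the first line of a file.
const parsed: UnistNode = {
  type: 'thematicBreak',
  position: {
    start: {line: 1, column: 1, offset: 0},
    end: {line: 1, column: 4, offset: 3}
  }
}

// A node created by a plugin rather than parsed from source.
const generated: UnistNode = {type: 'thematicBreak'}

console.log(formatPosition(parsed)) // 1:1-1:4
console.log(formatPosition(generated)) // (generated)
```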

Literal

Literal extends Node and adds a value property. For example, a markdown Code node extends Literal and sets value to be a string.

/**
 * Abstract unist node that contains the smallest possible value.
 *
 * This interface is supposed to be extended.
 *
 * For example, in HTML, a `text` node is a leaf that contains text.
 */
interface Literal extends Node {
  /**
   * Plain value.
   */
  value: unknown
}
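
Since value is typed as unknown on the abstract interface, code that accepts any node usually has to narrow before using it. The sketch below shows one way this might look. The interfaces are local stand-ins, CodeLike is a hypothetical simplification of a markdown code node, and isLiteral and textOf are hypothetical helper names.

```typescript
// Local stand-ins for the unist shapes, renamed to avoid the global DOM `Node`.
interface UnistNode {
  type: string
}

interface UnistLiteral extends UnistNode {
  value: unknown
}

// Hypothetical, simplified code node: it narrows `type` and `value`.
interface CodeLike extends UnistLiteral {
  type: 'code'
  value: string
}

// Type guard: does this node carry a `value` field?
function isLiteral(node: UnistNode): node is UnistLiteral {
  return 'value' in node
}

// Return the string value of a node, or an empty string if it has none.
function textOf(node: UnistNode): string {
  if (!isLiteral(node)) return ''
  // `value` is `unknown` on the abstract type, so check it before use.
  return typeof node.value === 'string' ? node.value : ''
}

const code: CodeLike = {type: 'code', value: 'console.log(1)'}
const rule: UnistNode = {type: 'thematicBreak'}

console.log(textOf(code)) // console.log(1)
console.log(textOf(rule) === '') // true
```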

Parent

Parent extends Node and adds a children field. Children represent other content that is inside, or part of, this node.

/**
 * Abstract unist node that contains other nodes (*children*).
 *
 * This interface is supposed to be extended.
 *
 * For example, in XML, an element is a parent of different things, such as
 * comments, text, and further elements.
 */
interface Parent extends Node {
  /**
   * List of children.
   */
  children: Node[]
}
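
Because children contains nodes that may themselves be parents, trees are usually processed recursively. The following is a minimal sketch of a depth-first walk. The interfaces are local stand-ins, and isParent and collectTypes are hypothetical helpers; in practice, ecosystem utilities such as unist-util-visit are typically used for traversal.

```typescript
// Local stand-ins for the unist shapes, renamed to avoid the global DOM `Node`.
interface UnistNode {
  type: string
}

interface UnistParent extends UnistNode {
  children: UnistNode[]
}

// Type guard: does this node have a `children` array?
function isParent(node: UnistNode): node is UnistParent {
  return 'children' in node && Array.isArray((node as UnistParent).children)
}

// Collect the `type` of every node, depth-first, parents before children.
function collectTypes(node: UnistNode, result: string[] = []): string[] {
  result.push(node.type)
  if (isParent(node)) {
    for (const child of node.children) {
      collectTypes(child, result)
    }
  }
  return result
}

const headingNode: UnistParent = {type: 'heading', children: [{type: 'text'}]}
const tree: UnistParent = {
  type: 'root',
  children: [headingNode, {type: 'thematicBreak'}]
}

console.log(collectTypes(tree).join(', ')) // root, heading, text, thematicBreak
```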

Pulling unist into a project

Install:

npm install --save-dev @types/unist

To import the types into a TypeScript file, use:

import type {Literal, Node, Parent} from 'unist'

To import the types in JSDoc TypeScript, use:

/**
 * @import {Literal, Node, Parent} from 'unist'
 */

mdast (markdown)

mdast extends unist with types specific to markdown, such as Code, Heading, Link, and many more. The specification includes a full list of nodes. The types are available in a types-only package: @types/mdast.

Install:

npm install --save-dev @types/mdast

To import the types into a TypeScript file, use:

import type {Code, Heading, Link, Root} from 'mdast'

To import the types in JSDoc TypeScript, use:

/**
 * @import {Code, Heading, Link, Root} from 'mdast'
 */

hast (HTML)

hast extends unist with types specific to HTML, such as Comment, Doctype, Element, and many more. The specification includes a full list of nodes. The types are available in a types-only package: @types/hast.

Install:

npm install --save-dev @types/hast

To import the types into a TypeScript file, use:

import type {Comment, Doctype, Element, Root} from 'hast'

To import the types in JSDoc TypeScript, use:

/**
 * @import {Comment, Doctype, Element, Root} from 'hast'
 */

xast (XML)

xast extends unist with types specific to XML, such as Cdata, Element, Instruction, and many more. The specification includes a full list of nodes. The types are available in a types-only package: @types/xast.

Install:

npm install --save-dev @types/xast

To import the types into a TypeScript file, use:

import type {Cdata, Element, Instruction, Root} from 'xast'

To import the types in JSDoc TypeScript, use:

/**
 * @import {Cdata, Element, Instruction, Root} from 'xast'
 */

nlcst (natural language)

nlcst extends unist with types specific to natural language, such as Sentence, Word, and many more. The specification includes a full list of nodes. The types are available in a types-only package: @types/nlcst.

Install:

npm install --save-dev @types/nlcst

To import the types into a TypeScript file, use:

import type {Paragraph, Sentence, Word, Root} from 'nlcst'

To import the types in JSDoc TypeScript, use:

/**
 * @import {Paragraph, Sentence, Word, Root} from 'nlcst'
 */

Summary

Next steps