How to work with syntax trees in TypeScript
This guide will introduce you to using unist and unified with TypeScript.
Basics
All unified syntax trees are based on unist. The core types are available in a types-only package: @types/unist. The main type is Node. Everything else extends it. Literal and Parent are more specific types which also extend Node.
The types provided by unist are abstract interfaces. Often, you will instead use more practical interfaces depending on what language you’re working with. Each language supported by unified, like markdown, HTML, and XML, has its own syntax tree standard which extends unist.
Let’s take a look at these.
unist
Node
Node is the syntactic unit of syntax trees. Each node extends Node (sometimes through Literal or Parent) and sets type to a string literal. The type field tells us what kind of content the node is, and it uniquely identifies that kind. In TypeScript, a union of types that share such a field is called a discriminated union. For example, in markdown (mdast), Node is extended to make things such as a Heading or Link, which respectively use a type field of 'heading' and 'link'.
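To make the discriminated union idea concrete, here is a minimal sketch of narrowing on the type field. The interfaces HeadingLike, LinkLike, and ThematicBreakLike and the function describeNode are hypothetical, simplified stand-ins written for this example so that it runs without installing anything; in a real project you would use the interfaces from mdast instead.

```typescript
// Hypothetical, simplified stand-ins for a few markdown node shapes.
// Each one sets `type` to a distinct string literal.
interface HeadingLike {
  type: 'heading'
  depth: number
}

interface LinkLike {
  type: 'link'
  url: string
}

interface ThematicBreakLike {
  type: 'thematicBreak'
}

// The union is discriminated by the shared `type` field.
type ContentLike = HeadingLike | LinkLike | ThematicBreakLike

function describeNode(node: ContentLike): string {
  switch (node.type) {
    case 'heading':
      // Here `node` is narrowed to `HeadingLike`, so `depth` is available.
      return `heading of depth ${node.depth}`
    case 'link':
      // Here `node` is narrowed to `LinkLike`, so `url` is available.
      return `link to ${node.url}`
    case 'thematicBreak':
      return 'thematic break'
  }
}

console.log(describeNode({type: 'heading', depth: 2})) // 'heading of depth 2'
console.log(describeNode({type: 'link', url: 'https://example.com'})) // 'link to https://example.com'
```

Because every member of the union has a different type value, checking that one field is enough for TypeScript to know which other fields exist.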
A node can optionally include a Data interface at the data field. This is an object (dictionary) that stores extra metadata which is not standard to the node but defined by the ecosystem (utilities and plugins).
When a syntax tree is parsed from a file, it includes positional information: a Position interface at the position field. This describes where the node occurred in the source file.
/**
 * Info associated with nodes by the ecosystem.
 *
 * This space is guaranteed to never be specified by unist or specifications
 * implementing unist.
 * But you can use it in utilities and plugins to store data.
 *
 * This type can be augmented to register custom data.
 * For example:
 *
 * ```ts
 * declare module 'unist' {
 *   interface Data {
 *     // `someNode.data.myId` is typed as `number | undefined`
 *     myId?: number | undefined
 *   }
 * }
 * ```
 */
interface Data {}

/**
 * One place in a source file.
 */
interface Point {
  /**
   * Line in a source file (1-indexed integer).
   */
  line: number
  /**
   * Column in a source file (1-indexed integer).
   */
  column: number
  /**
   * Character in a source file (0-indexed integer).
   */
  offset?: number | undefined
}

/**
 * Position of a node in a source document.
 *
 * A position is a range between two points.
 */
interface Position {
  /**
   * Place of the first character of the parsed source region.
   */
  start: Point
  /**
   * Place of the first character after the parsed source region.
   */
  end: Point
}

/**
 * Abstract unist node.
 *
 * The syntactic unit in unist syntax trees are called nodes.
 *
 * This interface is supposed to be extended.
 * If you can use {@link Literal} or {@link Parent}, you should.
 * But for example in markdown, a `thematicBreak` (`***`), is neither literal
 * nor parent, but still a node.
 */
interface Node {
  /**
   * Node type.
   */
  type: string
  /**
   * Info from the ecosystem.
   */
  data?: Data | undefined
  /**
   * Position of a node in a source document.
   *
   * Nodes that are generated (not in the original source document) must not
   * have a position.
   */
  position?: Position | undefined
}
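As an illustration of how positional information can be used, the following sketch recovers the source text that a node covers from the offset fields of its position. UnistPoint and UnistPosition are local copies of the Point and Position interfaces shown above, renamed for this example so they cannot clash with similarly named globals; the function sliceSource and the sample values are hypothetical.

```typescript
// Local copies of the `Point` and `Position` interfaces shown above.
interface UnistPoint {
  line: number
  column: number
  offset?: number | undefined
}

interface UnistPosition {
  start: UnistPoint
  end: UnistPoint
}

/**
 * Return the source text covered by a position.
 * Returns `undefined` for generated nodes (no position) or when the
 * optional offsets are missing.
 */
function sliceSource(
  source: string,
  position: UnistPosition | undefined
): string | undefined {
  const start = position?.start.offset
  const end = position?.end.offset
  if (start === undefined || end === undefined) return undefined
  // `end` is the place of the first character after the region,
  // which matches the exclusive end index of `slice`.
  return source.slice(start, end)
}

const markdown = '# Hi\n\nBye'

// Hand-written position of the heading `# Hi` on line 1.
const headingPosition: UnistPosition = {
  start: {line: 1, column: 1, offset: 0},
  end: {line: 1, column: 5, offset: 4}
}

console.log(sliceSource(markdown, headingPosition)) // '# Hi'
console.log(sliceSource(markdown, undefined)) // undefined
```

Note that line and column are 1-indexed while offset is 0-indexed, so only offset can be passed directly to string methods.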
Literal
Literal extends Node and adds a value property. For example, a markdown Code node extends Literal and sets value to be a string.
/**
 * Abstract unist node that contains the smallest possible value.
 *
 * This interface is supposed to be extended.
 *
 * For example, in HTML, a `text` node is a leaf that contains text.
 */
interface Literal extends Node {
  /**
   * Plain value.
   */
  value: unknown
}
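Because value is typed as unknown on the abstract interface, code that accepts any literal has to check the value before using it, while a more specific interface can state the type up front. The sketch below shows both cases. UnistLiteral is a local, trimmed copy of Literal (without data and position), and CodeLike and valueLength are hypothetical names used only for this example.

```typescript
// Local, trimmed copy of the abstract `Literal` interface shown above.
interface UnistLiteral {
  type: string
  value: unknown
}

// Hypothetical specific literal, similar in spirit to a markdown code node:
// it fixes `type` to a string literal and narrows `value` to `string`.
interface CodeLike extends UnistLiteral {
  type: 'code'
  value: string
}

/**
 * Length of a literal's value, if that value is a string.
 */
function valueLength(node: UnistLiteral): number | undefined {
  // `value` is `unknown` here, so it must be checked before use.
  return typeof node.value === 'string' ? node.value.length : undefined
}

const code: CodeLike = {type: 'code', value: 'alert(1)'}

// On the specific interface, `value` is already known to be a string.
console.log(code.value.toUpperCase()) // 'ALERT(1)'
console.log(valueLength(code)) // 8
console.log(valueLength({type: 'custom', value: 42})) // undefined
```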
Parent
Parent extends Node and adds children. Children represent other content which is inside or a part of this node.
/**
 * Abstract unist node that contains other nodes (*children*).
 *
 * This interface is supposed to be extended.
 *
 * For example, in XML, an element is a parent of different things, such as
 * comments, text, and further elements.
 */
interface Parent extends Node {
  /**
   * List of children.
   */
  children: Node[]
}
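Since children are themselves nodes, and some of them may be parents too, a tree can be walked recursively. The following is a minimal sketch of such a walk that counts nodes. UnistNode, UnistLiteral, and UnistParent are local, trimmed copies of the interfaces shown above, renamed to avoid clashing with the DOM global Node; isParent and countNodes are hypothetical helpers written for this example, not part of any package.

```typescript
// Local, trimmed copies of the abstract interfaces shown above.
interface UnistNode {
  type: string
}

interface UnistLiteral extends UnistNode {
  value: unknown
}

interface UnistParent extends UnistNode {
  children: UnistNode[]
}

/**
 * Type guard: check whether a node has a `children` array.
 */
function isParent(node: UnistNode): node is UnistParent {
  return 'children' in node && Array.isArray((node as UnistParent).children)
}

/**
 * Count a node and all of its descendants.
 */
function countNodes(node: UnistNode): number {
  let count = 1
  if (isParent(node)) {
    for (const child of node.children) {
      count += countNodes(child)
    }
  }
  return count
}

// A small hand-written tree: root > (paragraph > text), thematicBreak.
const text: UnistLiteral = {type: 'text', value: 'Hi'}
const paragraph: UnistParent = {type: 'paragraph', children: [text]}
const rule: UnistNode = {type: 'thematicBreak'}
const tree: UnistParent = {type: 'root', children: [paragraph, rule]}

console.log(countNodes(tree)) // 4
console.log(isParent(rule)) // false
```

The type guard is needed because children is typed as a list of abstract nodes, so TypeScript cannot know on its own which children have further children.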
Pulling unist into a project
Install:
npm install --save-dev @types/unist
To import the types into a TypeScript file, use:
import type {Literal, Node, Parent} from 'unist'
To import the types in JSDoc TypeScript, use:
/**
 * @import {Literal, Node, Parent} from 'unist'
 */
mdast (markdown)
mdast extends unist with types specific to markdown such as Code, Heading, Link, and many more. The specification includes a full list of nodes. The types are available in a types-only package: @types/mdast.
Install:
npm install --save-dev @types/mdast
To import the types into a TypeScript file, use:
import type {Code, Heading, Link, Root} from 'mdast'
To import the types in JSDoc TypeScript, use:
/**
 * @import {Code, Heading, Link, Root} from 'mdast'
 */
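To give a feel for what language-specific types buy you, here is a rough sketch that lists the headings of a small markdown tree. So that it runs without installing anything, it declares MdText, MdHeading, MdParagraph, and MdRoot as simplified, hypothetical stand-ins for a few mdast interfaces; they are assumptions made for this example and cover far less than the real types. In a real project you would import Heading, Root, and friends from 'mdast' as shown above. The function outline is likewise hypothetical.

```typescript
// Simplified, hypothetical stand-ins for a few mdast interfaces.
interface MdText {
  type: 'text'
  value: string
}

interface MdHeading {
  type: 'heading'
  depth: 1 | 2 | 3 | 4 | 5 | 6
  children: MdText[]
}

interface MdParagraph {
  type: 'paragraph'
  children: MdText[]
}

interface MdRoot {
  type: 'root'
  children: Array<MdHeading | MdParagraph>
}

/**
 * List the headings found directly in a root, as `h<depth>: <text>`.
 */
function outline(tree: MdRoot): string[] {
  const lines: string[] = []
  for (const node of tree.children) {
    // Checking `type` narrows `node` to `MdHeading`, so `depth` is known.
    if (node.type === 'heading') {
      const label = node.children.map((child) => child.value).join('')
      lines.push(`h${node.depth}: ${label}`)
    }
  }
  return lines
}

const documentTree: MdRoot = {
  type: 'root',
  children: [
    {type: 'heading', depth: 1, children: [{type: 'text', value: 'Guide'}]},
    {type: 'paragraph', children: [{type: 'text', value: 'Intro'}]},
    {type: 'heading', depth: 2, children: [{type: 'text', value: 'Basics'}]}
  ]
}

console.log(outline(documentTree)) // ['h1: Guide', 'h2: Basics']
```

Unlike the abstract unist interfaces, the specific ones say which children a node may have and which extra fields it carries, so mistakes such as a misspelled type or a missing depth are caught at compile time.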
hast (HTML)
hast extends unist with types specific to HTML such as Comment, Doctype, Element, and many more. The specification includes a full list of nodes. The types are available in a types-only package: @types/hast.
Install:
npm install --save-dev @types/hast
To import the types into a TypeScript file, use:
import type {Comment, Doctype, Element, Root} from 'hast'
To import the types in JSDoc TypeScript, use:
/**
 * @import {Comment, Doctype, Element, Root} from 'hast'
 */
xast (XML)
xast extends unist with types specific to XML such as Cdata, Element, Instruction, and many more. The specification includes a full list of nodes. The types are available in a types-only package: @types/xast.
Install:
npm install --save-dev @types/xast
To import the types into a TypeScript file, use:
import type {Cdata, Element, Instruction, Root} from 'xast'
To import the types in JSDoc TypeScript, use:
/**
 * @import {Cdata, Element, Instruction, Root} from 'xast'
 */
nlcst (Natural language)
nlcst extends unist with types specific to natural language such as Sentence, Word, and many more. The specification includes a full list of nodes. The types are available in a types-only package: @types/nlcst.
Install:
npm install --save-dev @types/nlcst
To import the types into a TypeScript file, use:
import type {Paragraph, Sentence, Word, Root} from 'nlcst'
To import the types in JSDoc TypeScript, use:
/**
 * @import {Paragraph, Sentence, Word, Root} from 'nlcst'
 */
Summary
- unified provides types for each language’s syntax tree
- These types can be imported into TypeScript projects and into JSDoc projects