Intro to unified
After reading this guide you will:
- understand what unified does
- get a taste of the ecosystem
- know how it can be used
- know what parts you need for your use case
- have a list of resources to continue learning or get started
Intro
unified is a friendly interface backed by an ecosystem of plugins built for creating and manipulating content. It takes markdown, HTML, plain text, or other content and turns it into structured data, which makes that content available to over 100 plugins. Plugins perform tasks such as spellchecking, linting, or minifying.
With unified, you don’t handle syntax or parsing manually. Instead, you typically write one line of code to chain a plugin into unified’s process.
unified itself is a rather small module that acts as an interface to unify the handling of different content formats. Around each format sits an ecosystem, such as remark for markdown. Several such ecosystems exist for unified. Together with other tools and specifications, they form the unified collective.
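To illustrate what "chaining a plugin with one line" means, here is a minimal, dependency-free sketch of a processor with a chainable `use` method. It is a toy model and not unified's actual implementation: `createProcessor`, `toyText`, `censor`, and `shout` are hypothetical names invented for this example, and the "format" is just space-separated words.

```javascript
// Toy model of a plugin pipeline; NOT unified's real code.
// `createProcessor` and everything below are hypothetical names.
function createProcessor(format) {
  const transformers = []
  const processor = {
    // Attach a plugin: call it with its options and keep the transformer it returns.
    use(plugin, options) {
      transformers.push(plugin(options))
      return processor // returning the processor is what makes chaining possible
    },
    // Run the whole process: parse, apply each transformer in order, stringify.
    process(input) {
      let tree = format.parse(input)
      for (const transform of transformers) {
        tree = transform(tree) || tree
      }
      return format.stringify(tree)
    }
  }
  return processor
}

// A toy "format": plain text split into word nodes.
const toyText = {
  parse: (input) => ({
    type: 'root',
    children: input.split(' ').map((value) => ({type: 'word', value}))
  }),
  stringify: (tree) => tree.children.map((node) => node.value).join(' ')
}

// Hypothetical plugin: uppercase every word.
function shout() {
  return (tree) => {
    for (const node of tree.children) node.value = node.value.toUpperCase()
  }
}

// Hypothetical plugin with options: mask the given words.
function censor(options) {
  const banned = new Set(options.words)
  return (tree) => {
    for (const node of tree.children) {
      if (banned.has(node.value.toLowerCase())) {
        node.value = '*'.repeat(node.value.length)
      }
    }
  }
}

// Each feature is one `.use(...)` line in the chain.
const chained = createProcessor(toyText)
  .use(censor, {words: ['darn']})
  .use(shout)
  .process('well darn it')

console.log(chained)
```

The order of the `.use` calls is the order in which the transformers run, which is why composing plugins is mostly a matter of arranging lines.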
Collective
The unified collective spans like-minded organizations. These organizations share the goal of innovating content processing. They achieve it through seamless, interchangeable, and pluggable tooling.
Depending on what you want to do, you will turn to different organizations. So let’s start with a round of introductions.
The ecosystems:
- rehype — HTML
- remark — markdown
- retext — natural language
The specifications for syntax trees:
- esast — ECMAScript
- hast — HTML
- mdast — markdown
- nlcst — natural language
- unist — universal syntax tree
- xast — XML
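To make "syntax tree" concrete, here is a small sketch of what an mdast tree for the markdown `# Hello *world*` could look like as a plain JavaScript object, together with a tiny recursive walk. The `type`, `children`, `depth`, and `value` fields follow the unist and mdast specifications; positional information is omitted for brevity, and `collectTypes` is a hypothetical helper written for this example, not part of any package.

```javascript
// A hand-written mdast tree for the markdown `# Hello *world*`.
// Real parsers also attach `position` fields; they are left out here.
const tree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1,
      children: [
        {type: 'text', value: 'Hello '},
        {type: 'emphasis', children: [{type: 'text', value: 'world'}]}
      ]
    }
  ]
}

// Hypothetical helper: visit every node depth-first and record its type.
function collectTypes(node, types = []) {
  types.push(node.type)
  for (const child of node.children || []) {
    collectTypes(child, types)
  }
  return types
}

const types = collectTypes(tree)
console.log(types.join(' > '))
```

Because every node shares the same basic shape (a `type`, plus either `children` or a `value`), generic utilities can walk any of these trees without knowing the format in advance.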
Other building blocks:
- MDX — markdown and JSX
- micromark — small, safe, and great CommonMark (and GFM) markdown parser
- syntax-tree — low-level utilities for building plugins
- vfile — virtual file format for text processing
We’ll get to how these come together in the next section. If you are already feeling adventurous, you can go directly to Using unified or How to get started with plugins.
How it comes together
These processors, specifications, and tools come together in a three-part act. A processor’s process consists of:
- parse: whether your input is markdown, HTML, or prose, it needs to be parsed to a workable format; such a format is called a syntax tree; the specifications (for example mdast) define how such a tree looks; the processors (such as remark for mdast) are responsible for creating them
- transform: this is where the magic happens; users compose plugins and the order they run in; plugins plug into this phase and transform and inspect the format they get
- stringify: the final step is to take the (adjusted) format and stringify it to markdown, HTML, or prose (which could be different from the input format!)
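The three phases above can be sketched without any dependencies. This is a toy model under strong assumptions, not how remark or unified are implemented: the parser only understands `# heading` lines and one-line paragraphs, and `parse`, `demoteHeadings`, and `stringify` are hypothetical functions written for this example.

```javascript
// Toy model of the three phases; NOT unified's real API.

// parse: text -> tree (only `# heading` lines and one-line paragraphs are understood).
function parse(text) {
  const children = text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => {
      const match = /^(#{1,6}) (.*)$/.exec(line)
      return match
        ? {
            type: 'heading',
            depth: match[1].length,
            children: [{type: 'text', value: match[2]}]
          }
        : {type: 'paragraph', children: [{type: 'text', value: line}]}
    })
  return {type: 'root', children}
}

// transform: a "plugin" inspects or changes the tree.
// This hypothetical one demotes every heading by one level (capped at depth 6).
function demoteHeadings(tree) {
  for (const node of tree.children) {
    if (node.type === 'heading') node.depth = Math.min(node.depth + 1, 6)
  }
  return tree
}

// stringify: tree -> text (markdown again in this sketch).
function stringify(tree) {
  return tree.children
    .map((node) => {
      const text = node.children.map((child) => child.value).join('')
      return node.type === 'heading' ? '#'.repeat(node.depth) + ' ' + text : text
    })
    .join('\n\n')
}

const output = stringify(demoteHeadings(parse('# Title\n\nSome text.')))
console.log(output)
```

The transform step never touches raw text: it only reads and changes nodes, which is what lets plugins stay small and independent of the syntax details.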
unified can be used programmatically in Bun, Deno, or Node.js. With a build step or through a CDN (such as esm.sh), it can be used in browsers as well. CLI versions, Grunt plugins, and Gulp plugins of processors also exist.
What makes unified unique is that it can switch between formats, such as markdown to HTML, in the same process. This allows for even more powerful compositions.
The following plugins bridge formats:
- remark-rehype — markdown to HTML
- rehype-remark — HTML to markdown
- remark-retext — markdown to prose
- rehype-retext — HTML to prose
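Conceptually, bridging formats means turning one kind of syntax tree into another. The sketch below shows the idea for a handful of markdown node types, mapping an mdast-like tree to a hast-like tree and then serializing it. It is a heavily simplified illustration and not the code of remark-rehype: `toHast`, `element`, `toHtml`, and `escapeHtml` are hypothetical helpers, and only the node types listed in the `switch` are supported.

```javascript
// Hypothetical, much simplified bridge from an mdast-like tree to a hast-like tree.
function toHast(node) {
  switch (node.type) {
    case 'root':
      return {type: 'root', children: node.children.map(toHast)}
    case 'heading':
      return element('h' + node.depth, node.children.map(toHast))
    case 'paragraph':
      return element('p', node.children.map(toHast))
    case 'emphasis':
      return element('em', node.children.map(toHast))
    case 'text':
      return {type: 'text', value: node.value}
    default:
      throw new Error('Unsupported node type: ' + node.type)
  }
}

// Build a hast-style element node.
function element(tagName, children) {
  return {type: 'element', tagName, properties: {}, children}
}

// Serialize the hast-like tree to an HTML string.
function toHtml(node) {
  if (node.type === 'text') return escapeHtml(node.value)
  const inner = node.children.map(toHtml).join('')
  return node.type === 'element'
    ? `<${node.tagName}>${inner}</${node.tagName}>`
    : inner
}

// Minimal escaping for text content.
function escapeHtml(value) {
  return value.replace(/&/g, '&amp;').replace(/</g, '&lt;')
}

// The markdown `# Hello *world*` as an mdast-like tree.
const markdownTree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1,
      children: [
        {type: 'text', value: 'Hello '},
        {type: 'emphasis', children: [{type: 'text', value: 'world'}]}
      ]
    }
  ]
}

const html = toHtml(toHast(markdownTree))
console.log(html)
```

Once the tree is in the HTML shape, any plugin written for HTML trees can run on content that started out as markdown.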
Use cases
Whenever you think about processing content, you can think of unified. It’s a powerful tool, so for some tasks, such as transforming markdown to HTML, you could use simpler tools like marked as well. Where unified really shines is when you want to go further than a single task. For example, when you want to enforce format rules, check spelling, generate a table of contents, and (potentially) much more: that’s when to opt for unified.
A large part of MDX’s success has been leveraging the unified and remark ecosystem. I was able to get a prototype working in a few hours because I didn’t have to worry about markdown parsing: remark gave it to me for free. It provided the primitives to build on.
— John Otander, author of mdx-js/mdx
To spark the imagination, here are some of the more common plugins used in unified pipelines to do interesting things:
- rehype-minify — minify HTML
- rehype-react — transform to Preact, React, Vue, etc.
- rehype-starry-night — apply syntax highlighting to code
- remark-directive — support directives
- remark-gfm — support GFM
- remark-lint — check markdown code style
- remark-toc — generate a table of contents
- retext-equality — check possibly insensitive language
- retext-repeated-words — check for repeated words
- retext-spell — check spelling
- …explore all rehype, remark, or retext plugins
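As a rough idea of why working on a tree makes such tasks approachable, here is a dependency-free sketch that builds a table of contents from the headings in an mdast-like tree. It is not how remark-toc works internally: `tableOfContents` is a hypothetical function, and it assumes headings sit directly under the root and contain only text nodes.

```javascript
// Hypothetical helper: list every top-level heading, indented by its depth.
// Assumes heading children are plain text nodes (no emphasis, links, and so on).
function tableOfContents(tree) {
  return tree.children
    .filter((node) => node.type === 'heading')
    .map((node) => {
      const title = node.children.map((child) => child.value).join('')
      return '  '.repeat(node.depth - 1) + '- ' + title
    })
    .join('\n')
}

// A small hand-written document tree.
const documentTree = {
  type: 'root',
  children: [
    {type: 'heading', depth: 1, children: [{type: 'text', value: 'Guide'}]},
    {type: 'paragraph', children: [{type: 'text', value: 'Some intro text.'}]},
    {type: 'heading', depth: 2, children: [{type: 'text', value: 'Install'}]},
    {type: 'heading', depth: 2, children: [{type: 'text', value: 'Use'}]}
  ]
}

const toc = tableOfContents(documentTree)
console.log(toc)
```

No regular expressions over raw markdown are needed: the parser has already decided what counts as a heading, so the feature reduces to filtering nodes.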
Summary
- unified is a friendly interface backed by an ecosystem of plugins built for creating and manipulating content; you don’t have to worry about parsing as you have the primitives to build on
- hundreds of plugins are available
- remark is used for markdown, rehype for HTML, and retext for natural language
- unified’s plugin pipeline typically lets you write one line of code to chain a feature into the process, for example bridging formats (such as markdown to HTML)