Project: syntax-tree/mdast-util-to-nlcst

Package: mdast-util-to-nlcst@5.2.1

mdast-util-to-nlcst

mdast utility to transform to nlcst.

What is this?

This package is a utility that takes an mdast (markdown) syntax tree as input and turns it into nlcst (natural language).

When should I use this?

This project is useful when you want to deal with ASTs and inspect the natural language inside markdown. Unfortunately, there is no way yet to apply changes to the nlcst back into mdast.

The hast utility hast-util-to-nlcst does the same but uses an HTML tree as input.

The remark plugin remark-retext wraps this utility to do the same at a higher-level (easier) abstraction.

Install

This package is ESM only. In Node.js (version 12.20+, 14.14+, or 16.0+), install with npm:

npm install mdast-util-to-nlcst

In Deno with esm.sh:

import {toNlcst} from "https://esm.sh/mdast-util-to-nlcst@5"

In browsers with esm.sh:

<script type="module">
  import {toNlcst} from "https://esm.sh/mdast-util-to-nlcst@5?bundle"
</script>

Use

Say we have the following example.md:

Some *foo*sball.

…and next to it a module example.js:

import {read} from 'to-vfile'
import {ParseEnglish} from 'parse-english'
import {inspect} from 'unist-util-inspect'
import {fromMarkdown} from 'mdast-util-from-markdown'
import {toNlcst} from 'mdast-util-to-nlcst'

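// Read the markdown file; `toNlcst` also needs this vfile to resolve positions.
const file = await read('example.md')
// Parse the markdown into mdast (the parser adds positional info to nodes).
const mdast = fromMarkdown(String(file))
// Transform mdast to nlcst with an English parser.
const nlcst = toNlcst(mdast, file, ParseEnglish)

console.log(inspect(nlcst))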

Yields:

RootNode[1] (1:1-1:17, 0-16)
└─0 ParagraphNode[1] (1:1-1:17, 0-16)
    └─0 SentenceNode[4] (1:1-1:17, 0-16)
        ├─0 WordNode[1] (1:1-1:5, 0-4)
        │   └─0 TextNode "Some" (1:1-1:5, 0-4)
        ├─1 WhiteSpaceNode " " (1:5-1:6, 4-5)
        ├─2 WordNode[2] (1:7-1:16, 6-15)
        │   ├─0 TextNode "foo" (1:7-1:10, 6-9)
        │   └─1 TextNode "sball" (1:11-1:16, 10-15)
        └─3 PunctuationNode "." (1:16-1:17, 15-16)
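The tree above can be consumed with a plain recursive walk. The following is a minimal sketch, not part of this package: the tree is a hand-written copy of the output above with positions omitted, and `textOf` and `collectWords` are hypothetical helper names used only for illustration.

```javascript
// Hand-written copy of the tree printed above (positions omitted for brevity).
const tree = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'Some'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {
              type: 'WordNode',
              children: [
                {type: 'TextNode', value: 'foo'},
                {type: 'TextNode', value: 'sball'}
              ]
            },
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: concatenate the values of all literal descendants.
function textOf(node) {
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map(textOf).join('')
}

// Hypothetical helper: collect the text of every WordNode, depth first.
function collectWords(node, words = []) {
  if (node.type === 'WordNode') {
    words.push(textOf(node))
  } else {
    for (const child of node.children || []) collectWords(child, words)
  }
  return words
}

console.log(collectWords(tree)) // [ 'Some', 'foosball' ]
```

Note that `foo` and `sball` come from different mdast nodes (emphasis and text) but end up in a single `WordNode`, which is why the walk reports `foosball` as one word.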

API

This package exports the identifier toNlcst. There is no default export.

toNlcst(tree, file, Parser[, options])

Turn an mdast tree into an nlcst tree.

👉 Note: tree must have positional info, file must be a vfile corresponding to tree, and Parser must be a parser such as parse-english, parse-dutch, or parse-latin.
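Trees built by hand often lack positional info, so a check before calling `toNlcst` can help. The following is a minimal sketch, not part of this package: `nodesWithoutPosition` is a hypothetical helper, and treating a position as complete when `start` and `end` both carry a numeric `line` and `column` is an assumption made for illustration.

```javascript
// Hypothetical helper: list the types of nodes that lack positional info.
// Assumption: a position is complete when `start` and `end` both have a
// numeric `line` and `column`.
function nodesWithoutPosition(node, missing = []) {
  const position = node.position
  const complete =
    position &&
    position.start &&
    position.end &&
    typeof position.start.line === 'number' &&
    typeof position.start.column === 'number' &&
    typeof position.end.line === 'number' &&
    typeof position.end.column === 'number'

  if (!complete) missing.push(node.type)
  for (const child of node.children || []) nodesWithoutPosition(child, missing)
  return missing
}

const point = (line, column) => ({line, column})
const span = {start: point(1, 1), end: point(1, 6)}

// Shaped like a tree produced by a parser: every node has a position.
const parsed = {
  type: 'root',
  position: span,
  children: [
    {
      type: 'paragraph',
      position: span,
      children: [{type: 'text', value: 'Hello', position: span}]
    }
  ]
}

// Built by hand without positions.
const handBuilt = {
  type: 'root',
  children: [{type: 'paragraph', children: [{type: 'text', value: 'Hello'}]}]
}

console.log(nodesWithoutPosition(parsed)) // []
console.log(nodesWithoutPosition(handBuilt)) // [ 'root', 'paragraph', 'text' ]
```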

options

Configuration (optional).

options.ignore

List of mdast node types to ignore (Array<string>, optional). The types 'table', 'tableRow', and 'tableCell' are always ignored.

Say we have the following file example.md:

A paragraph.

> A paragraph in a block quote.

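…and if we now transform with ignore: ['blockquote'] (passed as the fourth argument, as in toNlcst(mdast, file, ParseEnglish, {ignore: ['blockquote']})), we get: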

RootNode[2] (1:1-3:1, 0-14)
├─0 ParagraphNode[1] (1:1-1:13, 0-12)
│   └─0 SentenceNode[4] (1:1-1:13, 0-12)
│       ├─0 WordNode[1] (1:1-1:2, 0-1)
│       │   └─0 TextNode "A" (1:1-1:2, 0-1)
│       ├─1 WhiteSpaceNode " " (1:2-1:3, 1-2)
│       ├─2 WordNode[1] (1:3-1:12, 2-11)
│       │   └─0 TextNode "paragraph" (1:3-1:12, 2-11)
│       └─3 PunctuationNode "." (1:12-1:13, 11-12)
└─1 WhiteSpaceNode "\n\n" (1:13-3:1, 12-14)

options.source

List of mdast node types to mark as nlcst source nodes (Array<string>, optional). The type 'inlineCode' is always marked as source.

Say we have the following file example.md:

A paragraph.

> A paragraph in a block quote.

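…and if we now transform with source: ['blockquote'] (passed as the fourth argument, as in toNlcst(mdast, file, ParseEnglish, {source: ['blockquote']})), we get: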

RootNode[3] (1:1-3:32, 0-45)
├─0 ParagraphNode[1] (1:1-1:13, 0-12)
│   └─0 SentenceNode[4] (1:1-1:13, 0-12)
│       ├─0 WordNode[1] (1:1-1:2, 0-1)
│       │   └─0 TextNode "A" (1:1-1:2, 0-1)
│       ├─1 WhiteSpaceNode " " (1:2-1:3, 1-2)
│       ├─2 WordNode[1] (1:3-1:12, 2-11)
│       │   └─0 TextNode "paragraph" (1:3-1:12, 2-11)
│       └─3 PunctuationNode "." (1:12-1:13, 11-12)
├─1 WhiteSpaceNode "\n\n" (1:13-3:1, 12-14)
└─2 ParagraphNode[1] (3:1-3:32, 14-45)
    └─0 SentenceNode[1] (3:1-3:32, 14-45)
        └─0 SourceNode "> A paragraph in a block quote." (3:1-3:32, 14-45)
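Source nodes keep the original markdown verbatim, so downstream code can skip them when analysing prose. The following is a minimal sketch, not part of this package: the tree is an abbreviated, hand-written copy of the output above with positions omitted, and `collectSources` and `proseOf` are hypothetical helper names.

```javascript
// Abbreviated, hand-written copy of the tree printed above (positions omitted).
const word = (value) => ({
  type: 'WordNode',
  children: [{type: 'TextNode', value}]
})

const tree = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            word('A'),
            {type: 'WhiteSpaceNode', value: ' '},
            word('paragraph'),
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    },
    {type: 'WhiteSpaceNode', value: '\n\n'},
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'SourceNode', value: '> A paragraph in a block quote.'}
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: gather the values of all SourceNode descendants.
function collectSources(node, sources = []) {
  if (node.type === 'SourceNode') sources.push(node.value)
  for (const child of node.children || []) collectSources(child, sources)
  return sources
}

// Hypothetical helper: serialize the tree but leave out source nodes.
function proseOf(node) {
  if (node.type === 'SourceNode') return ''
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map(proseOf).join('')
}

console.log(collectSources(tree)) // [ '> A paragraph in a block quote.' ]
console.log(JSON.stringify(proseOf(tree))) // "A paragraph.\n\n"
```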

Returns

nlcst tree (NlcstNode).

Types

This package is fully typed with TypeScript. It exports the types Options, ParserConstructor, and ParserInstance.

Compatibility

Projects maintained by the unified collective are compatible with all maintained versions of Node.js. As of now, that is Node.js 12.20+, 14.14+, and 16.0+. Our projects sometimes work with older versions, but this is not guaranteed.

Security

Use of mdast-util-to-nlcst does not involve hast, so there are no openings for cross-site scripting (XSS) attacks.

Contribute

See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

License

MIT © Titus Wormer