Project: syntax-tree/hast-util-to-nlcst

Package: hast-util-to-nlcst@1.2.7

Dependencies: 8 · Dependents: 1

hast-util-to-nlcst

hast utility to transform to nlcst.

Note: You probably want to use rehype-retext.

Install

npm:

npm install hast-util-to-nlcst

Use

Say we have the following example.html:

<article>
  Implicit.
  <h1>Explicit: <strong>foo</strong>s-ball</h1>
  <pre><code class="language-foo">bar()</code></pre>
</article>

…and next to it, index.js:

var rehype = require('rehype')
var vfile = require('to-vfile')
var English = require('parse-english')
var inspect = require('unist-util-inspect')
var toNlcst = require('hast-util-to-nlcst')

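// Read the HTML and parse it into a hast tree.
var file = vfile.readSync('example.html')
var tree = rehype().parse(file)

// Transform hast to nlcst with the English parser and print the result.
console.log(inspect(toNlcst(tree, file, English)))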

Now, running node index.js yields:

RootNode[2] (1:1-6:1, 0-134)
├─ ParagraphNode[3] (1:10-3:3, 9-24)
│  ├─ WhiteSpaceNode: "\n  " (1:10-2:3, 9-12)
│  ├─ SentenceNode[2] (2:3-2:12, 12-21)
│  │  ├─ WordNode[1] (2:3-2:11, 12-20)
│  │  │  └─ TextNode: "Implicit" (2:3-2:11, 12-20)
│  │  └─ PunctuationNode: "." (2:11-2:12, 20-21)
│  └─ WhiteSpaceNode: "\n  " (2:12-3:3, 21-24)
└─ ParagraphNode[1] (3:7-3:43, 28-64)
   └─ SentenceNode[4] (3:7-3:43, 28-64)
      ├─ WordNode[1] (3:7-3:15, 28-36)
      │  └─ TextNode: "Explicit" (3:7-3:15, 28-36)
      ├─ PunctuationNode: ":" (3:15-3:16, 36-37)
      ├─ WhiteSpaceNode: " " (3:16-3:17, 37-38)
      └─ WordNode[4] (3:25-3:43, 46-64)
         ├─ TextNode: "foo" (3:25-3:28, 46-49)
         ├─ TextNode: "s" (3:37-3:38, 58-59)
         ├─ PunctuationNode: "-" (3:38-3:39, 59-60)
         └─ TextNode: "ball" (3:39-3:43, 60-64)
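
The second pair of numbers in each range (for example 46-49) is a pair of character offsets into example.html, so every nlcst leaf can be traced back to the exact characters it came from. The following is a minimal sketch in plain JavaScript that checks this for the last WordNode. The names source, leaves, slices, and wordValue are illustrative only, and the offsets are flattened onto each object for brevity (real nodes carry them under position):

```javascript
// The first three lines of example.html, as one string.
var source = [
  '<article>',
  '  Implicit.',
  '  <h1>Explicit: <strong>foo</strong>s-ball</h1>'
].join('\n')

// Children of the last WordNode, copied from the inspected tree above.
var leaves = [
  {type: 'TextNode', value: 'foo', start: 46, end: 49},
  {type: 'TextNode', value: 's', start: 58, end: 59},
  {type: 'PunctuationNode', value: '-', start: 59, end: 60},
  {type: 'TextNode', value: 'ball', start: 60, end: 64}
]

// Each pair of offsets slices the matching characters out of the source.
var slices = leaves.map(function (leaf) {
  return source.slice(leaf.start, leaf.end)
})

// Joining the leaf values gives the word as a reader sees it,
// even though <strong> splits it in the markup.
var wordValue = leaves
  .map(function (leaf) {
    return leaf.value
  })
  .join('')

console.log(slices) // [ 'foo', 's', '-', 'ball' ]
console.log(wordValue) // foos-ball
```

The gap between offsets 49 and 58 is the closing </strong> tag, which is markup and therefore has no counterpart in nlcst.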

API

toNlcst(tree, file, Parser)

Transform the given hast tree to nlcst.

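Parameters

tree (HastNode): tree with position information
file (VFile): virtual file
Parser (Function): nlcst parser, such as parse-english, parse-dutch, parse-latin, or your own

Returns

NlcstNode.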

Notes

Implied paragraphs

The algorithm supports implicit and explicit paragraphs, such as:

<article>
  An implicit paragraph.
  <h1>An explicit paragraph.</h1>
</article>

Overlapping paragraphs are also supported (see the tests or the HTML spec for more info).
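
To make the distinction concrete, here is a simplified sketch of how the children of a container could be split into implicit and explicit paragraphs. It is not the package's algorithm: the explicit list, the text and paragraphs helpers, and the kind field are all hypothetical, the input is a hand-written hast-like tree, and overlapping paragraphs are not handled.

```javascript
// Hypothetical, simplified set of elements that form explicit paragraphs.
var explicit = ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6']

// Collect the text content of a hast-like node.
function text(node) {
  if (node.type === 'text') return node.value
  return (node.children || []).map(text).join('')
}

// Split the children of a container into paragraphs.
function paragraphs(children) {
  var result = []
  var run = []

  // Close the current run of loose content as one implicit paragraph.
  // Runs that contain only white space are dropped.
  function flush() {
    var value = run.map(text).join('').trim()
    if (value) result.push({kind: 'implicit', value: value})
    run = []
  }

  children.forEach(function (child) {
    if (child.type === 'element' && explicit.indexOf(child.tagName) !== -1) {
      flush()
      result.push({kind: 'explicit', value: text(child).trim()})
    } else {
      run.push(child)
    }
  })

  flush()
  return result
}

// Children of the <article> shown above.
var article = [
  {type: 'text', value: '\n  An implicit paragraph.\n  '},
  {
    type: 'element',
    tagName: 'h1',
    properties: {},
    children: [{type: 'text', value: 'An explicit paragraph.'}]
  },
  {type: 'text', value: '\n'}
]

console.log(paragraphs(article))
// [ { kind: 'implicit', value: 'An implicit paragraph.' },
//   { kind: 'explicit', value: 'An explicit paragraph.' } ]
```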

Ignored nodes

Some elements are ignored and their content will not be present in nlcst: <script>, <style>, <svg>, <math>, <del>.

To ignore other elements, add a data-nlcst attribute with a value of ignore:

<p>This is <span data-nlcst="ignore">hidden</span>.</p>
<p data-nlcst="ignore">Completely hidden.</p>
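
As a rough illustration of the rule, the check below decides whether a hast element would be skipped. It assumes hand-written hast-like objects, where the data-nlcst attribute appears as the dataNlcst property. The names ignoredNames, isIgnored, and visibleText are hypothetical and only mirror the behavior described above, not the package's source.

```javascript
// Elements whose content is always left out, as listed above.
var ignoredNames = ['script', 'style', 'svg', 'math', 'del']

// In hast, the data-nlcst attribute is exposed as the dataNlcst property.
function isIgnored(node) {
  if (node.type !== 'element') return false
  var properties = node.properties || {}
  return (
    ignoredNames.indexOf(node.tagName) !== -1 ||
    properties.dataNlcst === 'ignore'
  )
}

// Text content of a node, with ignored elements skipped.
function visibleText(node) {
  if (node.type === 'text') return node.value
  if (isIgnored(node)) return ''
  return (node.children || []).map(visibleText).join('')
}

// <p>This is <span data-nlcst="ignore">hidden</span>.</p>
var paragraph = {
  type: 'element',
  tagName: 'p',
  properties: {},
  children: [
    {type: 'text', value: 'This is '},
    {
      type: 'element',
      tagName: 'span',
      properties: {dataNlcst: 'ignore'},
      children: [{type: 'text', value: 'hidden'}]
    },
    {type: 'text', value: '.'}
  ]
}

console.log(JSON.stringify(visibleText(paragraph))) // "This is ."
```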

Source nodes

<code> elements are mapped to Source nodes in nlcst.

To mark other elements as source, add a data-nlcst attribute with a value of source:

<p>This is <span data-nlcst="source">marked as source</span>.</p>
<p data-nlcst="source">Completely marked.</p>
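
The effect of marking an element as source can be sketched as follows: its text is kept verbatim as a single unit instead of being handed to the natural language parser. This is a simplified illustration on hand-written hast-like objects. The helpers isSource, text, and segments and the kind field are hypothetical and not part of the package.

```javascript
// <code> is always source; other elements opt in through data-nlcst,
// which hast exposes as the dataNlcst property.
function isSource(node) {
  if (node.type !== 'element') return false
  var properties = node.properties || {}
  return node.tagName === 'code' || properties.dataNlcst === 'source'
}

// Collect the text content of a hast-like node.
function text(node) {
  if (node.type === 'text') return node.value
  return (node.children || []).map(text).join('')
}

// Label each child of a paragraph: source is kept verbatim as one unit,
// prose is what a parser would go on to split into words and punctuation.
function segments(paragraph) {
  return paragraph.children.map(function (child) {
    return {kind: isSource(child) ? 'source' : 'prose', value: text(child)}
  })
}

// <p>This is <span data-nlcst="source">marked as source</span>.</p>
var paragraph = {
  type: 'element',
  tagName: 'p',
  properties: {},
  children: [
    {type: 'text', value: 'This is '},
    {
      type: 'element',
      tagName: 'span',
      properties: {dataNlcst: 'source'},
      children: [{type: 'text', value: 'marked as source'}]
    },
    {type: 'text', value: '.'}
  ]
}

console.log(segments(paragraph))
// [ { kind: 'prose', value: 'This is ' },
//   { kind: 'source', value: 'marked as source' },
//   { kind: 'prose', value: '.' } ]
```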

Security

hast-util-to-nlcst does not change the original syntax tree, so there are no openings for cross-site scripting (XSS) attacks.

Contribute

See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

License

MIT © Titus Wormer