unified

Project: unified-doc/unified-doc

Package: unified-doc-util-text-offsets@1.0.8

  1. Dependents: 0
  2. unified-doc hast utility to add text offsets to text nodes.
  1. unified 181
  2. util 145
  3. unist 132
  4. hast 74
  5. file 34
  6. content 20
  7. text 19
  8. document 10
  9. position 9
  10. doc 8

unified-doc-util-text-offsets

unified-doc hast utility to add text offsets to text nodes.


Install

npm install unified-doc-util-text-offsets

Use

Given a hast tree parsed from some HTML content:

import textOffsets from 'unified-doc-util-text-offsets';

// html: '<blockquote><strong>some</strong>\ncontent</blockquote>'
const hast = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'blockquote',
      children: [
        {
          type: 'element',
          tagName: 'strong',
          children: [
            {
              type: 'text',
              value: 'some',
            },
          ],
        },
        {
          type: 'text',
          value: '\ncontent',
        },
      ],
    },
  ],
};

expect(textOffsets(hast)).toEqual({
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'blockquote',
      children: [
        {
          type: 'element',
          tagName: 'strong',
          children: [
            {
              type: 'text',
              value: 'some',
              data: {
                textOffset: { start: 0, end: 4 },
              },
            },
          ],
        },
        {
          type: 'text',
          value: '\ncontent',
          data: {
            textOffset: { start: 4, end: 12 },
          },
        },
      ],
    },
  ]
});

API

textOffsets(hast)

Interface

function textOffsets(hast: Hast): Hast;

Accepts a hast tree and adds textOffset data to text nodes. Returns a new tree.

A TextOffset for a text node tracks the start and end offset of its text value relative to the textContent representation of the provided hast tree. The textContent representation of a hast tree is the concatenation of all text node values under the tree. The following pseudocode helps visualize this behavior:

const html = '<blockquote><strong>some</strong>\ncontent</blockquote>';
const textContent = 'some\ncontent';
const textNodes = ['some', '\ncontent'];
const textOffsets = [
  { start: 0, end: 4 }, // "[some]\ncontent"
  { start: 4, end: 12 }, // "some[\ncontent]"
];

// textOffset data mentioned above attached to text nodes
const withTextOffsets = textOffsets(hast);
interface TextOffset = {
  /** start offset of the text node value relative to the `textContent` of the `hast` tree */
  start: number;
  /** end offset of the text node value relative to the `textContent` of the `hast` tree */
  end: number
}