Streaming Markdown Parser
The `markdown-parser` library is a fully typed Markdown parser that follows the CommonMark specification and supports GitHub Flavored Markdown (GFM) tables. Its main differentiator, as of version 0.1.1, is streaming, incremental parsing, which matters for applications that consume content as it arrives, such as large language model output or real-time communication systems. Conventional Markdown parsers need the complete input before they can build an Abstract Syntax Tree (AST). `markdown-parser` instead accepts text in chunks, emits each block node once it is stable, and tracks the state of incomplete structures internally. Content can therefore be displayed and manipulated before the stream ends. The output is a structured, typed AST suited to programmatic inspection, rendering, and analysis. No release cadence is documented; the 0.1.x version number indicates that the library is at an early stage of development.
Common errors
- ReferenceError: require is not defined
  cause: Attempting to use the CommonJS `require()` syntax to import `markdown-parser` in an environment configured for ES modules (e.g., a Node.js project with `"type": "module"` in `package.json` or direct execution of `.mjs` files).
  fix: Switch to ES module `import` syntax: `import { MarkdownParser } from 'markdown-parser';`.
- Some expected Markdown blocks are missing from the output when using streaming mode.
  cause: When `stream: true` is enabled, the parser only emits blocks that have been fully closed and are stable. If the input ends prematurely or the stream is not explicitly finalized, any incomplete or buffered blocks will not be returned.
  fix: After providing all available input chunks, make a final call to `parser.parse('', { stream: false })` to force the parser to finalize and emit any remaining buffered blocks. This signals the end of the stream.
- TypeError: parser.parse is not a function
  cause: This error typically occurs when attempting to call the `parse` method directly on the imported module object rather than on an instance of the `MarkdownParser` class.
  fix: Ensure you correctly instantiate the parser class before calling its methods: `const parser = new MarkdownParser();` then `parser.parse(...)`.
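The finalization fix for missing blocks is easy to forget when chunks are fed from several call sites. A minimal sketch of a wrapper that always makes the final `stream: false` call is given here. The `ChunkParser` interface, the `SketchBlock` shape, and `streamBlocks` are hypothetical names for illustration, and `ToyParser` is a toy stand-in that only splits paragraphs on blank lines; it is not the library and exists only so the sketch runs without it.

```typescript
// Hypothetical minimal surface, modeled on the parse(input, { stream }) calls in this document.
interface SketchBlock { type: string; text: string }
interface ChunkParser {
  parse(input: string, options?: { stream?: boolean }): SketchBlock[];
}

// Feed every chunk in streaming mode, then finalize so buffered blocks are not lost.
function* streamBlocks(parser: ChunkParser, chunks: Iterable<string>): Generator<SketchBlock> {
  for (const chunk of chunks) {
    for (const block of parser.parse(chunk, { stream: true })) yield block;
  }
  // The final call with stream: false flushes whatever is still buffered.
  for (const block of parser.parse("", { stream: false })) yield block;
}

// Toy stand-in (NOT the real library): treats blank lines as paragraph boundaries.
class ToyParser implements ChunkParser {
  private buffer = "";

  parse(input: string, options: { stream?: boolean } = {}): SketchBlock[] {
    this.buffer += input;
    const parts = this.buffer.split(/\n{2,}/);
    // While streaming, the last part may still grow, so it stays buffered.
    this.buffer = options.stream ? (parts.pop() ?? "") : "";
    return parts
      .filter((part) => part.trim() !== "")
      .map((part) => ({ type: "paragraph", text: part.trim() }));
  }
}

const collected = Array.from(streamBlocks(new ToyParser(), ["First par", "agraph\n\nSecond"]));
console.log(collected.map((block) => block.text)); // [ 'First paragraph', 'Second' ]
```

With the real library, an instance of `MarkdownParser` would presumably take the place of `ToyParser`, provided its `parse` method accepts the same arguments as shown in the Quickstart. For chunks that arrive asynchronously, the same structure should work as an async generator with `for await`.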
Warnings
- gotcha In streaming mode, a reference-style link (e.g., `[text][label]`) cannot resolve until its link reference definition (e.g., `[label]: url`) has been parsed. The parser processes content sequentially, so a reference that precedes its definition stays unresolved until the definition has been fed into the stream and the relevant blocks are finalized.
- breaking This package is at a very early stage (v0.1.x) and its API is considered unstable. Breaking changes to node structures, parser options, or method signatures are likely in minor or even patch releases until a stable v1.0.0, so updates may require code adjustments.
- gotcha The `parse` method keeps internal state when `stream: true` is used, so a single `MarkdownParser` instance should handle one continuous stream of input. Start each new, unrelated stream with a new `MarkdownParser` instance; the internal state is not directly exposed, so it cannot be managed by hand.
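Because streaming state lives inside the parser instance, interleaved streams (for example, several chat responses arriving at once) each need their own instance. The sketch below shows one way to enforce that with a registry keyed by a stream id. `StreamRegistry`, `ChunkParser`, `SketchBlock`, and the stream ids are hypothetical names for illustration, and `ToyParser` is again a toy stand-in, not the library.

```typescript
// Hypothetical minimal surface, modeled on the parse(input, { stream }) calls in this document.
interface SketchBlock { type: string; text: string }
interface ChunkParser {
  parse(input: string, options?: { stream?: boolean }): SketchBlock[];
}

// Keeps one parser per logical stream so buffered state never crosses streams.
class StreamRegistry {
  private parsers = new Map<string, ChunkParser>();
  private createParser: () => ChunkParser;

  constructor(createParser: () => ChunkParser) {
    this.createParser = createParser;
  }

  // Feed a chunk to the stream's own parser, creating the parser on first use.
  push(streamId: string, chunk: string): SketchBlock[] {
    let parser = this.parsers.get(streamId);
    if (!parser) {
      parser = this.createParser();
      this.parsers.set(streamId, parser);
    }
    return parser.parse(chunk, { stream: true });
  }

  // Finalize the stream and drop its parser so the id can be reused safely.
  end(streamId: string): SketchBlock[] {
    const parser = this.parsers.get(streamId);
    if (!parser) return [];
    this.parsers.delete(streamId);
    return parser.parse("", { stream: false });
  }
}

// Toy stand-in (NOT the real library): treats blank lines as paragraph boundaries.
class ToyParser implements ChunkParser {
  private buffer = "";

  parse(input: string, options: { stream?: boolean } = {}): SketchBlock[] {
    this.buffer += input;
    const parts = this.buffer.split(/\n{2,}/);
    this.buffer = options.stream ? (parts.pop() ?? "") : "";
    return parts
      .filter((part) => part.trim() !== "")
      .map((part) => ({ type: "paragraph", text: part.trim() }));
  }
}

const registry = new StreamRegistry(() => new ToyParser());
const a1 = registry.push("a", "Alpha\n\nBe"); // "Alpha" is closed; "Be" stays buffered in stream a
const b1 = registry.push("b", "Gamma");       // buffered in stream b only
const a2 = registry.push("a", "ta");          // extends stream a's open paragraph
const aEnd = registry.end("a");
const bEnd = registry.end("b");
console.log(a1, b1, a2, aEnd, bEnd);
```

With the real library, the factory would presumably be `() => new MarkdownParser()`.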
Install
- npm install markdown-parser
- yarn add markdown-parser
- pnpm add markdown-parser
Imports
- MarkdownParser
  CommonJS: const { MarkdownParser } = require('markdown-parser');
  ES modules: import { MarkdownParser } from 'markdown-parser';
- BlockNode
  import type { BlockNode } from 'markdown-parser';
- InlineNode
  import type { InlineNode } from 'markdown-parser';
Quickstart
import { MarkdownParser } from "markdown-parser";
import type { BlockNode } from "markdown-parser";
const parser = new MarkdownParser();
// Example 1: Parse complete markdown in one go
const completeNodes: BlockNode[] = parser.parse("# Hello World\nThis is a paragraph.");
console.log('Complete Parse Output:', JSON.stringify(completeNodes, null, 2));
// Expected: [
// { type: "heading", level: 1, children: [{ type: "text", text: "Hello World" }] },
// { type: "paragraph", children: [{ type: "text", text: "This is a paragraph." }] }
// ]
// Example 2: Parse with streaming mode for incremental content
console.log('\n--- Streaming Parse ---');
let streamOutput1: BlockNode[] = parser.parse("# Hello World\nThis", { stream: true });
console.log('Stream Part 1:', JSON.stringify(streamOutput1, null, 2));
// Expected: [
// { type: "heading", level: 1, children: [{ type: "text", text: "Hello World" }] }
// ] (paragraph is still open)
let streamOutput2: BlockNode[] = parser.parse(" is a paragraph.\n\nThis is another paragraph.", { stream: true });
console.log('Stream Part 2:', JSON.stringify(streamOutput2, null, 2));
// Expected: [
// { type: "paragraph", children: [{ type: "text", text: "This is a paragraph." }] }
// ] (second paragraph still open)
let streamOutput3: BlockNode[] = parser.parse("", { stream: false }); // Finalize the stream
console.log('Stream Finalize:', JSON.stringify(streamOutput3, null, 2));
// Expected: [
// { type: "paragraph", children: [{ type: "text", text: "This is another paragraph." }] }
// ]
console.log('\n--- End Streaming Parse ---');
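
Since the parser returns plain typed nodes, rendering can be a small recursive walk over them. The following is a minimal sketch of an HTML renderer for the two block shapes that appear in the expected output above. The local types (`TextNode`, `HeadingNode`, `ParagraphNode`, `SketchBlock`) are assumptions that mirror those comments; they are not the library's exported `BlockNode` and `InlineNode` types, which may carry more node kinds and fields.

```typescript
// Assumed node shapes, copied from the Quickstart's expected-output comments.
type TextNode = { type: "text"; text: string };
type HeadingNode = { type: "heading"; level: number; children: TextNode[] };
type ParagraphNode = { type: "paragraph"; children: TextNode[] };
type SketchBlock = HeadingNode | ParagraphNode;

// Escape the characters that would otherwise be read as HTML markup.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function renderInline(children: TextNode[]): string {
  return children.map((child) => escapeHtml(child.text)).join("");
}

// Dispatch on the discriminating `type` field of each block node.
function renderBlock(node: SketchBlock): string {
  switch (node.type) {
    case "heading":
      return `<h${node.level}>${renderInline(node.children)}</h${node.level}>`;
    case "paragraph":
      return `<p>${renderInline(node.children)}</p>`;
  }
}

const nodes: SketchBlock[] = [
  { type: "heading", level: 1, children: [{ type: "text", text: "Hello World" }] },
  { type: "paragraph", children: [{ type: "text", text: "This is a paragraph." }] },
];

const html = nodes.map(renderBlock).join("\n");
console.log(html);
// <h1>Hello World</h1>
// <p>This is a paragraph.</p>
```

In streaming mode the same `renderBlock` function could be applied to each batch of finalized blocks as it is returned, appending the result to the output already displayed.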