SAX Streaming XML Parser
sax is an evented, streaming XML parser written in JavaScript. It is designed primarily for Node.js but also works in browsers and other CommonJS environments. It exposes a SAX-style API: as it processes input, it calls handlers such as `ontext`, `onopentag`, and `onattribute` for the corresponding XML constructs. The current stable version is 1.6.0; the API is mature, and new feature releases are infrequent. Its main distinguishing traits are that it is lightweight, streams efficiently, and deliberately parses XML rather than trying to repair malformed HTML. It does not build a DOM, apply XSLT transformations, or validate against schemas or DTDs, which makes it a good fit when you only need simple, fast event processing. It offers both a direct parser interface for string input and a Node.js stream API for processing large files without loading them into memory as a single string.
Common errors
- Unknown entity: &foo;
  cause: Parsing a document that references an undefined custom entity (e.g., `&foo;`) while the parser is in strict mode.
  fix: Either pass `false` as the `strict` argument (e.g., `sax.parser(false, options)`), in which case the unknown reference is passed through as literal text, or define the entity by adding it to `parser.ENTITIES` before it is encountered, for example from an `ondoctype` handler.
- Text data outside of a tag
  cause: Malformed XML in which text appears outside the root element (before the opening root tag or after the closing one), which strict XML does not permit.
  fix: Make the document follow strict XML rules: a single root element, with all text nodes enclosed within it. If you need to accept less rigid XML or HTML-like content, pass `false` as the `strict` argument.
- Attribute 'foo' had no quote and no whitespace, and strict or html-mode is 'false'
  cause: An attribute value is not enclosed in quotes (e.g., `<tag foo=bar>`) when `strict` is `true`, or when `strict` is `false` but the `unquotedAttributeValues` option is explicitly `false`.
  fix: Quote all attribute values (e.g., `<tag foo="bar">` or `<tag foo='bar'>`). If the input intentionally uses unquoted attributes, pass `false` as the `strict` argument and make sure `unquotedAttributeValues` is `true` (its default in loose mode).
Warnings
- breaking The `strict` argument significantly alters parsing behavior, especially for unquoted attribute values and unknown entities. Passing `true` makes parsing fail on many documents that parse successfully with `false`. The default for the `unquotedAttributeValues` option also follows `strict`: it is `false` when `strict` is `true`, and `true` otherwise.
- gotcha The `sax` parser has only minimal entity support. By default it expands the five predefined XML entities (`&amp;`, `&lt;`, `&gt;`, `&apos;`, `&quot;`) plus the HTML named entities listed in `sax.ENTITIES`; with the `strictEntities: true` option, only the five XML entities are recognized. Custom entities declared in a DTD are not expanded unless you add them to `parser.ENTITIES` yourself, for example with custom logic in an `ondoctype` event handler.
- gotcha `sax` is a pure XML parser, not an HTML parser, and does not automatically build a Document Object Model (DOM). It expects well-formed XML and will not attempt to correct malformed HTML or provide built-in DOM manipulation capabilities. Attempting to parse severely malformed HTML in strict mode will likely result in errors.
- gotcha When using `sax.createStream()`, parse errors are emitted as `'error'` events. As with any Node.js event emitter, an `'error'` event with no listener is thrown, which stops processing. To keep going after an error, attach an `on('error', ...)` handler; the pattern shown in the sax documentation clears the internal parser's error state (`this._parser.error = null`) and resumes it (`this._parser.resume()`) so that subsequent data is still processed.
Install
- npm install sax
- yarn add sax
- pnpm add sax
Imports
- sax
import sax from 'sax';
const sax = require('sax');
- parser
const parser = new sax.parser(strict, options);
const parser = sax.parser(strict, options);
- createStream
import sax from 'sax';
const saxStream = sax.createStream(strict, options);
Quickstart
const sax = require('sax');
const stream = require('stream');
// --- Direct Parser Example ---
const strictMode = true; // set to false for html-mode
const directParser = sax.parser(strictMode);
directParser.onerror = function (e) {
  console.error("Direct Parser Error:", e.message);
  if (!strictMode) {
    // In loose mode, clear the error and resume to try to continue parsing.
    // In a parser handler, `this` is the parser itself; `_parser` exists only on streams.
    this.error = null;
    this.resume();
  }
};
directParser.ontext = function (t) {
  const trimmedText = t.trim();
  if (trimmedText) console.log("Direct Text:", trimmedText);
};
directParser.onopentag = function (node) {
  console.log("Direct Open Tag:", node.name, "Attributes:", JSON.stringify(node.attributes));
};
directParser.onclosetag = function (name) {
  // onclosetag receives the name of the tag being closed
  console.log("Direct Close Tag:", name);
};
directParser.onend = function () {
  console.log("Direct Parser End.\n");
};
console.log("--- Parsing XML directly ---");
directParser.write('<root><data name="example">Hello</data>, <world/></root>').close();
// --- Stream Parser Example ---
const streamMode = false; // loose mode for more forgiving parsing
const saxStream = sax.createStream(streamMode, {
  trim: true,
  normalize: true,
  lowercase: true
});
saxStream.on('error', function (e) {
  console.error('Stream Error:', e.message);
  // Crucial for stream to continue if you want to recover after non-fatal errors
  this._parser.error = null;
  this._parser.resume();
});
saxStream.on('opentag', function (node) {
  console.log('Stream Open Tag:', node.name, JSON.stringify(node.attributes));
});
saxStream.on('text', function (t) {
  const trimmedText = t.trim();
  if (trimmedText) console.log('Stream Text:', trimmedText);
});
saxStream.on('end', function () {
  console.log('Stream End.');
});
console.log("--- Parsing XML via stream ---");
// Simulate a readable stream from a string buffer
const xmlContent = Buffer.from('<catalog><book id="1"><title>The Great Book</title></book><book id="2"></book></catalog>');
const readableStream = stream.Readable.from(xmlContent);
readableStream.pipe(saxStream);