RDFa Streaming Parser
The `rdfa-streaming-parser` package is a fast, lightweight, and 100% spec-compliant streaming parser for RDFa 1.1. The current version is 3.0.2. The parser emits RDFJS-compliant quads as soon as possible, which makes it possible to parse documents larger than available memory. It is implemented as a Node.js Transform stream, so input sources such as file streams can be piped into it directly, and it also implements the RDFJS Sink interface as an alternative way to process streams. Its main differentiators are strict adherence to the RDFa 1.1 specification, a low memory footprint, and compatibility with the RDFJS ecosystem. The release cadence appears stable, and major version 3 marks significant changes from earlier versions.
Common errors
- TypeError: RdfaParser is not a constructor
  cause Attempting to invoke the module's default export as a constructor when `RdfaParser` is a named export, or incorrectly requiring the module.
  fix For CommonJS, use `const { RdfaParser } = require('rdfa-streaming-parser');` or `const RdfaParser = require('rdfa-streaming-parser').RdfaParser;`. For ESM, use `import { RdfaParser } from 'rdfa-streaming-parser';`.
- Error: Missing base IRI and content type.
  cause The `RdfaParser` was initialized without sufficient context (like a `baseIRI` or `contentType`) to correctly resolve relative IRIs or understand the RDFa profile of the input.
  fix Provide a `baseIRI` and/or `contentType` (e.g., `'text/html'`) in the `RdfaParser` constructor options. Example: `new RdfaParser({ baseIRI: 'https://example.com/', contentType: 'text/html' });`.
- SyntaxError: Cannot use import statement outside a module
  cause Attempting to use ES Module `import` syntax in a file that is treated as a CommonJS module (e.g., a `.js` file without `"type": "module"` in `package.json`, or a `.cjs` file).
  fix Either configure your project to use ES Modules (by adding `"type": "module"` to `package.json` and using `.js` files, or renaming to `.mjs`), or use CommonJS `require` syntax: `const { RdfaParser } = require('rdfa-streaming-parser');`.
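The missing base IRI error above can be surfaced earlier by checking the options before the parser is constructed. The following is a minimal sketch: the option names `baseIRI` and `contentType` come from the constructor example above, while `buildParserOptions`, `ParserOptions`, and the `'text/html'` default are hypothetical and not part of the library.

```typescript
// Hypothetical helper, not part of rdfa-streaming-parser.
// Only the option names `baseIRI` and `contentType` are taken from the docs above.
interface ParserOptions {
  baseIRI: string;
  contentType: string;
}

// RFC 3986 scheme syntax: a letter, then letters, digits, "+", "-" or ".", then ":".
const ABSOLUTE_IRI = /^[A-Za-z][A-Za-z0-9+.-]*:/;

function buildParserOptions(baseIRI: string, contentType: string = 'text/html'): ParserOptions {
  // A relative base IRI cannot be used to resolve other relative IRIs.
  if (!ABSOLUTE_IRI.test(baseIRI)) {
    throw new Error(`baseIRI must be an absolute IRI, got "${baseIRI}"`);
  }
  return { baseIRI, contentType };
}

// The result is intended to be passed to the constructor: new RdfaParser(parserOptions)
const parserOptions = buildParserOptions('https://example.com/');
console.log(parserOptions);
```

The check is deliberately loose: it only confirms that a scheme is present, so it will not catch every malformed IRI.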
Warnings
- breaking Major version 3 introduces changes that might require updates to existing codebases, especially regarding how errors are handled or specific configuration options are interpreted. Users upgrading from v1 or v2 should consult the official changelog or release notes for precise breaking changes.
- gotcha The `RdfaParser` often requires a `baseIRI` and/or `contentType` option in its constructor for accurate parsing. Omitting these can lead to incorrect IRI resolution for relative paths or misinterpretation of the RDFa profile.
- gotcha While the library explicitly supports CommonJS `require`, modern Node.js development, especially when integrating with other ESM-first libraries, generally prefers native ES Modules `import` syntax for consistency and better tooling support.
- gotcha The parser emits RDFJS-compliant quads. Ensure that any downstream application components consuming these quads are compatible with the RDFJS data model specification, particularly concerning `DataFactory` implementations and term representations.
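Printing only `.value` hides the difference between IRIs, blank nodes, and literals. As a rough sketch of what working with the RDFJS data model involves, the helper below formats a term according to its `termType`. `TermLike` and `termToString` are hypothetical names; the `termType`, `value`, `language`, and `datatype` fields follow the RDFJS data model. The output is N-Triples-like for readability and is not guaranteed to be valid N-Triples, because `JSON.stringify` is used as an approximation of literal escaping.

```typescript
// Minimal structural view of an RDFJS term; real terms expose more members.
interface TermLike {
  termType: string;
  value: string;
  language?: string;
  datatype?: { value: string };
}

const XSD_STRING = 'http://www.w3.org/2001/XMLSchema#string';

// Hypothetical helper: renders a term in a readable, N-Triples-like form.
function termToString(term: TermLike): string {
  switch (term.termType) {
    case 'NamedNode':
      return `<${term.value}>`;
    case 'BlankNode':
      return `_:${term.value}`;
    case 'Literal': {
      // JSON.stringify approximates the quoting and escaping of the lexical form.
      const text = JSON.stringify(term.value);
      if (term.language) {
        return `${text}@${term.language}`;
      }
      if (term.datatype && term.datatype.value !== XSD_STRING) {
        return `${text}^^<${term.datatype.value}>`;
      }
      return text;
    }
    default:
      // Other term types (for example DefaultGraph or Variable) fall back to the raw value.
      return term.value;
  }
}

console.log(termToString({ termType: 'NamedNode', value: 'http://xmlns.com/foaf/0.1/name' }));
```

In a `'data'` handler, this could be applied to `quad.subject`, `quad.predicate`, and `quad.object` in turn.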
Install
- npm install rdfa-streaming-parser
- yarn add rdfa-streaming-parser
- pnpm add rdfa-streaming-parser
Imports
- RdfaParser (ES Modules)
  wrong import RdfaParser from 'rdfa-streaming-parser';
  correct import { RdfaParser } from 'rdfa-streaming-parser';
- RdfaParser (CommonJS)
  wrong const RdfaParser = require('rdfa-streaming-parser');
  correct const { RdfaParser } = require('rdfa-streaming-parser');
- RdfaParser (type-only import)
  import type { RdfaParser } from 'rdfa-streaming-parser';
Quickstart
import { RdfaParser } from 'rdfa-streaming-parser';
import * as fs from 'fs'; // Node.js built-in module

// Streams a file through the parser and resolves once all quads have been emitted.
async function parseRdfaFile(filePath: string, baseIri: string, contentType: string): Promise<void> {
  const myParser = new RdfaParser({ baseIRI: baseIri, contentType: contentType });
  console.log(`Parsing RDFa from ${filePath}...`);
  return new Promise<void>((resolve, reject) => {
    fs.createReadStream(filePath)
      .on('error', reject) // pipe() does not forward read errors to the parser
      .pipe(myParser)
      .on('data', (quad) => {
        console.log(`Parsed quad: ${quad.subject.value} ${quad.predicate.value} ${quad.object.value} .`);
      })
      .on('error', reject)
      .on('end', () => {
        console.log('All triples were parsed successfully!');
        resolve();
      });
  });
}

// Example usage:
const dummyHtmlContent = `<!DOCTYPE html>
<html>
<head prefix="foaf: http://xmlns.com/foaf/0.1/">
<title>Example Document</title>
<link rel="foaf:primaryTopic foaf:maker" href="https://www.rubensworks.net/#me" />
</head>
<body>
<h1>Hello RDFa</h1>
<p>This is an <span property="foaf:name">RDFa Example</span>.</p>
</body>
</html>`;
const tempFilePath = 'temp_rdfa_doc.html';
fs.writeFileSync(tempFilePath, dummyHtmlContent);

parseRdfaFile(tempFilePath, 'https://example.org/doc#', 'text/html')
  .catch((error) => {
    // Handle read and parse failures in one place instead of leaving the rejection unhandled.
    console.error('An error occurred during parsing:', error);
    process.exitCode = 1;
  })
  .finally(() => {
    fs.unlinkSync(tempFilePath); // Clean up the temporary file
  });
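
Because the parser is a Node.js Transform stream, its output can also be consumed with `for await` instead of event handlers. The sketch below uses a stand-in stream built with `Readable.from` so that it runs without the package installed; `QuadLike` and `collectQuads` are hypothetical names, and the sample quads are illustrative values, not actual parser output.

```typescript
import { Readable } from 'node:stream';

// Minimal shape of the quads handled here; real RDFJS quads carry more fields.
interface QuadLike {
  subject: { value: string };
  predicate: { value: string };
  object: { value: string };
}

// Hypothetical helper: drains an object-mode stream of quads into an array.
// This buffers every quad, so it gives up the memory benefit of streaming
// and suits small documents or tests only.
async function collectQuads(stream: AsyncIterable<QuadLike>): Promise<QuadLike[]> {
  const quads: QuadLike[] = [];
  for await (const quad of stream) {
    quads.push(quad);
  }
  return quads;
}

// Stand-in for `fs.createReadStream(path).pipe(new RdfaParser(options))`.
const sampleQuads: QuadLike[] = [
  {
    subject: { value: 'https://example.org/doc' },
    predicate: { value: 'http://xmlns.com/foaf/0.1/maker' },
    object: { value: 'https://www.rubensworks.net/#me' },
  },
  {
    subject: { value: 'https://example.org/doc' },
    predicate: { value: 'http://xmlns.com/foaf/0.1/name' },
    object: { value: 'RDFa Example' },
  },
];

collectQuads(Readable.from(sampleQuads)).then((quads) => {
  console.log(`Collected ${quads.length} quads`);
});
```

An error emitted by the stream being iterated rejects the promise returned by `collectQuads`, so a single `try`/`catch` or `.catch()` covers it. Errors from an upstream source joined with `pipe()` still need their own handler.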