RDFa Streaming Parser

3.0.2 · active · verified Sun Apr 19

The `rdfa-streaming-parser` package is a fast, lightweight, fully spec-compliant streaming parser for RDFa 1.1, currently at version 3.0.2. It emits RDFJS-compliant quads as soon as they can be determined, which makes it possible to parse documents larger than available memory. The parser is a Node.js Transform stream, so input sources such as file streams can be piped directly into it; it also implements the RDFJS Sink interface as an alternative way to process streams. Its main differentiators are strict adherence to the RDFa 1.1 specification, a low memory footprint thanks to streaming, and compatibility with the RDFJS ecosystem for data representation. The release cadence appears stable, and the major version 3 indicates significant updates over earlier releases.
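As a rough illustration of the Sink-style alternative mentioned above, the sketch below passes a text stream to the parser's `import` method instead of piping into it. The HTML snippet, the `https://example.org/` IRIs, and the variable names are made up for this example; only `RdfaParser`, its `baseIRI` and `contentType` options, and `import` come from the package.

```typescript
import { Readable } from 'stream';
import { RdfaParser } from 'rdfa-streaming-parser';

// Illustrative input, not taken from a real document.
const html = `<html><head prefix="foaf: http://xmlns.com/foaf/0.1/">
<link rel="foaf:primaryTopic" href="https://example.org/#me" />
</head></html>`;

// Wrap the markup in a stream so it can be handed to import().
const textStream = Readable.from([html]);

const parser = new RdfaParser({
  baseIRI: 'https://example.org/doc',
  contentType: 'text/html',
});

// import() returns a stream of quads, so no explicit pipe() call is needed.
const quadStream = parser.import(textStream);
quadStream
  .on('data', (quad) => {
    console.log(quad.subject.value, quad.predicate.value, quad.object.value);
  })
  .on('error', (error) => console.error('Parsing failed:', error))
  .on('end', () => console.log('Done'));
```

Whether to use `pipe` or `import` is mostly a matter of which style the surrounding code already follows; both deliver quads incrementally.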

Common errors

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to parse an RDFa document from a file stream using `RdfaParser`, log the extracted RDFJS quads, and gracefully handle completion or errors. A temporary HTML file is created and then cleaned up.

import { RdfaParser } from 'rdfa-streaming-parser';
import * as fs from 'fs'; // Node.js built-in module

async function parseRdfaFile(filePath: string, baseIri: string, contentType: string) {
  const myParser = new RdfaParser({ baseIRI: baseIri, contentType: contentType });

  console.log(`Parsing RDFa from ${filePath}...`);

  return new Promise<void>((resolve, reject) => {
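    const input = fs.createReadStream(filePath);
    // pipe() does not forward errors from the source stream, so handle them here.
    input.on('error', (error) => {
      console.error('Could not read the input file:', error);
      reject(error);
    });
    input
      .pipe(myParser)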
      .on('data', (quad) => {
        console.log(`Parsed quad: ${quad.subject.value} ${quad.predicate.value} ${quad.object.value} .`);
      })
      .on('error', (error) => {
        console.error('An error occurred during parsing:', error);
        reject(error);
      })
      .on('end', () => {
        console.log('All triples were parsed successfully!');
        resolve();
      });
  });
}

// Example usage:
const dummyHtmlContent = `<!DOCTYPE html>
<html>
<head prefix="foaf: http://xmlns.com/foaf/0.1/">
  <title>Example Document</title>
  <link rel="foaf:primaryTopic foaf:maker" href="https://www.rubensworks.net/#me" />
</head>
<body>
  <h1>Hello RDFa</h1>
  <p>This is an <span property="foaf:name">RDFa Example</span>.</p>
</body>
</html>`;

const tempFilePath = 'temp_rdfa_doc.html';
fs.writeFileSync(tempFilePath, dummyHtmlContent);

parseRdfaFile(tempFilePath, 'https://example.org/doc#', 'text/html')
  .catch(() => {
    process.exitCode = 1; // Error details are already logged inside parseRdfaFile
  })
  .finally(() => {
    fs.unlinkSync(tempFilePath); // Clean up the temporary file
  });
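
The quickstart logs each quad as it arrives. For small documents or tests it can be handy to gather the quads into an array instead. The following is a minimal sketch of that idea; `collectQuads` and `QuadLike` are hypothetical names introduced here, not part of the package, and the helper works with any object-mode stream.

```typescript
import { Readable } from 'stream';

// Minimal structural view of an RDFJS quad: only the fields this sketch reads.
interface QuadLike {
  subject: { value: string };
  predicate: { value: string };
  object: { value: string };
}

// Hypothetical helper: buffers every quad emitted by an object-mode stream
// and resolves once the stream ends. This gives up the memory benefit of
// streaming, so it only suits inputs that comfortably fit in memory.
function collectQuads(quadStream: Readable): Promise<QuadLike[]> {
  return new Promise((resolve, reject) => {
    const quads: QuadLike[] = [];
    quadStream
      .on('data', (quad: QuadLike) => quads.push(quad))
      .on('error', reject)
      .on('end', () => resolve(quads));
  });
}
```

Assuming the setup from the quickstart, the helper could be called as `collectQuads(fs.createReadStream(tempFilePath).pipe(new RdfaParser({ baseIRI: 'https://example.org/doc#', contentType: 'text/html' })))`.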
