RDF/JS CSV on the Web Parser
rdf-parser-csvw is a JavaScript library that parses CSV (Comma-Separated Values) data according to the W3C CSV on the Web (CSVW) recommendation and converts it into RDF/JS quads. It implements the RDF/JS Stream interface: the parser consumes a stream of strings and emits a stream of parsed quads, which allows large CSV files to be processed efficiently and asynchronously. The current stable version is 1.1.0; releases follow a feature-driven cadence rather than a fixed schedule. A key differentiator is strict adherence to the RDF/JS specification for data factories and stream interfaces, which keeps the parser compatible with the rest of the RDF/JS ecosystem.
The parser requires explicit CSVW metadata (as an RDF/JS Dataset) and a base IRI. Optional settings include a custom RDF/JS data factory, an alternative timezone for date/time parsing, and two error-handling switches: `relaxColumnCount`, which ignores column count mismatches, and `skipLinesWithError`, which is intended for debugging noisy datasets and is advised against in production.
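Because relative IRIs are resolved against the base IRI, its exact form matters, in particular whether it ends in a slash. The sketch below does not call the parser; it uses the WHATWG `URL` class built into Node.js to illustrate standard relative-reference resolution, on the assumption that the parser resolves relative IRIs in the usual way. The `resolve` helper and the example paths are hypothetical.

```javascript
// Illustration only: standard relative-reference resolution with the
// built-in WHATWG URL class. `resolve` is a hypothetical helper, not part
// of rdf-parser-csvw.
function resolve(relative, baseIRI) {
  return new URL(relative, baseIRI).href;
}

// A base ending in "/" keeps its last path segment.
const withSlash = resolve('people.csv', 'http://example.org/data/');
// A base without the trailing "/" has its last segment replaced.
const withoutSlash = resolve('people.csv', 'http://example.org/data');
// A fragment-only reference is appended to the full base document IRI.
const fragment = resolve('#row=2', 'http://example.org/data/people.csv');

console.log(withSlash);    // http://example.org/data/people.csv
console.log(withoutSlash); // http://example.org/people.csv
console.log(fragment);     // http://example.org/data/people.csv#row=2
```

A missing trailing slash silently moves every generated IRI up one path level, so it is worth checking the base IRI before parsing a large file.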
Common errors
- Error: The metadata option is required
  cause The `metadata` option was not provided during parser instantiation or the `.import` call.
  fix Pass an RDF/JS `Dataset` object containing CSVW metadata to the `metadata` option.
- Error: The baseIRI option is required
  cause The `baseIRI` option was not provided, which is essential for resolving relative IRIs.
  fix Provide a string value for `baseIRI` in the parser options, e.g., `new Parser({ baseIRI: 'http://example.org/' })`.
- Error: Column count mismatch in row X
  cause A row in the CSV stream has a different number of columns than expected, and `relaxColumnCount` is not enabled.
  fix Either fix the malformed CSV data, or set `relaxColumnCount: true` in the parser options to ignore these errors.
- TypeError: parser.import is not a function
  cause The `Parser` class was not correctly instantiated, or `import` was called on the class itself instead of an instance.
  fix Ensure you create an instance of the `Parser` class using `new Parser(options)` before calling `.import()`.
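The two "option is required" errors can be caught before any stream is opened. The following is a minimal sketch of such a pre-flight check. `checkParserOptions` is a hypothetical helper, not part of the library, and its duck-typed test for a Dataset (an object with a `match` method, as in RDF/JS DatasetCore) is an assumption rather than the library's own validation.

```javascript
// Hypothetical pre-flight check mirroring the two "required option" errors.
// Returns a list of problems instead of throwing, so callers can report all
// of them at once.
function checkParserOptions(options = {}) {
  const problems = [];
  const { metadata, baseIRI } = options;

  // RDF/JS DatasetCore objects expose a match() method; this duck-typed test
  // is an assumption, not the library's own validation.
  if (!metadata || typeof metadata.match !== 'function') {
    problems.push('metadata must be an RDF/JS Dataset');
  }
  if (typeof baseIRI !== 'string' || baseIRI.length === 0) {
    problems.push('baseIRI must be a non-empty string');
  }
  return problems;
}

// Stand-in for a real Dataset, used only to exercise the check.
const fakeDataset = { match: () => fakeDataset };

const missingBoth = checkParserOptions({});
const complete = checkParserOptions({
  metadata: fakeDataset,
  baseIRI: 'http://example.org/',
});

console.log(missingBoth); // two problems reported
console.log(complete);    // []
```

Returning a list rather than throwing on the first problem is a design choice for this sketch; it lets a caller surface both missing options in a single message.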
Warnings
- gotcha The `metadata` option is strictly required in the parser constructor or `.import` method. It must be an RDF/JS Dataset representing the CSV on the Web metadata for your CSV file.
- gotcha The `baseIRI` option is strictly required to create Named Nodes from relative IRIs within the CSV data.
- gotcha The `skipLinesWithError` option is primarily for debugging purposes and should not be used in production environments, as it can lead to silent data loss or inconsistent graph generation from malformed CSV lines.
- gotcha By default, the parser will throw an error if a row's column count does not match the headers. Use `relaxColumnCount` if you expect and want to tolerate such discrepancies.
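Rather than reaching for `skipLinesWithError`, it can help to locate the offending rows first. The sketch below is a deliberately naive pre-check that assumes a comma delimiter, a header row, `\n` or `\r\n` line endings, and no quoted fields containing commas or line breaks. `findColumnCountMismatches` is a hypothetical helper and not part of the library.

```javascript
// Hypothetical helper: report rows whose field count differs from the header.
// Naive on purpose: it splits on "," and does not understand quoted fields.
function findColumnCountMismatches(csvText) {
  const lines = csvText.split(/\r?\n/);
  const expected = lines[0].split(',').length;
  const mismatches = [];

  lines.forEach((line, index) => {
    if (line === '') return; // ignore blank lines, e.g. a trailing newline
    const actual = line.split(',').length;
    if (actual !== expected) {
      // Line numbers are 1-based and count the header as line 1.
      mismatches.push({ line: index + 1, expected, actual });
    }
  });
  return mismatches;
}

const sample = 'Name,Age\nAlice,30\nBob\nCharlie,35,extra';
const report = findColumnCountMismatches(sample);
console.log(report);
// [ { line: 3, expected: 2, actual: 1 }, { line: 4, expected: 2, actual: 3 } ]
```

If the report is empty, neither `relaxColumnCount` nor `skipLinesWithError` should be needed for that file; if it is not, the listed line numbers point at the rows to repair.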
Install
- npm install rdf-parser-csvw
- yarn add rdf-parser-csvw
- pnpm add rdf-parser-csvw
Imports
- Parser
  import Parser from 'rdf-parser-csvw'
  import { Parser } from 'rdf-parser-csvw'
- Parser (CommonJS)
  const Parser = require('rdf-parser-csvw')
  const { Parser } = require('rdf-parser-csvw')
- Parser (Type Import)
  import type { Parser } from 'rdf-parser-csvw'
Quickstart
import { Readable } from 'stream';
import { Parser } from 'rdf-parser-csvw';
import rdf from 'rdf-ext'; // A common RDF/JS implementation for DataFactory and Dataset

async function parseCsvw() {
  const csvString = `Name,Age\nAlice,30\nBob,25\nCharlie,35`;
  const baseIRI = 'http://example.org/data/';

  // Construct a minimal CSVW metadata Dataset using rdf-ext.
  // Simplified for illustration: full CSVW metadata nests columns under
  // csvw:tableSchema as an ordered list.
  const metadataDataset = rdf.dataset();

  // Namespace builders, so that `ex.file` or `csvw.Table` expand to NamedNodes.
  // (A plain rdf.namedNode() has no such properties and would yield undefined.)
  const ex = rdf.namespace(baseIRI);
  const csvw = rdf.namespace('http://www.w3.org/ns/csvw#');
  const xsd = rdf.namespace('http://www.w3.org/2001/XMLSchema#');
  const rdfType = rdf.namedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type');

  const table = rdf.blankNode();
  const column1 = rdf.blankNode();
  const column2 = rdf.blankNode();

  metadataDataset.add(rdf.quad(ex.file, rdfType, csvw.TableGroup));
  metadataDataset.add(rdf.quad(ex.file, csvw.table, table));
  metadataDataset.add(rdf.quad(table, rdfType, csvw.Table));
  metadataDataset.add(rdf.quad(table, csvw.url, rdf.namedNode(`${baseIRI}data.csv`)));

  // Define columns based on CSV headers
  metadataDataset.add(rdf.quad(table, csvw.column, column1));
  metadataDataset.add(rdf.quad(column1, csvw.name, rdf.literal('Name')));
  metadataDataset.add(rdf.quad(column1, csvw.datatype, xsd.string));
  metadataDataset.add(rdf.quad(table, csvw.column, column2));
  metadataDataset.add(rdf.quad(column2, csvw.name, rdf.literal('Age')));
  metadataDataset.add(rdf.quad(column2, csvw.datatype, xsd.integer));

  // Instantiate the parser with required options
  const parser = new Parser({
    metadata: metadataDataset,
    baseIRI,
    factory: rdf // Use rdf-ext's data factory
  });

  // Create a readable stream from the CSV string
  const csvStream = Readable.from([csvString]);

  console.log('Starting CSVW parsing...');

  // Import the CSV stream and get a stream of RDF quads
  const quadStream = parser.import(csvStream);

  // Consume and log the parsed quads
  for await (const quad of quadStream) {
    console.log(quad.toString());
  }

  console.log('Finished parsing.');
}

parseCsvw().catch(console.error);