RDF/JS CSV on the Web Parser

1.1.0 · active · verified Tue Apr 21

rdf-parser-csvw is a JavaScript library that parses CSV (comma-separated values) data according to the W3C CSV on the Web (CSVW) recommendation and converts it into RDF/JS quads. It implements the RDF/JS Stream interface: the parser consumes a stream of strings and emits a stream of quads, so large CSV files can be processed efficiently and asynchronously. The current stable version is 1.1.0; releases follow a feature-driven cadence rather than a fixed schedule.

A key differentiator is its strict adherence to the RDF/JS specification for data factories and stream interfaces, which keeps it compatible with the rest of the RDF/JS ecosystem. Conversion requires explicit CSVW metadata (as an RDF/JS Dataset) and a base IRI. Optional settings cover a custom RDF/JS data factory, an alternative timezone for date/time parsing, and two error-handling switches: `relaxColumnCount` ignores column count mismatches, and `skipLinesWithError` skips lines that fail to parse, which helps when debugging noisy datasets but is advised against for production use.
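The options described above can be gathered in one place before constructing the parser. The sketch below is a hypothetical helper, not part of rdf-parser-csvw: the name `buildParserOptions` is invented for illustration, and the key name `timezone` is an assumption based on the description of the timezone option. It only checks that the two required values are present and passes the optional ones through untouched, so the parser's own defaults still apply.

```javascript
// Hypothetical helper (not part of rdf-parser-csvw): assembles the options
// object and fails early when a required value is missing.
function buildParserOptions(input = {}) {
  const { metadata, baseIRI } = input;

  if (!metadata) {
    throw new TypeError('metadata (an RDF/JS Dataset holding the CSVW description) is required');
  }
  if (typeof baseIRI !== 'string' || baseIRI.length === 0) {
    throw new TypeError('baseIRI must be a non-empty string');
  }

  const options = { metadata, baseIRI };

  // Optional keys are copied only when the caller set them.
  // 'timezone' is an assumed key name; the others appear in the text above.
  for (const key of ['factory', 'timezone', 'relaxColumnCount', 'skipLinesWithError']) {
    if (input[key] !== undefined) {
      options[key] = input[key];
    }
  }

  return options;
}

// Usage with a placeholder standing in for a real RDF/JS Dataset.
const exampleOptions = buildParserOptions({
  metadata: { size: 0 },
  baseIRI: 'http://example.org/data/',
  relaxColumnCount: true
});
console.log(Object.keys(exampleOptions));
```

Leaving `skipLinesWithError` unset in the example is deliberate, since the description above advises against it outside of debugging.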

Common errors

Warnings

Install

npm install rdf-parser-csvw

Imports

Quickstart

Demonstrates how to instantiate the `Parser` class, construct a minimal CSVW metadata RDF/JS Dataset, and parse a CSV string into RDF quads using streams.

import { Readable } from 'stream';
import { Parser } from 'rdf-parser-csvw';
import rdf from 'rdf-ext'; // A common RDF/JS implementation for DataFactory and Dataset

async function parseCsvw() {
  const csvString = `Name,Age\nAlice,30\nBob,25\nCharlie,35`;
  const baseIRI = 'http://example.org/data/';

  // Construct a minimal CSVW metadata Dataset using rdf-ext
  const metadataDataset = rdf.dataset();
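  // rdf.namespace() returns a builder, so ex.file and csvw.TableGroup resolve to NamedNodes
  // (property access on a plain NamedNode would yield undefined)
  const ex = rdf.namespace(baseIRI);
  const csvw = rdf.namespace('http://www.w3.org/ns/csvw#');
  const rdfType = rdf.namedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type');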

  const tableGroup = rdf.blankNode();
  const table = rdf.blankNode();
  const column1 = rdf.blankNode();
  const column2 = rdf.blankNode();

  metadataDataset.add(rdf.quad(ex.file, rdfType, csvw.TableGroup));
  metadataDataset.add(rdf.quad(ex.file, csvw.table, table));
  metadataDataset.add(rdf.quad(table, rdfType, csvw.Table));
  metadataDataset.add(rdf.quad(table, csvw.url, rdf.namedNode(`${baseIRI}data.csv`)));

  // Define columns based on CSV headers
  metadataDataset.add(rdf.quad(table, csvw.column, column1));
  metadataDataset.add(rdf.quad(column1, csvw.name, rdf.literal('Name')));
  metadataDataset.add(rdf.quad(column1, csvw.datatype, csvw.string));

  metadataDataset.add(rdf.quad(table, csvw.column, column2));
  metadataDataset.add(rdf.quad(column2, csvw.name, rdf.literal('Age')));
  metadataDataset.add(rdf.quad(column2, csvw.datatype, csvw.integer));

  // Instantiate the parser with required options
  const parser = new Parser({
    metadata: metadataDataset,
    baseIRI: baseIRI,
    factory: rdf // Use rdf-ext's data factory
  });

  // Create a readable stream from the CSV string
  const csvStream = Readable.from([csvString]);

  console.log('Starting CSVW parsing...');
  // Import the CSV stream and get a stream of RDF quads
  const quadStream = parser.import(csvStream);

  // Consume and log the parsed quads
  for await (const quad of quadStream) {
    console.log(quad.toString());
  }
  console.log('Finished parsing.');
}

parseCsvw().catch(console.error);
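
For small inputs or tests it can be handy to gather the emitted quads into an array instead of logging them one by one. The following is a minimal sketch under the assumption that the quad stream is async iterable, as the `for await` loop in the quickstart implies; `collectQuads` and its `limit` parameter are hypothetical names, not part of the library.

```javascript
// Hypothetical helper: drains any async iterable (for example the stream
// returned by parser.import) into an array, stopping after `limit` items.
async function collectQuads(quadStream, limit = Infinity) {
  const quads = [];
  if (limit <= 0) {
    return quads;
  }

  for await (const quad of quadStream) {
    quads.push(quad);
    // Leaving the loop early also signals the source to stop producing.
    if (quads.length >= limit) {
      break;
    }
  }

  return quads;
}
```

Inside the quickstart function this could replace the logging loop with `const quads = await collectQuads(parser.import(csvStream));`. Collecting everything gives up the memory benefit of streaming, so a `limit` is worth setting for large files.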
