nearley Parser Toolkit
nearley is a parsing toolkit for JavaScript that lets developers define and parse custom languages. It uses the Earley parsing algorithm, so it can handle any context-free grammar, including difficult cases such as left recursion that can trip up other parser generators like PEG.js or Jison. The current stable version is 2.20.1, with releases synchronized to Zenodo for academic citation. Key differentiators include streaming input, graceful error handling, support for ambiguous grammars (it returns every possible parse), and compatibility with various lexers (e.g., moo). It also ships tooling for testing grammars, generating railroad diagrams, and fuzzing, and it runs in both Node.js and browser environments.
Common errors
- Error: nearley: unexpected token
  cause: The input stream contains a token that does not fit any of the defined grammar rules at the current parsing state.
  fix: Review your input string and the corresponding grammar rules. Ensure your lexer is producing the expected tokens and that your grammar handles all possible sequences of these tokens. Use `nearley-test` for debugging.
- Error: Could not parse Earley grammar: Unexpected EOF
  cause: The grammar definition file (`.ne`) is syntactically incomplete or malformed, often missing a closing brace, parenthesis, or statement terminator.
  fix: Carefully review your `.ne` file for syntax errors. The `nearleyc` compiler will often provide a more specific line number for the error.
- TypeError: Cannot read properties of undefined (reading 'length') at Parser.feed
  cause: This usually indicates that the `Parser` was initialized without a valid, compiled grammar object, or that the grammar object itself is malformed.
  fix: Ensure `nearley.Grammar.fromCompiled(myGrammar)` is called with the output of your `.ne` file's compilation, and that `myGrammar` is the object exported by the compiled grammar file.
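The first error in the list reaches your code as an exception thrown by `parser.feed`. One way to handle it is to wrap the call and return a plain report object, sketched below. `tryFeed` is a hypothetical helper name, not part of nearley, and the `offset` property it reads is an assumption about what the thrown error carries, so the code falls back to `null` when that property is absent.

```javascript
// Hypothetical helper (not part of nearley): feed input to a nearley-style
// parser and report a syntax error as a value instead of an exception.
function tryFeed(parser, input) {
  try {
    parser.feed(input);
    return { ok: true, results: parser.results };
  } catch (error) {
    return {
      ok: false,
      message: error.message,
      // Assumption: the thrown error exposes the failing position as `offset`.
      offset: typeof error.offset === 'number' ? error.offset : null,
    };
  }
}
```

The helper only relies on the `feed` method and the `results` property, so it accepts any parser instance created with `new nearley.Parser(grammar)`.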
Warnings
- gotcha When defining grammars, ensure all lexer rules (like string literals or regexes) are unambiguous and exhaustive to avoid parsing errors or unexpected tokenization. The grammar expects a continuous stream of tokens.
- gotcha nearley uses the Earley algorithm, which can return multiple parse trees for ambiguous grammars. `parser.results` is always an array: it holds one entry for an unambiguous parse and several when the input can be parsed in more than one way. If the ambiguity is unintended, more than one result indicates a problem in your grammar design.
- gotcha Grammars defined in `.ne` files must be pre-compiled into JavaScript using `nearleyc` before they can be used at runtime. Attempting to load a `.ne` file directly will result in a file loading error.
- gotcha The `process.env.DEBUG` environment variable can significantly change nearley's output, enabling detailed debugging logs. While useful for development, ensure it's not set in production environments to avoid performance overhead and verbose logging.
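The ambiguity warning above can be turned into an explicit check on `parser.results`. The sketch below is a hypothetical helper, `classifyResults`, that is not part of nearley. It assumes an empty array means the input fed so far does not yet form a complete parse.

```javascript
// Hypothetical helper (not part of nearley): classify the array found in
// `parser.results` after feeding input.
function classifyResults(results) {
  if (!Array.isArray(results) || results.length === 0) {
    // Assumption: no results means the input so far is not a complete parse.
    return { status: 'incomplete', value: undefined };
  }
  if (results.length === 1) {
    return { status: 'unique', value: results[0] };
  }
  // More than one result: the grammar is ambiguous for this input.
  return { status: 'ambiguous', value: results[0], alternatives: results.length };
}
```

Calling `classifyResults(parser.results)` in tests is one way to catch unintended ambiguity early, since an `'ambiguous'` status points at a grammar that needs tightening.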
Install
- npm install nearley
- yarn add nearley
- pnpm add nearley
Imports
- Grammar
  const Grammar = require('nearley').Grammar;
  import { Grammar } from 'nearley';
- Parser
  const Parser = require('nearley').Parser;
  import { Parser } from 'nearley';
- nearleyc
  import nearleyc from 'nearley/lib/compile';
  npx nearleyc mygrammar.ne -o mygrammar.js
Quickstart
const nearley = require('nearley');
const compile = require('nearley/lib/compile');
const generate = require('nearley/lib/generate');
const nearleyGrammar = require('nearley/lib/nearley-language-bootstrapped');
// 1. Define your grammar. It normally lives in a .ne file; this example
//    keeps the source in a string so it can be compiled in memory.
//    `id` is a postprocessor built into nearley that returns data[0].
const grammarSource = `
start -> expression {% id %}
expression -> number (operation number):* {% function(data) {
    let result = data[0];
    for (let i = 0; i < data[1].length; i++) {
        const op = data[1][i][0];
        const num = data[1][i][1];
        if (op === '+') result += num;
        if (op === '-') result -= num;
    }
    return result;
} %}
number -> [0-9]:+ {% function(data) { return parseInt(data[0].join(''), 10); } %}
operation -> "+" {% id %} | "-" {% id %}
`;
// In a real scenario, you'd compile this via CLI:
// npx nearleyc mygrammar.ne -o mygrammar.js
// Then load it: const grammar = require('./mygrammar.js');
// For a quickstart, we'll compile it in-memory (requires nearley compiler)
function compileGrammar(sourceCode) {
    // Parse the grammar source into an AST using nearley's own grammar.
    const grammarParser = new nearley.Parser(nearleyGrammar);
    grammarParser.feed(sourceCode);
    const grammarAst = grammarParser.results[0];
    // Compile the AST into rules, then generate JavaScript source from them.
    const grammarInfo = compile(grammarAst, {});
    const grammarJs = generate(grammarInfo, 'grammar');
    // The generated code assigns to module.exports, so capture it locally.
    // Eval is risky, avoid in production and never run it on untrusted grammars.
    const module = { exports: {} };
    eval(grammarJs);
    return module.exports;
}
const grammar = nearley.Grammar.fromCompiled(compileGrammar(grammarSource));
// 2. Create a Parser instance
const parser = new nearley.Parser(grammar);
// 3. Feed input to the parser
try {
    parser.feed('10+20-5');
    console.log('Parsed results:', parser.results); // Should output [25]
    // feed() continues the same input stream, so the parser now sees
    // '10+20-53*4' and throws at '*', which the grammar does not define.
    parser.feed('3*4');
} catch (error) {
    console.error('Parsing error:', error.message);
}
// Example of parsing another valid input with a fresh parser
const parser2 = new nearley.Parser(grammar);
parser2.feed('1+2+3-1');
console.log('Another result:', parser2.results); // Should output [5]
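Because `feed` can be called repeatedly on the same parser, input does not have to arrive as a single string. The sketch below shows one way to push a sequence of chunks through a parser; `feedChunks` is a hypothetical helper name, and it assumes only that the parser exposes `feed` and `results` as in the Quickstart.

```javascript
// Hypothetical helper (not part of nearley): feed a sequence of string
// chunks to one parser and return whatever it has parsed at the end.
function feedChunks(parser, chunks) {
  for (const chunk of chunks) {
    // Each call continues the same input stream; a syntax error still throws.
    parser.feed(chunk);
  }
  return parser.results;
}
```

With the Quickstart grammar, a call such as `feedChunks(new nearley.Parser(grammar), ['10+', '20', '-5'])` would parse the same stream as feeding `'10+20-5'` in one piece. Use a fresh parser for each independent input, since a parser keeps everything it has been fed.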