TypeScript Parser Combinator
typescript-parsec is a parser combinator library for TypeScript, maintained by Microsoft. It provides a lexer builder for tokenizing input and a set of combinators for composing grammar rules in a functional style. The current version is 0.3.4; as a pre-1.0 release, its API may still change. There is no fixed release cadence, and updates appear periodically as needs and contributions arise in the underlying Microsoft research projects.
The library is written for TypeScript, so parser definitions are explicitly typed and result types are inferred through the combinators. This makes it suitable for building language front-ends directly within a TypeScript codebase.
Common errors
- TypeError: Cannot read properties of undefined (reading 'parse')
  cause: Calling `.parse()` on a lexer or parser object that has not been initialized or assigned.
  fix: Make sure `buildLexer` is called with your token definitions and that `rule.setPattern()` has been called on every rule before invoking `.parse()`.
- SyntaxError: Expected EOF
  cause: The parser consumed a prefix of the input, but tokens remained afterwards, so the grammar did not cover the entire input string.
  fix: Review the grammar rules so that they can consume the whole expected input, and wrap the output of the top-level parser in `expectEOF()` to enforce full consumption.
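Both errors above can be illustrated with a small model that does not depend on the library. The sketch below is a simplified stand-in, not typescript-parsec's implementation: `lazyRule` and `requireEnd` are hypothetical names that only loosely mirror `rule()`/`setPattern()` and `expectEOF()`, and the parser works on raw characters rather than tokens.

```typescript
// Simplified stand-in for a parser: reads from `input` starting at `pos` and
// returns the parsed value plus the next position, or undefined on failure.
type Parser<T> = (input: string, pos: number) => { value: T; next: number } | undefined;

// Hypothetical forward-declared rule, loosely modelled on rule()/setPattern().
function lazyRule<T>() {
  let pattern: Parser<T> | undefined;
  return {
    setPattern(p: Parser<T>): void {
      pattern = p;
    },
    parse(input: string, pos = 0) {
      if (pattern === undefined) {
        // Fail with a clear message instead of an opaque TypeError later on.
        throw new Error('Rule used before setPattern() was called');
      }
      return pattern(input, pos);
    },
  };
}

// Parses one or more ASCII digits into a number.
const digits: Parser<number> = (input, pos) => {
  const match = /^\d+/.exec(input.slice(pos));
  return match ? { value: Number(match[0]), next: pos + match[0].length } : undefined;
};

// Hypothetical end-of-input check, analogous in spirit to expectEOF().
function requireEnd<T>(input: string, result: { value: T; next: number } | undefined): T {
  if (result === undefined) {
    throw new Error('Parse failed');
  }
  if (result.next !== input.length) {
    throw new Error(`Expected end of input at offset ${result.next}, found "${input[result.next]}"`);
  }
  return result.value;
}

const NUMBER = lazyRule<number>();
NUMBER.setPattern(digits);

console.log(requireEnd('42', NUMBER.parse('42'))); // 42
```

In this model, parsing `'42x'` succeeds on the prefix `42` and only fails at the end-of-input check, which is the situation the "Expected EOF" entry describes. Skipping `setPattern` fails immediately with an explicit message.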
Warnings
- breaking As a pre-1.0 library (current version 0.3.4), the API is considered unstable and may introduce breaking changes in minor or even patch versions. Developers should pin exact versions and review release notes carefully for updates.
- gotcha For very large input strings or highly complex/ambiguous grammars, unoptimized parser combinators can lead to performance bottlenecks or stack overflow errors due to excessive recursion and backtracking. The library might not include automatic memoization.
- gotcha The default error messages generated by parser combinators can sometimes be generic, making it challenging to debug complex parsing failures in production. Pinpointing the exact location and nature of a syntax error might require custom error handling.
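One possible form of the custom error handling mentioned in the last warning is to turn a failure position into a line/column message with a caret under the offending character. The sketch below assumes you already have a zero-based character offset for the failure; `locate` and `formatParseError` are hypothetical helpers written for this example and are not part of typescript-parsec.

```typescript
interface SourceLocation {
  line: number;   // 1-based line number
  column: number; // 1-based column number
}

// Converts a zero-based character offset into a 1-based line/column pair.
function locate(source: string, offset: number): SourceLocation {
  const before = source.slice(0, offset);
  const lastNewline = before.lastIndexOf('\n'); // -1 when on the first line
  const line = before.split('\n').length;
  return { line, column: offset - lastNewline };
}

// Renders the message, the offending line, and a caret under the failing column.
function formatParseError(source: string, offset: number, message: string): string {
  const { line, column } = locate(source, offset);
  const text = source.split('\n')[line - 1] ?? '';
  const caret = ' '.repeat(column - 1) + '^';
  return `${message} (line ${line}, column ${column})\n${text}\n${caret}`;
}

console.log(formatParseError('1 + 2\n3 * * 4', 10, 'Unexpected token "*"'));
// Unexpected token "*" (line 2, column 5)
// 3 * * 4
//     ^
```

A wrapper like this would typically sit in the `catch` block around the top-level parse call, so that callers see where parsing stopped and not just that it failed.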
Install
- npm install typescript-parsec
- yarn add typescript-parsec
- pnpm add typescript-parsec
Imports
- buildLexer
  avoid: const { buildLexer } = require('typescript-parsec');
  prefer: import { buildLexer } from 'typescript-parsec';
- rule
  avoid: import rule from 'typescript-parsec';
  prefer: import { rule } from 'typescript-parsec';
- alt, seq, apply
  avoid: import { alt, seq } from 'typescript-parsec/dist/parser';
  prefer: import { alt, seq, apply } from 'typescript-parsec';
Quickstart
import { buildLexer, expectEOF, expectSingleResult, rule, Token } from 'typescript-parsec';
import { alt, apply, kmid, lrec_sc, seq, str, tok } from 'typescript-parsec';

enum TokenKind {
    Number,
    Add,
    Sub,
    Mul,
    Div,
    LParen,
    RParen,
    Space
}

// Each entry is [keep token?, pattern, kind]; whitespace is matched but dropped.
const lexer = buildLexer([
    [true, /^\d+(\.\d+)?/g, TokenKind.Number],
    [true, /^\+/g, TokenKind.Add],
    [true, /^-/g, TokenKind.Sub],
    [true, /^\*/g, TokenKind.Mul],
    [true, /^\//g, TokenKind.Div],
    [true, /^\(/g, TokenKind.LParen],
    [true, /^\)/g, TokenKind.RParen],
    [false, /^\s+/g, TokenKind.Space]
]);

// Converts a number token into its numeric value.
function applyNumber(value: Token<TokenKind.Number>): number {
    return +value.text;
}

// Applies a prefix '+' or '-' to an already evaluated operand.
function applyUnary(value: [Token<TokenKind>, number]): number {
    switch (value[0].text) {
        case '+': return +value[1];
        case '-': return -value[1];
        default: throw new Error(`Unknown unary operator: ${value[0].text}`);
    }
}

// Folds one [operator, operand] pair into the running left-hand value.
function applyBinary(first: number, second: [Token<TokenKind>, number]): number {
    switch (second[0].text) {
        case '+': return first + second[1];
        case '-': return first - second[1];
        case '*': return first * second[1];
        case '/': return first / second[1];
        default: throw new Error(`Unknown binary operator: ${second[0].text}`);
    }
}

// Rules are declared first so they can refer to each other recursively.
const TERM = rule<TokenKind, number>();
const FACTOR = rule<TokenKind, number>();
const EXP = rule<TokenKind, number>();

// TERM: a number, a unary +/- applied to a TERM, or a parenthesized EXP.
TERM.setPattern(
    alt(
        apply(tok(TokenKind.Number), applyNumber),
        apply(seq(alt(str('+'), str('-')), TERM), applyUnary),
        kmid(str('('), EXP, str(')'))
    )
);

// FACTOR: left-associative chain of '*' and '/' over TERMs.
FACTOR.setPattern(
    lrec_sc(TERM, seq(alt(str('*'), str('/')), TERM), applyBinary)
);

// EXP: left-associative chain of '+' and '-' over FACTORs.
EXP.setPattern(
    lrec_sc(FACTOR, seq(alt(str('+'), str('-')), FACTOR), applyBinary)
);

// Lexes and parses the expression, requiring the whole input to be consumed.
function evaluate(expr: string): number {
    return expectSingleResult(expectEOF(EXP.parse(lexer.parse(expr))));
}

console.log(`(1 + 2) * (3 + 4) = ${evaluate('(1 + 2) * (3 + 4)')}`); // (1 + 2) * (3 + 4) = 21