Regenerate: Unicode-Aware Regex Generator
Regenerate is a JavaScript library that builds regular expressions from a set of Unicode symbols or code points. It handles the awkward parts of Unicode in JavaScript, in particular astral symbols (those outside the Basic Multilingual Plane), by generating ES5-compatible patterns that match these characters, typically via surrogate pairs. A fluent, chainable API adds, removes, and manages code points and ranges, so a character set can be defined precisely. The current version is 1.4.2. The package appears to be in a stable, maintenance-only state: its last significant update was several years ago, which suggests a mature, feature-complete tool. It remains useful when regexes containing astral symbols must work across browsers and older JavaScript engines.
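The surrogate-pair encoding mentioned above follows the standard UTF-16 arithmetic. The following is a minimal sketch that does not use the library; the helper names `toSurrogatePair` and `toEscape` are illustrative and are not part of Regenerate's API.

```javascript
// Split an astral code point (U+10000..U+10FFFF) into its UTF-16 surrogates.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('Expected an astral code point');
  }
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);  // top 10 bits of the offset
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits of the offset
  return [high, low];
}

// Format one UTF-16 code unit as an ES5 `\uXXXX` escape.
function toEscape(codeUnit) {
  return '\\u' + codeUnit.toString(16).toUpperCase().padStart(4, '0');
}

const pair = toSurrogatePair(0x1D306);
const source = pair.map(toEscape).join('');
console.log(source); // \uD834\uDF06

// The resulting ES5 pattern matches the astral symbol without the `u` flag.
console.log(new RegExp(source).test('\u{1D306}')); // true
```

This only covers a single astral code point. Ranges that span many astral symbols need more bookkeeping than this sketch attempts.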
Common errors
- TypeError: regenerate is not a function
  cause: Calling `regenerate` after a named ESM import (e.g., `import { regenerate } from 'regenerate';`), or after a misconfigured CJS `require` call in some environments.
  fix: Import the package correctly. For CommonJS, use `const regenerate = require('regenerate');`. For ESM (especially in TypeScript), use `import * as regenerate from 'regenerate';` to capture the CJS export as a namespace object, then call `regenerate.default()`, or `regenerate()` if your transpiler unwraps it.
- TypeError: Cannot read properties of undefined (reading 'add')
  cause: A method such as `add()` was called on something other than a set instance returned by `regenerate()`. This exact message means the value before `.add` was `undefined` (for example, a binding left undefined by a bad import). Calling `regenerate.add(...)` on a correctly imported export fails with `TypeError: regenerate.add is not a function` instead.
  fix: Always call the `regenerate()` function first to create a new set instance, then chain its methods: `regenerate().add(0x60).addRange(0x6A, 0x6B);`
- Error [ERR_REQUIRE_ESM]: require() of ES Module [path/to/node_modules/regenerate/index.js] from [your_file.js] not supported.
  cause: `regenerate` has been treated or bundled as an ESM-only package by a tool or environment, and a CJS `require()` call is trying to load it. `regenerate` itself is CJS, but build configurations can still cause this.
  fix: Check your project's `package.json` for `"type": "module"` and adjust the import/export strategy accordingly. Make sure your bundler or Node.js environment is configured for CJS-ESM interop. If possible, switch the consuming file to ESM `import` syntax, or enable `esModuleInterop` in TypeScript.
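The two `TypeError` cases can be reproduced without installing anything. The sketch below uses a hypothetical stand-in, `fakeRegenerate`, that only mimics the shape of a CommonJS module exporting a factory function; it is not the library. It assumes a toolchain that compiles named imports to property access on the export. Exact error messages vary between engines, so they are printed rather than asserted.

```javascript
// Hypothetical stand-in for a CommonJS module whose `module.exports` is a
// factory function. It mimics only the export shape, not the real library.
function fakeRegenerate() {
  const codePoints = new Set();
  return {
    add(codePoint) {
      codePoints.add(codePoint);
      return this; // returning the set enables chaining
    },
    valueOf() {
      return [...codePoints].sort((a, b) => a - b);
    },
  };
}

// Run a callback and describe the error it throws, or return null.
function describeFailure(callback) {
  try {
    callback();
    return null;
  } catch (error) {
    return `${error.name}: ${error.message}`;
  }
}

// A named import compiled to property access behaves like this destructuring:
// the function has no `regenerate` property, so the binding is undefined.
const { regenerate: namedBinding } = fakeRegenerate;

const callingUndefined = describeFailure(() => namedBinding());
const methodOnUndefined = describeFailure(() => namedBinding.add(0x60));
const methodOnFactory = describeFailure(() => fakeRegenerate.add(0x60));
console.log(callingUndefined);
console.log(methodOnUndefined);
console.log(methodOnFactory);

// Intended usage: call the factory first, then chain on the returned set.
const values = fakeRegenerate().add(0x60).add(0x1D306).valueOf();
console.log(values); // [ 96, 119558 ]
```

All three failing calls throw a `TypeError`; only the wording differs, which is a quick way to tell an undefined binding apart from a method called on the factory itself.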
Warnings
- breaking The v0.6.0 release (from 2013) involved a complete internal rewrite for performance and memory efficiency, based on a new data structure. While the public API aimed to remain compatible, such a significant overhaul could have introduced subtle behavioral changes or regressions for certain edge cases compared to prior 0.x versions.
- gotcha Regenerate is an older package written primarily for CommonJS (CJS) environments. Using ESM `import` syntax (`import regenerate from 'regenerate';`) in an ESM Node.js project or under a bundler that strictly enforces ESM can, depending on how CJS-ESM interop is configured, lead to import errors or to `TypeError: regenerate_1.default is not a function` at runtime. The latter form is typical of TypeScript output compiled without `esModuleInterop`.
- gotcha The package is no longer actively maintained. While stable and functional for its intended purpose, it will not receive updates for new JavaScript features, performance improvements from newer JS engines, or bug fixes for newly discovered edge cases or security vulnerabilities (though less critical for this type of utility).
- gotcha Regenerate specifically generates ES5-compatible regexes that correctly handle astral symbols by converting them to surrogate pairs (e.g., `\uD834\uDF06`). If you manually add the `u` flag (Unicode flag) to a regex generated by Regenerate, it might lead to unexpected matching behavior because the `u` flag changes how JavaScript's regex engine interprets character classes and escape sequences.
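How the `u` flag changes the reading of surrogate escapes can be seen with plain regex literals. This sketch does not call Regenerate and makes no claim about which patterns the library emits; it only shows the engine-level difference that the last warning refers to.

```javascript
const symbol = '\u{1D306}'; // one code point, stored as two UTF-16 code units

const results = {
  // Without `u`, a character class holds single code units, so the class
  // matches one unit and cannot span the whole two-unit symbol.
  classWithoutU: /^[\uD834\uDF06]$/.test(symbol), // false
  // With `u`, the same source is read as one code point.
  classWithU: /^[\uD834\uDF06]$/u.test(symbol),   // true

  // Outside a class, the ES5-style pair matches in both modes.
  pairWithoutU: /^\uD834\uDF06$/.test(symbol),    // true
  pairWithU: /^\uD834\uDF06$/u.test(symbol),      // true

  // A lone high surrogate matches half of the pair only without `u`.
  loneWithoutU: /\uD834/.test(symbol),            // true
  loneWithU: /\uD834/u.test(symbol),              // false
};

console.log(results);
```

The practical consequence is that the same pattern source can accept different inputs depending on the flag, so a pattern written for ES5 semantics should be tested again before `u` is added.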
Install
- npm install regenerate
- yarn add regenerate
- pnpm add regenerate
Imports
- regenerate
  wrong: import regenerate from 'regenerate';
  correct: const regenerate = require('regenerate');
- regenerate
  wrong: import { regenerate } from 'regenerate';
  correct: import * as regenerate from 'regenerate';
- regenerate().add
  wrong: regenerate.add(0x1D306);
  correct: regenerate().add(0x1D306);
Quickstart
const regenerate = require('regenerate');
// Create a set and add/remove code points and ranges
const unicodeSet = regenerate()
  .addRange(0x60, 0x69) // Add U+0060 (`) to U+0069 (i)
  .remove(0x62, 0x64) // Remove U+0062 (b) and U+0064 (d)
  .add(0x1D306); // Add U+1D306 (a rare astral symbol)
// Get the array of code points
console.log('Code points:', unicodeSet.valueOf());
// Expected: [96, 97, 99, 101, 102, 103, 104, 105, 119558]
// Get the ES5-compatible regex string
const regexString = unicodeSet.toString();
console.log('Regex string:', regexString);
// Expected: '[`ace-i]|\uD834\uDF06'
// Get the RegExp object
const regex = unicodeSet.toRegExp();
console.log('RegExp object:', regex);
// Expected: /[`ace-i]|\uD834\uDF06/
// Example with direct arguments to regenerate
const directSet = regenerate(0x1D306, 'A', '©', 0x2603);
console.log('Direct set regex string:', directSet.toString());
// Expected: '[A\xA9\u2603]|\uD834\uDF06'
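One way to check the first expected pattern is to compile the string by hand and test it against sample inputs. The sketch below copies the pattern source from the Quickstart's expected `toString()` output instead of calling the library, so it runs without the package installed. The `^(?:...)$` wrapper and the sample lists are additions for illustration.

```javascript
// Pattern source copied from the Quickstart's expected `toString()` output.
const source = '[`ace-i]|\\uD834\\uDF06';

// Wrap the alternation in a non-capturing group so the anchors apply to both
// branches, then require the whole input to be exactly one matched symbol.
const re = new RegExp(`^(?:${source})$`);

const shouldMatch = ['`', 'a', 'c', 'e', 'f', 'g', 'h', 'i', '\u{1D306}'];
const shouldNotMatch = ['b', 'd', 'j', '\uD834']; // includes a lone surrogate

const allMatch = shouldMatch.every((symbol) => re.test(symbol));
const anyFalseMatch = shouldNotMatch.some((symbol) => re.test(symbol));

console.log(allMatch);      // true
console.log(anyFalseMatch); // false
```

The same wrapper applies when embedding a generated pattern inside a larger regex, since an unwrapped `|` would otherwise split the surrounding expression.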