Mbox File Parser for Node.js
The `node-mbox` library (current stable version 2.0.0) is a fast, stream-based parser for mbox email archives in Node.js. It is designed to process large mbox files efficiently, reportedly handling 1.5GB in about 20 seconds. The library parses only the mbox file structure and emits each email message as a `Buffer` instance; it does not parse the content of the messages themselves, a task typically handled by companion libraries such as `mailparser`. Version 2.0.0 introduces a completely new API that follows idiomatic Node.js stream conventions and allows a custom line splitter. While generally robust, the library states that it is not 100% compliant with RFC 4155, which matters when strict RFC adherence is required. Major versions have so far brought significant API revisions.
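To make the division of labor concrete, the sketch below splits an in-memory mbox buffer at lines that begin with `From ` and returns each message as a `Buffer`. It is a simplified illustration of the mbox structure, not the library's implementation: `splitMbox` is a hypothetical helper, and it ignores streaming, `>From ` quoting, and strict-mode checks.

```javascript
// Hypothetical helper (not part of node-mbox): split an in-memory mbox
// Buffer into one Buffer per message. A new message is assumed to start
// wherever a line begins with "From ".
function splitMbox(buf) {
  const marker = Buffer.from('\nFrom ');
  const messages = [];
  let start = 0;
  while (start < buf.length) {
    const next = buf.indexOf(marker, start);
    if (next === -1) {
      // No further separator: the rest of the buffer is the last message.
      messages.push(buf.subarray(start));
      break;
    }
    // Keep the newline that ends the current message, then continue
    // from the "From " that opens the next one.
    messages.push(buf.subarray(start, next + 1));
    start = next + 1;
  }
  return messages;
}

const sampleMbox = Buffer.from(
  'From MAILER-DAEMON Mon Apr 18 10:00:00 2022\nSubject: Test Email 1\n\nBody 1.\n\n' +
  'From MAILER-DAEMON Mon Apr 18 10:01:00 2022\nSubject: Test Email 2\n\nBody 2.\n'
);
const sampleMessages = splitMbox(sampleMbox);
console.log(sampleMessages.length); // prints 2
```

Each element is still raw bytes; decoding headers and bodies is left to a message parser such as `mailparser`.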
Common errors
- TypeError: Mbox is not a constructor
  cause: Attempting to instantiate `MboxStream` using `new MboxStream()`, or using an incorrect named import/require.
  fix: Use `new Mbox()` for the class-based parser, or `MboxStream(stream, options)` for the functional convenience wrapper. Ensure you are using named imports `import { Mbox, MboxStream } from 'node-mbox'` or destructuring `const {Mbox, MboxStream} = require('node-mbox')`.
- msg.split is not a function
  cause: Calling string methods directly on the `msg` object, which is a `Buffer`.
  fix: Convert `msg` to a string first using `msg.toString([encoding])` before applying string-specific methods like `split()`.
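The second error can be reproduced and fixed without the library at all. In this minimal sketch, `msg` is a stand-in `Buffer` built by hand (an assumption for illustration), playing the role of a message emitted by the parser.

```javascript
// Stand-in for a message emitted by the parser: a Buffer, not a string.
const msg = Buffer.from(
  'From MAILER-DAEMON Mon Apr 18 10:00:00 2022\nSubject: Test Email 1\n\nBody text.\n'
);

// Buffers have no split() method, so msg.split('\n') would throw
// "TypeError: msg.split is not a function". Decode to a string first.
const lines = msg.toString('utf8').split('\n');
const subjectLine = lines.find((line) => line.startsWith('Subject: '));
console.log(subjectLine); // prints "Subject: Test Email 1"
```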
Warnings
- breaking Version 2.0.0 introduces a completely new API, shifting to be more idiomatic to Node.js streams. Existing code from v1.x will require significant refactoring.
- breaking From version 1.0.0 onwards, message data is passed around as a `Buffer` instance instead of a `String`. Direct string manipulation on `msg` will fail.
- breaking Version 2.0.0 enforces strict parsing by default. This might cause parsing errors for malformed mbox files that were previously tolerated.
- breaking Version 2.0.0 implements custom buffer encoding handling only, which may change behavior or require explicit encoding setup compared to previous versions.
- gotcha The module is not 100% conformant to RFC 4155, despite following the qmail mbox specification. This can lead to unexpected behavior or parsing issues with highly specific or non-standard mbox files.
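Because messages arrive as `Buffer` instances, the choice of encoding falls to the caller. The sketch below uses plain Node.js `Buffer` methods and a hand-made byte sequence (an assumption for illustration, nothing specific to `node-mbox`) to show how the same bytes decode differently under two encodings.

```javascript
// The bytes for "café" in Latin-1, where 0xE9 is "é".
const latin1Bytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

const asLatin1 = latin1Bytes.toString('latin1');
const asUtf8 = latin1Bytes.toString('utf8');

console.log(asLatin1); // prints "café"
// A lone 0xE9 is not valid UTF-8, so Node.js substitutes U+FFFD.
console.log(asUtf8.includes('\uFFFD')); // prints true
```

If the charset of a message is unknown, it is usually safer to keep the `Buffer` intact and let a message parser pick the encoding from the headers.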
Install
- npm install node-mbox
- yarn add node-mbox
- pnpm add node-mbox
Imports
- Mbox
  const Mbox = require('node-mbox').Mbox;
  import { Mbox } from 'node-mbox';
- MboxStream
  const { MboxStream } = require('node-mbox');
  import { MboxStream } from 'node-mbox';
Quickstart
import { Mbox, MboxStream } from 'node-mbox';
import fs from 'fs';
import split from 'line-stream';
// 1. Pipe a file read stream into a new Mbox instance
const mboxFromFile = new Mbox();
fs.createReadStream('test/test-4-message.mbox').pipe(mboxFromFile);
// 2. Pass it a stream and use a custom line splitter
const mailboxStream = fs.createReadStream('test/test-4-message.mbox');
const splitter = split('\n');
const mboxFromCustomStream = mailboxStream.pipe(splitter).pipe(new Mbox());
// 3. Pass it a stream and use the default line splitter (same as #2 without explicit splitter)
const mboxFromDefaultStream = MboxStream(fs.createReadStream('test/test-4-message.mbox'), { includeMboxHeader: false });
const activeMbox = mboxFromDefaultStream; // Choose one for demonstration
activeMbox.on('data', function(msg) {
  // `msg` is a `Buffer` instance
  console.log('Got a message (first 100 chars):', msg.toString().substring(0, 100) + '...');
});
activeMbox.on('error', function(err) {
  console.log('Got an error:', err);
});
activeMbox.on('finish', function() {
  console.log('Done reading mbox file.');
});
// The examples above expect an input file at test/test-4-message.mbox.
// You would typically pipe real mbox data into the parser.
// For quick testing, you can create a simple two-message file. printf is used
// here because plain echo does not expand \n in every shell:
// printf 'From MAILER-DAEMON Mon Apr 18 10:00:00 2022\nSubject: Test Email 1\n\nThis is the body of test email 1.\n\nFrom MAILER-DAEMON Mon Apr 18 10:01:00 2022\nSubject: Test Email 2\n\nThis is the body of test email 2.\n' > test/test-4-message.mbox
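A test fixture can also be generated from Node.js itself, which avoids shell differences in how `\n` is handled. The sketch below uses only the standard library; the temporary directory and the file name `test-2-message.mbox` are assumptions chosen for illustration.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Write the fixture into a fresh temporary directory (hypothetical location).
const fixtureDir = fs.mkdtempSync(path.join(os.tmpdir(), 'mbox-fixture-'));
const fixturePath = path.join(fixtureDir, 'test-2-message.mbox');

// Two minimal messages, each opened by a "From " separator line and
// followed by a blank line.
const fixtureText = [
  'From MAILER-DAEMON Mon Apr 18 10:00:00 2022',
  'Subject: Test Email 1',
  '',
  'This is the body of test email 1.',
  '',
  'From MAILER-DAEMON Mon Apr 18 10:01:00 2022',
  'Subject: Test Email 2',
  '',
  'This is the body of test email 2.',
  '',
].join('\n');
fs.writeFileSync(fixturePath, fixtureText);

// Sanity check: count the separator lines that were written.
const separatorCount = fs
  .readFileSync(fixturePath, 'utf8')
  .split('\n')
  .filter((line) => line.startsWith('From ')).length;
console.log(fixturePath, separatorCount);
```

The resulting path can then be passed to `fs.createReadStream()` in place of `test/test-4-message.mbox`.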