Robot Directives Parser

0.4.0 · maintenance · verified Wed Apr 22

The `robot-directives` package (current stable version 0.4.0) provides a focused utility for parsing and interpreting robot directives found in HTML `<meta name="robots">` tags and `X-Robots-Tag` HTTP headers. It lets developers programmatically determine a crawler's allowed or disallowed actions from instructions such as `noindex`, `nofollow`, `noarchive`, and `unavailable_after`. The library handles the cascading logic of multiple directives and user-agent-specific rules, and resolves conflicts with a `restrictive` default that mimics Googlebot's behavior. It deliberately does not handle HTML parsing itself; users must extract meta tag names and content values before passing them in. While not on a rapid release cycle, the package offers a stable API for its specialized parsing tasks, including a comprehensive set of constants for all standard robot directives and static helper methods for general utility.
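The restrictive conflict resolution described above can be sketched in a few lines of plain JavaScript. `resolveRestrictive` below is an illustrative, hypothetical helper, not the library's internals: it shows the idea that when a directive and its negation both appear, the restrictive (`no*`) form wins.

```javascript
// Illustrative sketch (not the library's code): under restrictive
// resolution, the "no*" form wins whenever a directive and its
// negation both appear, mimicking Googlebot's behavior.
function resolveRestrictive(content) {
  const tokens = content.split(',').map((t) => t.trim().toLowerCase());
  const result = new Set();
  for (const token of tokens) {
    // Strip a leading "no" to find the directive's positive form
    const positive = token.startsWith('no') ? token.slice(2) : token;
    if (tokens.includes('no' + positive)) {
      // The restrictive counterpart is present; keep only it
      result.add('no' + positive);
    } else {
      result.add(positive);
    }
  }
  return [...result];
}

console.log(resolveRestrictive('index, noindex, follow'));
// -> [ 'noindex', 'follow' ]
```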


Install
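The package is published on npm:

```shell
npm install robot-directives
```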

Imports
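The module exposes a single constructor via CommonJS (the same import used in the quickstart):

```javascript
// CommonJS import; the constructor also carries the directive constants
// (e.g. RobotDirectives.NOINDEX) and static helpers such as isBot()
const RobotDirectives = require('robot-directives');
```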

Quickstart

This quickstart demonstrates how to instantiate `RobotDirectives`, add directives from HTTP headers and HTML meta tags, and then query their state using `is()` with specific user agents and time overrides. It also shows the static `isBot()` helper.

const RobotDirectives = require('robot-directives');

// Instantiate with default options
const robots = new RobotDirectives({
  // Example: Override default userAgent if needed
  // userAgent: 'Googlebot/2.1 (web crawler) (+http://www.google.com/bot.html)',
  // Example: Override current time for 'unavailable_after' testing
  // currentTime: () => new Date('jan 1 2025').getTime()
});

// Add directives from an X-Robots-Tag HTTP header
robots.header('googlebot: noindex, nosnippet');
// Add directives from HTML meta tags
robots.meta('robots', 'noarchive,nofollow');
robots.meta('bingbot', 'unavailable_after: 1-Jan-3000 00:00:00 EST');

// Check specific directives
console.log('Is nofollow?', robots.is(RobotDirectives.NOFOLLOW));
// Expected: true

console.log('Is noindex for Googlebot?', robots.is(RobotDirectives.NOINDEX, { userAgent: 'Googlebot' }));
// Expected: true

console.log('Is noarchive?', robots.is(RobotDirectives.NOARCHIVE));
// Expected: true

// Check a directive that does not apply to the default user agent
// (nosnippet came only from the googlebot-scoped header above)
console.log('Is nosnippet?', robots.is(RobotDirectives.NOSNIPPET));
// Expected: false

// Check if 'unavailable_after' has passed (example: assuming current time is after 3000)
console.log('Is noindex for Bingbot after 3000?', robots.is(RobotDirectives.NOINDEX, {
  currentTime: () => new Date('Jan 2 3000').getTime(), // Set current time past the unavailable_after date
  userAgent: 'Bingbot/2.0'
}));
// Expected: true

// Use static helper function
console.log('Is "googlebot" a recognized bot name?', RobotDirectives.isBot('googlebot'));
// Expected: true
console.log('Is "mycustombot" a recognized bot name?', RobotDirectives.isBot('mycustombot'));
// Expected: false
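The `unavailable_after` check above reduces to a timestamp comparison against the (possibly overridden) current time. As a self-contained sketch, `isUnavailable` is a hypothetical helper illustrating the semantics, not part of the library:

```javascript
// Hypothetical helper: once the current time passes the
// unavailable_after cutoff, the page should be treated as noindex.
function isUnavailable(unavailableAfter, currentTime = () => Date.now()) {
  return currentTime() > new Date(unavailableAfter).getTime();
}

console.log(isUnavailable(
  '3000-01-01T00:00:00-05:00',                    // cutoff date
  () => new Date('3000-01-02T00:00:00Z').getTime() // overridden "now"
));
// -> true
```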
