{"id":17357,"library":"robot-directives","title":"Robot Directives Parser","description":"The `robot-directives` package (current stable version 0.4.0) provides a focused utility for parsing and interpreting robot directives found within HTML `<meta name=\"robots\">` tags and `X-Robots-Tag` HTTP headers. It allows developers to programmatically determine a crawler's allowed or disallowed actions based on these instructions, such as `noindex`, `nofollow`, `noarchive`, and `unavailable_after`. The library handles the cascading logic of multiple directives and user-agent-specific rules, and resolves conflicts based on a `restrictive` default (mimicking Googlebot's behavior). It explicitly differentiates itself by not handling the underlying HTML parsing, requiring users to extract meta tag content themselves. While not on a rapid release cycle, the package offers a stable API for its specialized parsing tasks, including a comprehensive set of constants for all standard robot directives and static methods for general utility.","status":"maintenance","version":"0.4.0","language":"javascript","source_language":"en","source_url":"https://github.com/stevenvachon/robot-directives","tags":["javascript","crawlers","header","html","http","meta","metadata","nofollow","noindex"],"install":[{"cmd":"npm install robot-directives","lang":"bash","label":"npm"},{"cmd":"yarn add robot-directives","lang":"bash","label":"yarn"},{"cmd":"pnpm add robot-directives","lang":"bash","label":"pnpm"}],"dependencies":[],"imports":[{"note":"This package is primarily a CommonJS module. For ESM environments, use `import RobotDirectives from 'robot-directives';` for the default export.","wrong":"import { RobotDirectives } from 'robot-directives';","symbol":"RobotDirectives","correct":"const RobotDirectives = require('robot-directives');"},{"note":"Directive constants are static properties of the `RobotDirectives` class and must be accessed via the class, not imported as named exports.","wrong":"import { NOINDEX } from 'robot-directives';","symbol":"RobotDirectives.NOINDEX","correct":"const { NOINDEX } = RobotDirectives;"},{"note":"The `isBot` function is a static method of the `RobotDirectives` class, not a named export.","wrong":"import { isBot } from 'robot-directives';","symbol":"RobotDirectives.isBot","correct":"const { isBot } = RobotDirectives;"}],"quickstart":{"code":"const RobotDirectives = require('robot-directives');\n\n// Instantiate with default options\nconst robots = new RobotDirectives({\n  // Example: Override default userAgent if needed\n  // userAgent: 'Googlebot/2.1 (web crawler) (+http://www.google.com/bot.html)',\n  // Example: Override current time for 'unavailable_after' testing\n  // currentTime: () => new Date('jan 1 2025').getTime()\n});\n\n// Add directives from an X-Robots-Tag HTTP header\nrobots.header('googlebot: noindex, nosnippet');\n// Add directives from HTML meta tags\nrobots.meta('robots', 'noarchive,nofollow');\nrobots.meta('bingbot', 'unavailable_after: 1-Jan-3000 00:00:00 EST');\n\n// Check specific directives\nconsole.log('Is nofollow?', robots.is(RobotDirectives.NOFOLLOW));\n// Expected: true\n\nconsole.log('Is noindex for Googlebot?', robots.is(RobotDirectives.NOINDEX, { userAgent: 'Googlebot' }));\n// Expected: true\n\nconsole.log('Is noarchive?', robots.is(RobotDirectives.NOARCHIVE));\n// Expected: true\n\n// Check for a directive that is not present\nconsole.log('Is index?', robots.is(RobotDirectives.INDEX));\n// Expected: false\n\n// Check if 'unavailable_after' has passed (example: assuming current time is after 3000)\nconsole.log('Is noindex for Bingbot after 3000?', robots.is(RobotDirectives.NOINDEX, {\n  currentTime: () => new Date('Jan 2 3000').getTime(), // Set current time past the unavailable_after date\n  userAgent: 'Bingbot/2.0'\n}));\n// Expected: true\n\n// Use static helper function\nconsole.log('Is \"googlebot\" a recognized bot name?', RobotDirectives.isBot('googlebot'));\n// Expected: true\nconsole.log('Is \"mycustombot\" a recognized bot name?', RobotDirectives.isBot('mycustombot'));\n// Expected: false","lang":"javascript","description":"Demonstrates how to instantiate `RobotDirectives`, add directives from HTTP headers and HTML meta tags, and then query their state using `is()` with specific user agents and time overrides. It also shows the usage of the static `isBot()` helper."},"warnings":[{"fix":"If you require 'all' to be an absolute override, set `allIsReadonly: false` in the constructor options. Be aware this deviates from common crawler behavior.","message":"The `allIsReadonly` option defaults to `true`. This means declaring an `'all'` directive will not overwrite other directives, which might be counter-intuitive if you expect 'all' to be absolute. Most search crawlers behave this way, but it's important to be aware of this default.","severity":"gotcha","affected_versions":">=0.1.0"},{"fix":"If your target crawler doesn't follow restrictive conflict resolution, set `restrictive: false` in the constructor options. You may then need to implement custom conflict resolution logic.","message":"The `restrictive` option defaults to `true`, resolving directive conflicts (e.g., `noindex,index`) by selecting the most restrictive value (`noindex`). While this mimics Googlebot, other crawlers might resolve conflicts differently.","severity":"gotcha","affected_versions":">=0.1.0"},{"fix":"Ensure `currentTime` in the constructor options or `is()` method options accurately reflects the desired current time, especially when dealing with dates and time zones.","message":"The `unavailable_after` directive's evaluation depends on the `currentTime` option. If `currentTime` is not correctly configured (e.g., time zone issues, static date), the `unavailable_after` directive might not be interpreted as expired when it should be, or vice versa.","severity":"gotcha","affected_versions":">=0.1.0"},{"fix":"Use a dedicated HTML parser (e.g., `cheerio`, `jsdom`) to extract the meta tag content before passing it to `robot-directives`.","message":"This library explicitly states it is NOT responsible for parsing HTML. You must manually extract the `content` attribute from `<meta name=\"robots\">` tags and pass it to the `meta()` method.","severity":"gotcha","affected_versions":">=0.1.0"}],"env_vars":null,"last_verified":"2026-04-22T00:00:00.000Z","next_check":"2026-07-21T00:00:00.000Z","problems":[{"fix":"For CommonJS, use `const RobotDirectives = require('robot-directives');`. For ESM, use `import RobotDirectives from 'robot-directives';`.","cause":"Attempting to use `import { RobotDirectives } from 'robot-directives';` or other incorrect destructuring for a CommonJS default export.","error":"TypeError: RobotDirectives is not a constructor"},{"fix":"If in an ESM file (e.g., `type: module` in `package.json`), use `import RobotDirectives from 'robot-directives';` instead of `const RobotDirectives = require('robot-directives');`.","cause":"Trying to use `require()` in an ES Module context without proper setup or bundler configuration.","error":"ReferenceError: require is not defined"},{"fix":"Review the constructor options for `allIsReadonly` (default `true`) and `restrictive` (default `true`). Adjust them in the `new RobotDirectives(options)` call if the default behavior doesn't match your expectations.","cause":"Misunderstanding the default behavior of the `allIsReadonly` or `restrictive` options.","error":"Directive 'X' not working as expected (e.g., 'all' isn't overriding other directives)"}],"ecosystem":"npm","meta_description":null}