{"library":"robots-parser","title":"Robots.txt Parser","description":"`robots-parser` is a JavaScript/TypeScript library that parses `robots.txt` files according to the draft specification. As of version 3.0.1, it supports the `User-agent`, `Allow`, `Disallow`, `Sitemap`, `Crawl-delay`, and `Host` directives, including wildcard (`*`) and end-of-line (`$`) matching for paths. The library is actively maintained: recent releases fixed HTTPS URL port handling (3.0.1), switched to the global URL object (3.0.0), and added TypeScript definitions (2.4.0). It determines whether a given URL may be crawled by a given user agent, which makes it suitable for web crawlers and SEO tools.","language":"javascript","status":"active","last_verified":"Sun Apr 19","install":{"commands":["npm install robots-parser"],"cli":null},"imports":["import robotsParser from 'robots-parser';","const robotsParser = require('robots-parser');","import type { RobotsParser } from 'robots-parser';"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"import robotsParser from 'robots-parser';\n\nasync function runRobotsParserExample() {\n  const baseUrl = 'http://www.example.com';\n  const robotsTxtContent = `\nUser-agent: *\nDisallow: /dir/\nDisallow: /test.html\nAllow: /dir/test.html\nAllow: /test.html\nCrawl-delay: 1\nSitemap: ${baseUrl}/sitemap.xml\nHost: example.com\n  `.trim();\n\n  // In a real application, you'd typically fetch this content from a URL:\n  // const response = await fetch(`${baseUrl}/robots.txt`);\n  // const fetchedRobotsTxtContent = await response.text();\n\n  const robots = robotsParser(`${baseUrl}/robots.txt`, robotsTxtContent);\n\n  console.log('Is http://www.example.com/test.html allowed for Sams-Bot/1.0?', robots.isAllowed('http://www.example.com/test.html', 'Sams-Bot/1.0'));\n  console.log('Is http://www.example.com/dir/test.html allowed for Sams-Bot/1.0?', robots.isAllowed('http://www.example.com/dir/test.html', 'Sams-Bot/1.0'));\n  console.log('Is http://www.example.com/dir/test2.html disallowed for Sams-Bot/1.0?', robots.isDisallowed('http://www.example.com/dir/test2.html', 'Sams-Bot/1.0'));\n  console.log('Crawl delay for Sams-Bot/1.0:', robots.getCrawlDelay('Sams-Bot/1.0'));\n  console.log('Sitemaps:', robots.getSitemaps());\n  console.log('Preferred Host:', robots.getPreferredHost());\n\n  // isAllowed returns undefined for URLs outside this robots.txt's scope (i.e., not matching the base URL)\n  console.log('Is an out-of-scope URL allowed?', robots.isAllowed('http://www.anotherdomain.com/path', 'Sams-Bot/1.0'));\n}\n\nrunRobotsParserExample();","lang":"typescript","description":"Demonstrates parsing a `robots.txt` string, checking whether URLs may be crawled by a given user agent, and retrieving the sitemaps, crawl delay, and preferred host from the parsed rules. It also shows that a URL outside the scope of the `robots.txt` file yields `undefined`.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":null}