Express Robots.txt Middleware

1.0.0 · verified Thu Apr 23 · auth: no · javascript · maintenance

express-robots-txt is an Express.js middleware designed to either serve a static `robots.txt` file or dynamically generate one based on a JavaScript object or array configuration. The package is currently at version 1.0.0, with its last known publication approximately four years ago, indicating a stable but less actively developed state, likely in maintenance mode. It provides a straightforward API to define standard `robots.txt` directives such as `User-agent`, `Disallow`, `Allow`, `Crawl-delay`, `Sitemap` (supporting multiple entries), and `Host`. This flexibility allows developers to easily control how web crawlers interact with their Express applications, preventing unwanted indexing or specifying crawl behaviors, without needing to manage a static file manually for complex scenarios.
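To make the object-to-directive mapping concrete, here is an illustrative sketch (not the library's internals) of how a configuration object corresponds to rendered `robots.txt` lines. The function name `renderRobots` is hypothetical and exists only for this example.

```javascript
// Hypothetical helper showing how config keys map onto robots.txt directives.
function renderRobots({ UserAgent = '*', Disallow = [], CrawlDelay, Sitemap = [], Host } = {}) {
  const lines = [`User-agent: ${UserAgent}`];
  for (const p of [].concat(Disallow)) lines.push(`Disallow: ${p}`);   // one line per path
  if (CrawlDelay) lines.push(`Crawl-delay: ${CrawlDelay}`);
  for (const url of [].concat(Sitemap)) lines.push(`Sitemap: ${url}`); // multiple sitemaps supported
  if (Host) lines.push(`Host: ${Host}`);
  return lines.join('\n');
}

console.log(renderRobots({ UserAgent: '*', Disallow: '/admin', Sitemap: 'https://example.com/sitemap.xml' }));
// → User-agent: *
//   Disallow: /admin
//   Sitemap: https://example.com/sitemap.xml
```

Single values and arrays are both accepted here for `Disallow` and `Sitemap`, mirroring the flexibility described above.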

error: TypeError: app.use() requires a middleware function but got a Object
cause: The `robots` export was passed to `app.use()` without being called (e.g., `app.use(robots)`), so Express received the module object rather than the middleware function that `robots(...)` returns.
fix: Call `robots(...)` with either a file path string or a configuration object/array, e.g., `app.use(robots({ UserAgent: '*', Disallow: '/' }))`.
error: My robots.txt output is empty or malformed.
cause: The configuration object or array passed to `robots()` is invalid, or the specified file path does not exist or is not readable.
fix: Verify that the object keys match the documented directives (e.g., UserAgent, Disallow). If serving a file, confirm the path is correct and the file is readable by the Node.js process.
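When serving a static file, a startup check can surface path problems before any crawler requests arrive. The sketch below uses only Node.js built-ins; the function name `robotsFileReadable` is hypothetical.

```javascript
import fs from 'fs';

// Hypothetical startup check: returns true if the given robots.txt path is readable.
function robotsFileReadable(robotsPath) {
  try {
    fs.accessSync(robotsPath, fs.constants.R_OK);
    return true;
  } catch {
    return false;
  }
}

console.log(robotsFileReadable('missing-robots.txt')); // false for a nonexistent file
```

Running such a check before `app.use(robots(robotsPath))` turns a silent empty response into a loud configuration error at boot.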
gotcha: ES Module (ESM) usage relies on the package's conditional exports. Node.js versions without stable ESM support may fail to resolve the `import` statement, falling back to CommonJS behavior or throwing errors.
fix: Upgrade to a recent Node.js version (v12.17.0+ for stable ESM support) or use CommonJS `require('express-robots-txt')` syntax if ESM issues persist.
gotcha: The `Host` directive in `robots.txt` is not universally supported by web crawlers and is considered deprecated by Google. Relying solely on `Host` for canonicalization is not recommended.
fix: Use alternative methods for specifying canonical URLs, such as `<link rel="canonical" href="...">` in the HTML `<head>`, in addition to or instead of the `Host` directive.
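Besides the HTML `<link>` tag, RFC 6596's `rel="canonical"` relation can also be delivered as an HTTP `Link` header, which works for non-HTML responses too. The helper below is a hypothetical sketch for formatting that header value; in an Express handler it could be applied with `res.set('Link', ...)`.

```javascript
// Hypothetical helper: format an HTTP Link header value for a canonical URL (RFC 6596).
function canonicalLinkHeader(url) {
  return `<${url}>; rel="canonical"`;
}

console.log(canonicalLinkHeader('https://www.example.com/'));
// → <https://www.example.com/>; rel="canonical"
```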
npm install express-robots-txt
yarn add express-robots-txt
pnpm add express-robots-txt

This quickstart demonstrates how to set up `express-robots-txt` middleware to dynamically generate a `robots.txt` file with common directives or serve a static file.

import express from 'express';
import robots from 'express-robots-txt';
import path from 'path';

const app = express();
const port = process.env.PORT || 3000;

// Option 1: Generate robots.txt from an object configuration
app.use(robots({
  UserAgent: '*',
  Disallow: ['/admin', '/private'],
  CrawlDelay: '10',
  Sitemap: [
    'https://www.example.com/sitemap.xml',
    'https://www.example.com/sitemap-news.xml'
  ],
  Host: 'www.example.com'
}));

// Option 2 (alternative; comment out Option 1 to use): serve a static robots.txt file.
// Note: `__dirname` is not defined in ES modules; derive it first:
// import { fileURLToPath } from 'url';
// const __dirname = path.dirname(fileURLToPath(import.meta.url));
// app.use(robots(path.join(__dirname, 'public/robots.txt')));

app.get('/', (req, res) => {
  res.send('Hello World! Check /robots.txt');
});

app.listen(port, () => {
  console.log(`Server listening on port ${port}`);
  console.log(`Open http://localhost:${port} and http://localhost:${port}/robots.txt`);
});