{"id":16749,"library":"tweet-harvest","title":"Tweet Harvest (Twitter Crawler)","description":"Tweet Harvest is an actively maintained command-line interface (CLI) tool for scraping tweets from Twitter search results. Using Playwright, it automates browser interactions to retrieve tweets matching specified keywords and date ranges, and exports the results to CSV or XLSX. The current stable version is 2.7.1, with frequent minor releases that deliver bug fixes, performance improvements, and new export features (e.g., XLSX export in v2.7.0). It requires a valid Twitter `auth_token` cookie for authentication, because Twitter prohibits unauthenticated search. While primarily a CLI, it also exposes programmatic APIs for integration into Node.js applications, offering functions to initiate scraping and to process tweet data. Frequent updates track changes to Twitter's interface and add data quality features such as ISO 8601 timestamps.","status":"active","version":"2.7.1","language":"javascript","source_language":"en","source_url":"https://github.com/helmisatria/tweet-harvest","tags":["javascript"],"install":[{"cmd":"npm install tweet-harvest","lang":"bash","label":"npm"},{"cmd":"yarn add tweet-harvest","lang":"bash","label":"yarn"},{"cmd":"pnpm add tweet-harvest","lang":"bash","label":"pnpm"}],"dependencies":[{"reason":"Core dependency for browser automation to scrape Twitter. Requires `playwright install` to download browser binaries.","package":"playwright","optional":false}],"imports":[{"note":"Primary programmatic function to initiate the tweet scraping process. Designed for ESM usage. CommonJS `require` is generally not supported without transpilation.","wrong":"const { harvest } = require('tweet-harvest');","symbol":"harvest","correct":"import { harvest } from 'tweet-harvest';"},{"note":"TypeScript type definition for the configuration object passed to the `harvest` function. 
Essential for type-safe programmatic usage.","symbol":"Options","correct":"import { type Options } from 'tweet-harvest';"},{"note":"Utility function for processing and cleaning individual tweet data. Primarily designed for ESM.","wrong":"const cleanTweet = require('tweet-harvest').cleanTweet;","symbol":"cleanTweet","correct":"import { cleanTweet } from 'tweet-harvest';"}],"quickstart":{"code":"import { harvest } from 'tweet-harvest';\nimport type { Options } from 'tweet-harvest';\n\nconst twitterAuthToken = process.env.TWITTER_AUTH_TOKEN ?? ''; // Get this from your browser cookies\n\nif (!twitterAuthToken) {\n  console.error('TWITTER_AUTH_TOKEN environment variable is not set. Please provide a valid Twitter auth token from your browser cookies.');\n  process.exit(1);\n}\n\nconst options: Options = {\n  keyword: 'AI ethics',\n  from: '2023-01-01',\n  to: '2023-12-31',\n  filename: 'ai-ethics-tweets',\n  limit: 100, // Limit to 100 tweets for this example\n  exportFormat: 'csv',\n  auth_token: twitterAuthToken,\n  withReplies: false,\n  withImages: false,\n  withVideos: false\n};\n\nasync function runHarvest() {\n  console.log('Starting tweet harvest...');\n  try {\n    await harvest(options);\n    console.log(`Successfully harvested tweets to ${options.filename}.csv`);\n  } catch (error) {\n    console.error('Error during tweet harvest:', error);\n    if (error instanceof Error && error.message.includes('auth_token')) {\n        console.error('Ensure your TWITTER_AUTH_TOKEN is valid and up-to-date.');\n    }\n  }\n}\n\nrunHarvest();\n","lang":"typescript","description":"Demonstrates programmatic usage of `tweet-harvest` to scrape tweets with specified keywords and date ranges, requiring a Twitter authentication token."},"warnings":[{"fix":"Use the full `--to` flag instead of the `-t` short option, or consult CLI help for current short options.","message":"The short option for the `--to` flag (`-t`) was removed due to ambiguity with other short options. 
Users relying on `-t` for the 'to date' will need to update their scripts.","severity":"breaking","affected_versions":">=2.7.1"},{"fix":"Regularly obtain an updated `auth_token` by logging into Twitter in your browser and extracting the `auth_token` cookie. Pass it via the `--auth-token` CLI option or `auth_token` option in programmatic usage.","message":"Tweet Harvest requires a valid Twitter `auth_token` cookie for authentication. This token can expire or become invalid, leading to failed scrapes. Twitter actively prohibits unauthenticated search, making this token essential.","severity":"gotcha","affected_versions":">=2.0.0"},{"fix":"Update CSV parsing logic to expect comma-separated values, or check for an option to explicitly set the delimiter if one is introduced in later versions.","message":"The default CSV delimiter was changed from `;` to `,`. This will affect any scripts or tools parsing the output that expected the semicolon delimiter.","severity":"breaking","affected_versions":">=2.4.2"},{"fix":"Ensure you are using the latest version of `tweet-harvest` to benefit from upstream fixes for Twitter's evolving platform. If issues persist, your `auth_token` might be outdated or Twitter may have implemented new anti-scraping measures.","message":"Changes in Twitter's cookie domains or internal structure can cause authentication or scraping failures. Version 2.6.1 specifically included a fix for 'cookie domain changes'.","severity":"gotcha","affected_versions":">=2.6.1"},{"fix":"Review CSV output and update any downstream processing scripts to be resilient to changes in header order or to specifically match headers by name rather than by index.","message":"The order of CSV headers was made consistent, and support for the Gephi format was added. 
Existing scripts that rely on a specific, potentially inconsistent, header order might break or yield incorrect data.","severity":"breaking","affected_versions":">=2.5.3"}],"env_vars":null,"last_verified":"2026-04-22T00:00:00.000Z","next_check":"2026-07-21T00:00:00.000Z","problems":[{"fix":"Run `npx playwright install` in your project directory to download the necessary browser binaries for Playwright.","cause":"The Playwright browser binaries (e.g., Chromium) required by `tweet-harvest` have not been downloaded.","error":"Error: Playwright browser has not been installed."},{"fix":"Log into Twitter in a browser, extract a fresh `auth_token` cookie, and update your configuration or environment variable. Ensure there are no leading/trailing spaces or other characters.","cause":"The provided Twitter `auth_token` is either expired, invalid, or incorrectly formatted, preventing successful authentication with Twitter.","error":"Error: Auth token is not valid. Please make sure you enter a valid Twitter auth token."},{"fix":"Check your internet connection, ensure no other processes are interfering with Playwright, and try running in a headful mode (if available via options) to debug browser launch issues. Ensure your Node.js version is compatible with Playwright.","cause":"This often indicates that Playwright failed to launch the browser or navigate to the Twitter page, potentially due to network issues, an unsupported environment, or conflicting browser processes.","error":"Error: Cannot read properties of undefined (reading 'page')"},{"fix":"Check the `tweet-harvest` changelog for recent versions to identify changes in export format. Adjust your CSV parsing logic to account for new delimiters, header order, or additional columns.","cause":"Breaking changes in CSV header order (v2.5.3), delimiter (v2.4.2), or added fields can alter the structure of the output CSV.","error":"CSV output does not match expected format / Missing columns."}],"ecosystem":"npm"}