{"id":1506,"library":"htmldate","title":"htmldate","description":"htmldate is a Python library designed for fast and robust extraction of original and updated publication dates from URLs and web pages. It is actively maintained with frequent minor releases covering bug fixes, dependency updates, and improvements to extraction heuristics.","status":"active","version":"1.9.4","language":"en","source_language":"en","source_url":"https://github.com/adbar/htmldate","tags":["date-extraction","web-scraping","html-parsing","web-content"],"install":[{"cmd":"pip install htmldate","lang":"bash","label":"Install htmldate"}],"dependencies":[{"reason":"Core dependency for parsing HTML content efficiently.","package":"lxml","optional":false},{"reason":"Used internally by `find_date` to download pages when a URL is passed.","package":"urllib3","optional":false}],"imports":[{"symbol":"find_date","correct":"from htmldate import find_date"}],"quickstart":{"code":"from htmldate import find_date\n\n# Example 1: Extract a date from HTML, passing the page URL as a hint\nurl = 'https://www.example.com/news/article'\n# For real-world usage, pass a real URL as the first argument and htmldate downloads the page itself:\n# date = find_date(url)\n\n# Using mock HTML content for reproducibility\nhtml_content = \"\"\"\n<html><head><meta property=\"article:published_time\" content=\"2023-10-26T10:00:00Z\"></head>\n<body><h1>Latest News</h1><p>Published: October 26, 2023</p></body></html>\n\"\"\"\n\n# The first positional argument is the HTML (or a URL); 'url' is an optional hint\ndate_from_html = find_date(html_content, url=url)\nprint(f\"Date extracted from HTML: {date_from_html}\")\n\n# Example 2: Extract original publication date (if available and different from updated)\n# The 'original_date' parameter hints the extractor to prioritize the earliest date.\nhtml_content_updated = \"\"\"\n<html><head><meta property=\"article:published_time\" content=\"2023-10-26T10:00:00Z\">\n<meta property=\"article:modified_time\" 
content=\"2024-03-15T14:30:00Z\"></head>\n<body><h1>Latest News</h1><p>Published: October 26, 2023</p><p>Last Updated: March 15, 2024</p></body></html>\n\"\"\"\noriginal_date = find_date(html_content_updated, original_date=True)\nupdated_date = find_date(html_content_updated, original_date=False)\n\nprint(f\"Original Date: {original_date}\")\nprint(f\"Updated Date: {updated_date}\")","lang":"python","description":"To use `htmldate`, import the `find_date` function. Its first argument is either a URL, which it downloads, or an HTML string. When passing an HTML string, also supplying the `url` parameter gives the extractor an additional hint, for example when the address itself contains a date. The `original_date` parameter prioritizes the earliest date found (original publication) over later update dates."},"warnings":[{"fix":"Ensure your project runs on Python 3.8 or a newer version.","message":"As of `v1.9.0`, htmldate officially focuses on and requires Python 3.8 or newer. Older Python versions are no longer supported and may encounter compatibility issues or installation failures.","severity":"breaking","affected_versions":">=1.9.0"},{"fix":"Review your code if you used `original_date=True` in versions prior to `1.7.0` and verify the extracted dates are as expected after upgrading.","message":"The `original_date` parameter behavior was fixed in `v1.7.0` to more accurately distinguish between original publication dates and updated dates from meta properties. If you relied on the previous behavior (pre-1.7.0) for this distinction, your results might change.","severity":"gotcha","affected_versions":"<1.7.0"},{"fix":"Date extraction results may differ for some URLs after upgrading from versions older than `1.6.0` because of refined heuristics. Re-evaluate critical extractions after the upgrade.","message":"In `v1.6.0`, the library introduced stricter extraction patterns and replaced `lxml.html.Cleaner`, with a focus on precision. 
This might result in `htmldate` no longer finding a date on some pages where it previously did, or extracting a different (and potentially more accurate) date.","severity":"gotcha","affected_versions":"<1.6.0"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}