{"id":4839,"library":"urlextract","title":"URLExtract","description":"URLExtract is a Python library for collecting and extracting URLs from a given text based on locating Top-Level Domains (TLDs). It is currently at version 1.9.0 and is actively maintained, with regular updates to its TLD list and ongoing Python version compatibility.","status":"active","version":"1.9.0","language":"en","source_language":"en","source_url":"https://github.com/lipoja/URLExtract","tags":["url","extract","parsing","tld","text-processing"],"install":[{"cmd":"pip install urlextract","lang":"bash","label":"Install URLExtract"}],"dependencies":[{"reason":"Required for converting links to IDNA format.","package":"idna","optional":false},{"reason":"Required for domain name validation.","package":"uritools","optional":false},{"reason":"Required for determining the user's cache directory.","package":"platformdirs","optional":false},{"reason":"Required for caching DNS results when DNS checks are enabled.","package":"dnspython","optional":false}],"imports":[{"symbol":"URLExtract","correct":"from urlextract import URLExtract"}],"quickstart":{"code":"from urlextract import URLExtract\n\nextractor = URLExtract()\ntext = \"Check out our website: example.com or find us at https://www.another-example.org/path?query=1\"\nurls = extractor.find_urls(text)\n\nprint(urls)\n# Expected output: ['example.com', 'https://www.another-example.org/path?query=1']","lang":"python","description":"Initializes the URLExtract class and uses the `find_urls` method to extract all URLs from a given text string."},"warnings":[{"fix":"Review extracted URLs when the input contains non-URL text that may coincidentally end in a TLD. Pass `with_schema_only=True` to `find_urls` if you only need URLs with an explicit scheme (e.g., 'http://', 'https://').","message":"URLExtract's TLD-based detection can produce false matches in some contexts, such as CSS class names (e.g., `p.bold.name` may be extracted because `.name` is a valid TLD). The library treats these strings as valid URL patterns, so they are returned even though they are not intended URLs.","severity":"gotcha","affected_versions":"<=1.9.0"},{"fix":"Ensure the application has write permission to the default cache directory, or to a custom directory specified during `URLExtract` initialization. Manually update the TLD list using `extractor.update()` if necessary. Consider reporting the issue on the GitHub repository for specific edge cases.","message":"Users have reported `urlextract.cachefile.CacheFileError` exceptions, or custom cache directories that fail to save the TLD list, especially in bundled applications (such as PyInstaller builds) or on read-only file systems.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade to Python 3.7 or newer. Python 3.12 support was added in v1.9.0.","message":"Support for Python 3.6 has been dropped in recent versions due to underlying dependency changes (e.g., `filelock`). Users on Python 3.6 will encounter errors.","severity":"breaking","affected_versions":">=1.5.0 (approx), explicitly in recent versions"},{"fix":"Upgrade to version 1.9.0 or later to benefit from fixes for Markdown link parsing and mixed-case hostname filtering.","message":"Versions prior to 1.9.0 may incorrectly parse URLs within Markdown links or mishandle filtering of mixed-case hostnames, leading to incomplete or incorrect extractions.","severity":"gotcha","affected_versions":"<1.9.0"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}