{"id":7309,"library":"internetarchive","title":"Internet Archive Python Library","description":"internetarchive is a Python interface to archive.org that provides both a command-line interface (CLI) and a Python API. It supports searching, downloading, and uploading items, editing item metadata, and working with other archive.org services. The library is actively maintained; version 5.8.0 is the latest stable release, and new releases periodically add features, improve performance, and fix bugs and security vulnerabilities.","status":"active","version":"5.8.0","language":"en","source_language":"en","source_url":"https://github.com/jjjake/internetarchive","tags":["internet archive","web archiving","data download","data upload","metadata","cli","archive.org"],"install":[{"cmd":"pip install internetarchive","lang":"bash","label":"Standard installation"},{"cmd":"pip install \"internetarchive[speedups]\"","lang":"bash","label":"With optional speedup dependencies (ujson, gevent)"}],"dependencies":[{"reason":"Requires Python 3.9 or newer","package":"python","optional":false},{"reason":"Faster JSON parsing (part of the 'speedups' extra)","package":"ujson","optional":true},{"reason":"Concurrent downloads (part of the 'speedups' extra)","package":"gevent","optional":true}],"imports":[{"symbol":"get_item","correct":"from internetarchive import get_item"},{"symbol":"search_items","correct":"from internetarchive import search_items"},{"symbol":"upload","correct":"from internetarchive import upload"},{"symbol":"download","correct":"from internetarchive import download"},{"note":"The configure function writes archive.org credentials to a configuration file; the equivalent CLI command is 'ia configure'. 
Direct import from the top-level package is the standard approach.","wrong":"import internetarchive.configure","symbol":"configure","correct":"from internetarchive import configure"}],"quickstart":{"code":"import os\nimport tempfile\n\nfrom internetarchive import get_item, search_items, upload\n\n# --- Authentication ---\n# Uploads and metadata edits require IA S3 keys, which you can generate at\n# https://archive.org/account/s3.php\n# This example reads them from the IA_ACCESS_KEY and IA_SECRET_KEY environment variables:\n#   export IA_ACCESS_KEY='YOUR_ACCESS_KEY'\n#   export IA_SECRET_KEY='YOUR_SECRET_KEY'\n\n# --- 1. Search for items ---\nprint(\"Searching for items with subject 'NASA'...\")\n# Request 'title' explicitly; otherwise results may contain only the identifier\nsearch_results = search_items('subject:NASA', fields=['identifier', 'title'])\nfor i, result in enumerate(search_results.iter_as_results()):\n    if i >= 3:  # Limit to 3 results for brevity\n        break\n    print(f\"  - Identifier: {result['identifier']}, Title: {result.get('title')}\")\n\n# --- 2. Download a file from an item ---\n# Example public item; replace the identifier if it is unavailable\nprint(\"\\nAttempting to download a file from 'nasa_images_1960s'...\")\ntry:\n    item_to_download = get_item('nasa_images_1960s')\n    # get_files() returns a generator, so materialize it before indexing;\n    # format names are archive.org file format labels such as 'JPEG' and 'PNG'\n    files = list(item_to_download.get_files(formats=['JPEG', 'PNG']))\n    if files:\n        file_to_download = files[0]\n        print(f\"Downloading {file_to_download.name}...\")\n        # Download into a temporary directory that is removed afterwards\n        with tempfile.TemporaryDirectory() as tmpdir:\n            file_to_download.download(destdir=tmpdir)\n            print(f\"Downloaded into: {tmpdir}\")\n    else:\n        print(\"No JPEG or PNG files found in 'nasa_images_1960s'.\")\nexcept Exception as e:\n    print(f\"Error during download: {e}\")\n\n\n# --- 3. 
Upload a dummy file ---\n# Requires IA_ACCESS_KEY and IA_SECRET_KEY to be set as environment variables\naccess_key = os.environ.get('IA_ACCESS_KEY')\nsecret_key = os.environ.get('IA_SECRET_KEY')\n\nif not access_key or not secret_key:\n    print(\"\\nSkipping upload example: IA_ACCESS_KEY or IA_SECRET_KEY not set.\")\n    print(\"Set both environment variables to enable the upload step.\")\nelse:\n    print(\"\\nAttempting to upload a dummy file...\")\n    temp_file_content = \"This is a test upload from the internetarchive Python library.\"\n    identifier = \"my_unique_test_item_12345\"  # Replace with a truly unique identifier\n\n    with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt') as f:\n        f.write(temp_file_content)\n        temp_file_path = f.name\n\n    metadata = {\n        'title': f'My Test Item {identifier}',\n        'description': 'A dummy item uploaded via Python library quickstart.',\n        'mediatype': 'data',  # Set on the first upload; hard to change later\n        'collection': 'test_collection',  # Replace with a collection you have write access to\n    }\n\n    try:\n        print(f\"Uploading {temp_file_path} to {identifier}...\")\n        # Pass the keys explicitly so the upload uses the credentials checked above\n        r = upload(identifier, files=[temp_file_path], metadata=metadata,\n                   access_key=access_key, secret_key=secret_key)\n        print(f\"Upload successful! Status: {r[0].status_code}\")\n        print(f\"View item at: https://archive.org/details/{identifier}\")\n    except Exception as e:\n        print(f\"Error during upload: {e}\")\n    finally:\n        os.remove(temp_file_path)\n","lang":"python","description":"This quickstart demonstrates how to search for items, download files, and upload new content to the Internet Archive using the `internetarchive` Python library. It highlights `search_items` for finding content, `get_item` and `File.download` for downloading, and `upload` for creating new items. 
Uploads additionally require IA S3 keys; the example reads them from the `IA_ACCESS_KEY` and `IA_SECRET_KEY` environment variables."},"warnings":[{"fix":"Upgrade to internetarchive v5.7.0 or higher immediately. Always use `ia delete --dry-run` first to preview deletions.","message":"A critical bug in versions v5.4.1 to v5.6.x (fixed in v5.7.0) caused `ia delete --glob` and `ia delete --format` to delete *all* files in an item, regardless of the specified pattern, potentially leading to significant data loss.","severity":"breaking","affected_versions":">=5.4.1, <5.7.0"},{"fix":"Upgrade to internetarchive v5.5.1 or higher immediately. Implement robust input validation if handling untrusted filenames directly.","message":"Versions <=5.5.0 contain a critical directory traversal vulnerability in `File.download()` (fixed in v5.5.1). This allowed malicious filenames to write files outside the target directory, a severe risk, especially on Windows.","severity":"breaking","affected_versions":"<5.5.1"},{"fix":"Space out uploads and retry failed requests after a delay, ideally with exponential backoff (`upload()` also accepts `retries` and `retries_sleep` arguments for simple fixed-delay retries). Contact info@archive.org if your API upload privileges are locked.","message":"The Internet Archive API has rate limits, especially for uploads. Exceeding them can temporarily suspend an account's API upload privileges; in some cases the lock persists until it is lifted manually.","severity":"gotcha","affected_versions":"All"},{"fix":"Carefully plan and confirm all required metadata, especially 'write-once' fields, before the first upload of an item.","message":"Certain metadata fields (e.g., `mediatype`, `collection`) are 'write-once' and can only be set during the *initial* upload of an item. Subsequent attempts to modify them will be ignored or cause errors.","severity":"gotcha","affected_versions":"All"},{"fix":"Always install `internetarchive` using `pip`, `pipx`, or from source. 
If previously installed via an unsupported method, uninstall it first (e.g., `brew uninstall internetarchive`) and then use a supported method.","message":"Installing `internetarchive` via unsupported third-party package managers (e.g., Homebrew, MacPorts, Linux system packages like `apt` or `yum`) often results in severely outdated, incompatible, or broken versions.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Check that valid S3 keys reach the request: if your script reads `IA_ACCESS_KEY` and `IA_SECRET_KEY` from the environment, confirm both are set and passed to the call (e.g., via the `access_key` and `secret_key` arguments of `upload()`). Alternatively, run `ia configure` or call `internetarchive.configure()` with your archive.org email and password to store the keys in a configuration file (e.g., `~/.config/internetarchive/ia.ini` or `~/.ia`).","cause":"Incorrect or missing Internet Archive S3 credentials (IA_ACCESS_KEY, IA_SECRET_KEY) for operations that require authentication (uploads, metadata modification).","error":"HTTPError: 403 Client Error: Forbidden for url: https://s3.us.archive.org/..."},{"fix":"Upgrade the `internetarchive` library to version 5.7.0 or newer. Always test delete commands with `ia delete --dry-run` first to verify the intended files are targeted.","cause":"A critical bug in the `ia delete` command in versions 5.4.1 through 5.6.x ignored `--glob` and `--format` filters, causing unintended mass deletions.","error":"All files in item 'my_item_id' were deleted unexpectedly after running 'ia delete --glob=\"*.txt\"'"},{"fix":"Upgrade to `internetarchive` v5.5.1 or higher to patch the directory traversal vulnerability. Ensure target download directories have appropriate write permissions for the user running the script.","cause":"Either a malicious filename attempting directory traversal, which `File.download()` did not block before v5.5.1, or an ordinary file-permission problem in the target directory.","error":"Error: Failed to download file 'malicious_path/../../etc/passwd'. Permission denied."}]}