{"id":9176,"library":"pgcopy","title":"pgcopy","description":"pgcopy is a Python library designed for fast data loading into PostgreSQL databases leveraging the PostgreSQL binary COPY protocol. It supports various database adaptors like psycopg2, psycopg, pg8000, and PyGreSQL, and handles a wide range of PostgreSQL data types, including arrays. The library is currently at version 1.6.2 and focuses on efficient bulk data insertion.","status":"active","version":"1.6.2","language":"en","source_language":"en","source_url":"https://github.com/altaurog/pgcopy","tags":["postgresql","copy","bulk insert","data loading","database","psycopg","psycopg2"],"install":[{"cmd":"pip install pgcopy","lang":"bash","label":"Latest stable version"}],"dependencies":[{"reason":"Primary PostgreSQL database adapter. While pgcopy supports multiple adapters, psycopg2 is commonly used.","package":"psycopg2","optional":false},{"reason":"Alternative PostgreSQL database adapter, especially Psycopg 3 which offers different installation and features.","package":"psycopg","optional":true},{"reason":"Required for timezone handling with timestamp data types.","package":"pytz","optional":false}],"imports":[{"symbol":"CopyManager","correct":"from pgcopy import CopyManager"},{"note":"The `Replace` context manager is directly available under `pgcopy` since newer versions; `pgcopy.util.Replace` was an older path or for a subclass `RenameReplace`.","wrong":"from pgcopy.util import Replace","symbol":"Replace","correct":"from pgcopy import Replace"}],"quickstart":{"code":"import os\nfrom datetime import datetime\nfrom pgcopy import CopyManager\nimport psycopg\n\n# Ensure you have a running PostgreSQL instance and a 'weather_db' database\n# You might need to create the database and a 'measurements_table'\n# For example, using `psql -c 'CREATE DATABASE weather_db;'`\n# And inside weather_db: `CREATE TABLE measurements_table (id INT, timestamp TIMESTAMP, location VARCHAR(255), temperature NUMERIC);`\n\ndb_name = 
os.environ.get('PG_DB_NAME', 'weather_db')\ndb_user = os.environ.get('PG_DB_USER', 'postgres')\ndb_password = os.environ.get('PG_DB_PASSWORD', '')\ndb_host = os.environ.get('PG_DB_HOST', 'localhost')\ndb_port = os.environ.get('PG_DB_PORT', '5432')\n\nconn_string = f\"dbname={db_name} user={db_user} password={db_password} host={db_host} port={db_port}\"\n\n# Initialize so the finally block is safe even if connect() fails\nconn = None\ncursor = None\n\ntry:\n    conn = psycopg.connect(conn_string)\n    print(f\"Connected to database: {db_name}\")\n    cursor = conn.cursor()\n\n    # Define columns and records\n    cols = ('id', 'timestamp', 'location', 'temperature')\n    now = datetime.now()\n    records = [\n        (0, now, 'Jerusalem', 72.2),\n        (1, now, 'New York', 75.6),\n        (2, now, 'Moscow', 54.3),\n    ]\n\n    # Use CopyManager for fast insertion\n    mgr = CopyManager(conn, 'measurements_table', cols)\n    mgr.copy(records)\n    conn.commit()\n    print(\"Records copied successfully.\")\n\n    # Verify data (optional)\n    cursor.execute(\"SELECT * FROM measurements_table ORDER BY id;\")\n    print(\"\\nData in table after copy:\")\n    for row in cursor.fetchall():\n        print(row)\n\nfinally:\n    if cursor is not None:\n        cursor.close()\n    if conn is not None:\n        conn.close()\n        print(\"Database connection closed.\")\n","lang":"python","description":"This quickstart connects to a PostgreSQL database using `psycopg` (or `psycopg2`), creates a `CopyManager` instance, and efficiently inserts multiple records with the `copy` method. Note that the transaction must be committed after the copy operation. Database credentials are read from environment variables for security and flexibility."},"warnings":[{"fix":"For production, `pip install psycopg2` (ensuring build dependencies are met) or `pip install psycopg[binary]` (for Psycopg 3) is recommended. Avoid `psycopg2-binary` in production `requirements.txt`.","message":"When using `psycopg2` (a common dependency for `pgcopy`), be aware of the `psycopg2-binary` package. 
`psycopg2-binary` is convenient for quick starts and development, but it is not recommended for production because its bundled libpq and OpenSSL libraries can conflict with other libraries loaded in the same process. For production, `psycopg2` built from source (which requires local build tools) or `psycopg` (Psycopg 3, with a clearer binary/source distinction) is generally preferred.","severity":"gotcha","affected_versions":"All versions depending on `psycopg2` (>= 2.8 for binary distribution issues)."},{"fix":"For PostgreSQL 9.1 and earlier, avoid `pgcopy.Replace` if concurrent queries are expected. Consider `pgcopy.util.RenameReplace` (v0.6+) when renaming the old table is preferable to dropping it, or manage the replacement manually. Ensure no other tables depend on the table being replaced before using `Replace`.","message":"The `pgcopy.Replace` context manager, which automates replacing an entire table for faster bulk updates, can cause problems on PostgreSQL 9.1 and earlier: concurrent queries against the table fail once the old table is dropped. It is also unsuitable if other tables depend on the table being replaced.","severity":"gotcha","affected_versions":"<= 1.4.0 (and potentially newer versions if using older PostgreSQL)"},{"fix":"For large bulk imports, consider temporarily disabling triggers and foreign key constraints on the target table before the `copy` operation and re-enabling them afterwards. Do this within a transaction to maintain data integrity.","message":"PostgreSQL's `COPY` command (which `pgcopy` uses) performs best when the target table has no triggers, foreign keys, or other constraints during the bulk import, since these are validated for each inserted row and can significantly slow down the load.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Batch data by size, not just time: tune batches so each one commits in roughly 30-120 seconds under load. 
Monitor WAL volume and replica lag, and adjust batch size accordingly.","message":"When dealing with large datasets, a single very large `COPY` transaction can stress PostgreSQL by generating enormous Write-Ahead Log (WAL) volume, filling storage rapidly and causing replication to fall behind. Conversely, very small batches incur excessive per-transaction overhead.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure the file is accessible and readable by the PostgreSQL server's user. Place the file in a universally accessible directory like `/tmp` (on Linux/macOS) or `C:\\Users\\Public` (on Windows), or adjust file/directory permissions for the PostgreSQL user. Alternatively, run the statement as a superuser or a member of `pg_read_server_files` (less secure), or better, use client-side copy (`COPY ... FROM STDIN`), which is what pgcopy does, so the file only needs to be readable by the client.","cause":"This error comes from server-side `COPY ... FROM 'file'`: the PostgreSQL server process (not the client application) must read the file, and it lacks the necessary file system permissions. pgcopy itself streams data over the connection with `COPY ... FROM STDIN`, so this error typically arises from separate server-side COPY statements.","error":"ERROR: could not open file \"/path/to/file.csv\" for reading: Permission denied"},{"fix":"Verify that the input data's structure (number and order of columns, data types) exactly matches the target table's schema. If empty strings should be treated as NULLs, use `NULL ''` in your `COPY` command options (if using SQL directly) or convert empty values to None when preparing data for `pgcopy`.","cause":"The input data (e.g., a CSV file) has a row with fewer columns than the target table expects, or a non-numeric value (often an empty string) appears in a numeric column. 
This can also happen if the column order in the input data does not match the table schema.","error":"ERROR: missing data for column \"column_name\""},{"fix":"Immediately free up disk space by removing old log files, clearing temporary files, or truncating unnecessary data. If `pg_wal` is full, investigate WAL archiving failures and stale replication slots. Implement proactive disk usage monitoring and regular `VACUUM` operations to prevent bloat.","cause":"The disk where PostgreSQL stores its data, WAL (Write-Ahead Log) files, or temporary files has run out of space. This can halt transactions, trigger PANIC errors, and even shut the server down.","error":"ERROR: could not extend file \"base/...\" or \"pg_wal/...\": No space left on device"},{"fix":"Verify that the PostgreSQL server is running and listening on the correct IP address and port (`listen_addresses` and `port` in `postgresql.conf`). Check firewall rules to ensure port 5432 (or your custom port) is open. Review the PostgreSQL logs for server-side errors that might have caused the unexpected termination.","cause":"This error usually means the PostgreSQL server terminated abnormally before or while processing the request, or that the connection was lost due to network issues, firewall blocking, or the server not running or not listening on the expected address and port.","error":"server closed the connection unexpectedly"}]}