Django PostgreSQL Copy
django-postgres-copy is a Django package that facilitates fast and efficient bulk import and export of delimited data (like CSV files) to and from PostgreSQL databases, leveraging PostgreSQL's native `COPY` command. It is significantly faster than using Django's ORM for large datasets. The library is actively maintained, currently at version 2.8.0, and regularly updates to support recent Django and Python versions.
Warnings
- gotcha The library added support for `psycopg` (v3) in version 2.8.0, while maintaining `psycopg2` compatibility. However, `psycopg3` introduces several breaking changes compared to `psycopg2` in its underlying API. Although `django-postgres-copy` has a compatibility layer, users with existing direct `psycopg2` database interactions or complex connection pooling setups may need to adjust their project configuration or code during migration to `psycopg3`.
- breaking `django-postgres-copy` is regularly updated for Django version compatibility. Older versions of `django-postgres-copy` might not be fully compatible with newer Django versions, or vice-versa. For example, version 2.7.3 included a fix for 'Django 4.2+ compatible `setup_query`'. Always check the PyPI classifiers or GitHub releases for specific Django version support when upgrading either library.
- gotcha The library allows for powerful SQL-based data transformations during import using methods like `copy_field_template` or `setup_query`. While flexible, if these SQL statements are constructed using unsanitized user-provided input, they can introduce SQL injection vulnerabilities.
- gotcha By default, for performance optimization during large imports, the `from_csv` method (via `CopyMapping`) temporarily drops database constraints and indexes, then recreates them. While this speeds up the process, it means data integrity checks are momentarily suspended, which could potentially expose data to inconsistencies if an unexpected error occurs during the import process.
- gotcha When importing data that includes auto-incrementing primary keys (like Django's default `id` field) using `COPY FROM`, PostgreSQL's sequence generator for that table is not automatically updated. If you subsequently insert new records via Django's ORM, `IntegrityError` or `UniqueViolation` exceptions may occur if the ORM tries to insert an ID that conflicts with a manually `COPY`-inserted ID.
Install
-
pip install django-postgres-copy
Imports
- CopyManager
from postgres_copy import CopyManager
Quickstart
import os
from django.db import models
from postgres_copy import CopyManager
# Define a simple Django model with CopyManager
class Person(models.Model):
name = models.CharField(max_length=500)
number = models.IntegerField(null=True)
date = models.DateField(null=True)
objects = CopyManager()
class Meta:
app_label = 'myapp' # Replace with your app's name
# Example CSV content for 'data.csv'
# name,number,date
# ben,1,2024-01-01
# joe,2,2024-01-02
# jane,3,2024-01-03
# Create a dummy CSV file for the example (in a real scenario, this would be an existing file)
csv_content = "name,number,date\nben,1,2024-01-01\njoe,2,2024-01-02\njane,3,2024-01-03"
with open("data.csv", "w") as f:
f.write(csv_content)
# --- Import data from CSV ---
# In a real Django setup, ensure your database connection is configured and migrations are run.
# Example: Person.objects.from_csv(
# "./data.csv",
# dict(name="name", number="number", date="date") # Mapping model fields to CSV headers
# )
print("To import from CSV:")
print("Person.objects.from_csv('./data.csv', dict(name='name', number='number', date='date'))")
# --- Export data to CSV (assuming some data exists in the model) ---
# Example: Person.objects.to_csv("./export.csv")
print("\nTo export to CSV:")
print("Person.objects.to_csv('./export.csv')")
# Clean up dummy file
os.remove("data.csv")
# if os.path.exists("export.csv"): os.remove("export.csv") # Uncomment if testing export