filesplit Python Library
filesplit is a Python module for splitting large files into smaller chunks and merging them back together. It can split by size, number of chunks, or number of lines, and works with both text and binary files. The current version is 4.1.0. Releases are fairly regular: minor versions bring small improvements and bug fixes, while major versions introduce breaking changes.
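To make the split-and-merge idea concrete, here is a minimal standard-library sketch of splitting by size and merging back. It is an illustration only, not filesplit's implementation: the helper names `split_by_size` and `merge_parts` and the chunk naming scheme are assumptions made for this example.

```python
from pathlib import Path
from typing import List


def split_by_size(src: Path, out_dir: Path, chunk_size: int) -> List[Path]:
    """Write consecutive chunk_size-byte pieces of src into out_dir."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    # Binary mode handles text and binary files alike.
    with src.open("rb") as fh:
        index = 1
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            # Hypothetical naming scheme: <stem>_0001<suffix>, <stem>_0002<suffix>, ...
            part = out_dir / f"{src.stem}_{index:04d}{src.suffix}"
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def merge_parts(parts: List[Path], dest: Path) -> None:
    """Concatenate the parts, in the order given, into dest."""
    with dest.open("wb") as out:
        for part in parts:
            out.write(part.read_bytes())
```

Splitting on raw byte boundaries can cut a line of text in half, which is one reason a library would offer line-based splitting as a separate mode.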
Warnings
- breaking Version 4.0.0 was a backward-incompatible rewrite. The single `Filesplit` class with its generic `split()` and `merge()` methods was replaced by separate `Split` and `Merge` classes (in `filesplit.split` and `filesplit.merge`), with explicit methods such as `Split.bysize()` and `Split.bylinecount()`.
- breaking Version 3.0.0 changed the primary class name from `FileSplit` to `Filesplit` (case change) and removed the `splitbyencoding()` method, integrating its functionality into the `split()` method (which itself was later refactored in v4.0.0).
- gotcha The library requires Python 3. Versions before 2.0.0 supported Python 2; version 2.0.0 and later do not.
- gotcha The output directory must already exist before you split. The library does not create it automatically, so create it in your script first (for example with `os.makedirs(output_dir, exist_ok=True)`).
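The warnings above suggest two checks worth running before any split: which major version is installed, and whether the output directory exists. A minimal sketch using only the standard library (Python 3.8+ for `importlib.metadata`); the helper names `installed_major_version` and `ensure_output_dir` are hypothetical and not part of filesplit.

```python
from importlib import metadata
from pathlib import Path
from typing import Optional


def installed_major_version(package: str = "filesplit") -> Optional[int]:
    """Return the installed major version of package, or None if it is not installed."""
    try:
        # Assumes a plain "MAJOR.MINOR.PATCH" version string such as "4.1.0".
        return int(metadata.version(package).split(".")[0])
    except metadata.PackageNotFoundError:
        return None


def ensure_output_dir(output_dir: str) -> Path:
    """Create output_dir (and any missing parents) if it does not exist yet."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    return out
```

A script could call `installed_major_version()` once at start-up and pick the matching API, or simply fail early with a clear message when the major version is not the one it was written for.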
Install
- pip install filesplit
Imports
- Split
from filesplit.split import Split
- Merge
from filesplit.merge import Merge
Quickstart
import os
import shutil

from filesplit.split import Split
from filesplit.merge import Merge

# Create a dummy file for splitting
dummy_content = "This is a test file for filesplit.\n" * 50
input_file = "my_large_file.txt"
output_dir = "output_chunks"
merged_file = "my_merged_file.txt"
with open(input_file, "w") as f:
    f.write(dummy_content)
print(f"Created dummy file: {input_file}")

# --- Split the file by size (e.g., 100 bytes per chunk) ---
# The output directory must exist before splitting
os.makedirs(output_dir, exist_ok=True)
print(f"Splitting '{input_file}' into '{output_dir}' by size...")
split = Split(inputfile=input_file, outputdir=output_dir)
split.bysize(size=100)
print("Splitting complete.")

# --- Merge the files back ---
# Merge reads the chunks from output_dir and writes the result
# into the current directory
print(f"Merging files from '{output_dir}' back into '{merged_file}'...")
merge = Merge(inputdir=output_dir, outputdir=".", outputfilename=merged_file)
merge.merge()
print("Merging complete.")

# --- Clean up ---
if os.path.exists(input_file):
    os.remove(input_file)
if os.path.exists(merged_file):
    os.remove(merged_file)
if os.path.exists(output_dir):
    shutil.rmtree(output_dir)
print("Cleanup complete.")
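A round trip is only useful if the merged file matches the original. The sketch below shows one way to check that with checksums; it would need to run before the clean-up step, while both files still exist. `sha256_of` and `same_content` are hypothetical helper names built on the standard library, not part of filesplit.

```python
import hashlib


def sha256_of(path: str, block_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size blocks so large files are never loaded whole.
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """True if both files have byte-for-byte identical content."""
    return sha256_of(path_a) == sha256_of(path_b)
```

In the Quickstart this could be used as `assert same_content(input_file, merged_file)` right after the merge.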