DataRecorder
DataRecorder is a Python toolkit for efficient, reliable data recording to various file formats. It addresses a common problem in data collection, frequent file I/O, by caching data and writing it in batches, which reduces overhead and limits data loss when a program terminates unexpectedly. It supports multithreaded writes and handles file locking automatically. The library provides three specialized tools: `Recorder` for sequential data, `Filler` for filling tabular data at specific coordinates, and `ByteRecorder` for binary data. It supports `csv`, `xlsx`, `json`, `txt`, and arbitrary binary file formats. [2]
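The batching idea described above can be illustrated with a minimal sketch. This is not DataRecorder's implementation: `BatchedCsvWriter`, `cache_size`, and `demo` are hypothetical names, used only to show how caching rows and writing them in batches reduces the number of file writes.

```python
import csv
import os
import tempfile


class BatchedCsvWriter:
    """Hypothetical illustration of batched writes; not part of DataRecorder."""

    def __init__(self, path, cache_size=100):
        self.path = path
        self.cache_size = cache_size
        self._cache = []
        self.flush_count = 0  # number of times the file was actually written

    def add_data(self, row):
        # Keep the row in memory; only touch the file once the cache is full.
        self._cache.append(list(row))
        if len(self._cache) >= self.cache_size:
            self.flush()

    def flush(self):
        # Write all cached rows in a single append, then empty the cache.
        if not self._cache:
            return
        with open(self.path, "a", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(self._cache)
        self._cache.clear()
        self.flush_count += 1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Flush whatever is left, even if the body raised.
        self.flush()
        return False


def demo():
    """Write 10 rows with a cache of 4 and report rows read back and flushes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.csv")
        with BatchedCsvWriter(path, cache_size=4) as writer:
            for i in range(10):
                writer.add_data([i, i * 10])
        with open(path, newline="", encoding="utf-8") as f:
            rows = list(csv.reader(f))
    return rows, writer.flush_count
```

In this sketch, ten rows with a cache of four result in three file writes instead of ten. The trade-off is that rows still in the cache are lost if the process dies before a flush.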
Common errors
-
ModuleNotFoundError: No module named 'DataRecorder'
cause The DataRecorder library is not installed, or the import statement contains a typo.
fix Install the library with `pip install DataRecorder`. Ensure the import statement reads `from DataRecorder import ...` with correct capitalization.
-
FileNotFoundError: [Errno 2] No such file or directory: 'non_existent_path/my_data.csv'
cause The directory for the output file does not exist. DataRecorder aims to create the file itself but may not create intermediate directories.
fix Ensure the directory exists before initializing the Recorder: `import os; os.makedirs(os.path.dirname(file_path) or '.', exist_ok=True)`.
-
ValueError: Invalid data format for 'xlsx' or 'db'
cause From version 3.x, the `add_data` method may expect a different data structure (e.g., a dictionary) for the 'db' and 'xlsx' formats than the one provided, or the data is incompatible with the structured target format.
fix Review the documentation or examples for the specific Recorder type and file format. For 3.x and above, ensure that data passed for 'db' and 'xlsx' matches the expected dictionary format, and that data types match the target file structure (e.g., don't pass arbitrary strings to an Excel column expecting numbers).
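The first two errors above can be caught before any recorder is created. The following is a minimal sketch using only the standard library; `missing_modules` and `ensure_parent_dir` are hypothetical helper names, not part of DataRecorder.

```python
import importlib.util
from pathlib import Path


def missing_modules(*names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ensure_parent_dir(file_path):
    """Create the parent directory of file_path if needed and return it."""
    parent = Path(file_path).resolve().parent
    parent.mkdir(parents=True, exist_ok=True)
    return parent
```

For example, `missing_modules("DataRecorder")` returns an empty list when the package is installed, and `ensure_parent_dir("out/run1/my_data.csv")` creates `out/run1` if it does not exist yet. A bare file name such as `my_data.csv` resolves to the current working directory, so no special case is needed.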
Warnings
- breaking For 'db' (database) and 'xlsx' (Excel) formats, the value returned by `data` is in dictionary format as of DataRecorder 3.x. [1]
- breaking The `record()` method no longer automatically prints data or returns unsaved data upon encountering an exception. This changes the error reporting and recovery mechanism. [1]
- gotcha For `.xlsx` file handling, the `openpyxl` library is implicitly required, but not always listed as a hard dependency. If not installed, operations on `.xlsx` files will fail.
- gotcha DataRecorder caches data, so if you neither call `record()` or `close()` nor use the recorder in a `with` statement, cached data may never be flushed to the file, especially in short-lived scripts or on abnormal termination.
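One way to guard against the unflushed-cache gotcha is to flush and close in a `finally` block. The sketch below assumes only that the recorder object exposes `add_data()`, `record()`, and `close()` as described in this document; `write_all` is a hypothetical helper name, not part of DataRecorder.

```python
def write_all(recorder, rows):
    """Add every row, then flush and close even if adding a row fails.

    `recorder` is any object with add_data(), record(), and close() methods,
    as this document describes for DataRecorder's recorders (an assumption).
    """
    try:
        for row in rows:
            recorder.add_data(row)
    finally:
        try:
            recorder.record()  # flush rows that were cached before any failure
        finally:
            recorder.close()  # release the file even if the flush itself fails
```

Any exception raised while adding rows still propagates to the caller, but only after the rows collected so far have been flushed.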
Install
-
pip install DataRecorder
Imports
- Recorder
from DataRecorder import Recorder
- Filler
from DataRecorder import Filler
- ByteRecorder
from DataRecorder import ByteRecorder
Quickstart
from DataRecorder import Recorder
import os
# Example for Recorder
file_path = 'my_data.csv'
r = Recorder(file_path)
data_row_1 = (1, 2, 3, 4)
data_row_2 = (5, 6, 7, 8)
r.add_data(data_row_1) # Record a single row of data
r.add_data(data_row_2) # Record another single row
r.add_data('just a string') # Can also record single values
# Simulate collecting more data
for i in range(10):
    r.add_data([f'item_{i}', i * 10, True])
r.record() # Force flush any cached data to file
r.close() # Close the recorder, ensuring all data is written
print(f"Data written to {file_path}")
# Clean up the created file for re-runnability
if os.path.exists(file_path):
    os.remove(file_path)