dvc-objects library
dvc-objects provides filesystem and object-database abstractions and is a core component of DVC (Data Version Control) and related data management tools. It covers hashing files, managing object storage, and exposing an fsspec-compatible interface to various storage backends. The library is actively maintained, with frequent minor releases.
Warnings
- breaking: Python 3.8 support was dropped in version 5.1.0. Users on Python 3.8 must upgrade their interpreter to 3.9 or newer.
- breaking: The `tqdm` implementation and `TqdmCallback` were removed in version 5.0.0. Progress reporting for file operations is no longer built in, so users must implement custom callbacks or rely on upstream DVC progress mechanisms.
- gotcha: The `fs.info` method gained a `return_exceptions` parameter in version 5.2.0. Check its default and what it does when set to `True` before relying on exceptions from `fs.info`, since it can change how errors surface when querying filesystem information.
- gotcha: As of versions 5.1.1 and 5.1.2, `dvc-objects` includes enhanced fallback mechanisms for file linking (e.g., copying if `os.link` is unavailable or if `EAGAIN` errors occur during linking). Because a copy takes longer than a link and duplicates the data, this can affect performance and disk space usage on certain platforms or under specific conditions.
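The link-then-copy fallback described in the last warning can be sketched with the standard library alone. This is a minimal illustration of the pattern, not the dvc-objects implementation: `link_or_copy` is a hypothetical helper, and treating `EXDEV` (cross-device link) as a fallback case alongside `EAGAIN` is an assumption made here for the example.

```python
import errno
import os
import shutil
import tempfile


def link_or_copy(src: str, dst: str) -> str:
    """Hard-link src to dst, copying instead when linking is not possible.

    Hypothetical helper for illustration; not part of dvc-objects.
    Returns "link" or "copy" to show which path was taken.
    """
    link = getattr(os, "link", None)  # os.link may be missing on some platforms
    if link is not None:
        try:
            link(src, dst)
            return "link"
        except OSError as exc:
            # Assumption: only these errors are treated as "copy instead";
            # anything else (e.g. dst already exists) is re-raised.
            if exc.errno not in (errno.EAGAIN, errno.EXDEV):
                raise
    shutil.copyfile(src, dst)
    return "copy"


with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "src.bin")
    dst = os.path.join(tmpdir, "dst.bin")
    with open(src, "wb") as fobj:
        fobj.write(b"payload")

    how = link_or_copy(src, dst)
    # A hard link shares the inode with the source; a copy does not.
    same_file = os.path.samefile(src, dst)
    with open(dst, "rb") as fobj:
        content = fobj.read()

print(f"method={how} same_file={same_file}")
```

When the copy path is taken, `dst` occupies its own disk space and the operation scales with file size, which is the cost the warning refers to.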
Install
pip install dvc-objects
Imports
- hash_file
from dvc_objects.file import hash_file
- ObjectDBManager
from dvc_objects.db.manager import ObjectDBManager
- get_fs
from dvc_objects.fs import get_fs
Quickstart
import os
import tempfile
from pathlib import Path
from dvc_objects.file import hash_file
from dvc_objects.hash_info import HashInfo
# Create a temporary file for demonstration
with tempfile.TemporaryDirectory() as tmpdir:
    test_file_path = Path(tmpdir) / "my_data.txt"
    test_file_path.write_text("This is some sample data for hashing.")
    print(f"Hashing file: {test_file_path}")
    # Hash the file using the default algorithm (e.g., MD5)
    hash_info: HashInfo = hash_file(test_file_path)
    print(f"\nFile path: {test_file_path}")
    print(f"Hash Type: {hash_info.name}")
    print(f"Hash Value: {hash_info.value}")
    # Example: Verify content based on hash
    if hash_info.name == "md5" and hash_info.value == "2efb72b834458f4a7c1b52b36e355745":
        print("Hash matches expected value!")
    else:
        print("Hash does not match expected value or is not MD5.")
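For intuition about what hashing a file involves, and as an independent way to cross-check a digest, the following sketch streams a file through `hashlib` in fixed-size chunks. It uses only the standard library; `md5_of_file` and its `chunk_size` parameter are hypothetical names chosen for this example, and nothing here should be read as how dvc-objects computes hashes internally.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it chunk by chunk.

    Hypothetical helper for illustration; not part of dvc-objects.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fobj:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fobj.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmpdir:
    sample = Path(tmpdir) / "my_data.txt"
    sample.write_bytes(b"This is some sample data for hashing.")

    # A tiny chunk size forces several read() calls on this short file.
    streamed = md5_of_file(sample, chunk_size=8)
    # Hashing the whole content at once must give the same digest.
    one_shot = hashlib.md5(sample.read_bytes()).hexdigest()

print(f"digests agree: {streamed == one_shot}")
```

Reading in chunks keeps memory use constant regardless of file size, which matters for the large data files that data-versioning tools typically track.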