Filename Sanitizer
sanitize-filename is a simple, dependency-free, blacklist-based filename sanitizer for Python. It focuses on preserving the original filename as much as possible, including non-ASCII characters, while removing characters unsafe for common file systems. The current version is 1.2.0, released in April 2020. It's a stable library with infrequent updates, primarily for minor fixes and behavior uniformity.
Common errors
- File operations fail with 'Is a directory' or 'No such file or directory' errors even after sanitizing.
  cause: The `sanitize` function cleans a *filename*, not a *file path*. It removes characters that are illegal in filenames, but it is not a path-traversal defense: passing a full path, possibly containing sequences such as `../`, does not yield a safe or usable path.
  fix: Always separate the path from the filename. Sanitize only the filename component (e.g., the result of `os.path.basename`), and validate and construct the directory path independently (e.g., with `os.path.join`) to prevent directory traversal attacks.
- Sanitized filename results in an empty string.
  cause: For certain inputs, such as `..`, an empty string, or `/.`, removing the invalid characters can leave nothing behind. Attempting to create a file with an empty name will typically fail.
  fix: Check the result after sanitization and fall back to a sensible default, e.g. `sanitized_name = sanitize(input_name) or 'untitled'`.
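The two fixes above can be combined in one helper. The following is a minimal sketch, not part of the library: `build_safe_path`, its parameters, and the `'untitled'` default are hypothetical names chosen for illustration. The sanitizer is passed in as a callable so the sketch runs on its own; in practice you would pass `sanitize` from `sanitize_filename`.

```python
import os
from typing import Callable


def build_safe_path(
    base_dir: str,
    user_input: str,
    sanitizer: Callable[[str], str],
    default: str = "untitled",  # hypothetical fallback name
) -> str:
    """Return a path inside base_dir built from an untrusted name."""
    # Treat backslashes as separators too, then keep only the last component.
    name = os.path.basename(user_input.replace("\\", "/"))

    # Sanitize the bare filename; fall back if nothing usable is left.
    name = sanitizer(name) or default

    # Build the path ourselves and confirm it stays inside base_dir.
    base = os.path.abspath(base_dir)
    candidate = os.path.abspath(os.path.join(base, name))
    if candidate == base or os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"refusing to use {user_input!r} as a filename")
    return candidate
```

With the package installed, a call might look like `build_safe_path('uploads', user_supplied_name, sanitize)`. The containment check is deliberately independent of the sanitizer, so it still holds if the sanitizer lets a name such as `..` through.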
Warnings
- gotcha: This library uses a blacklist. That works for common cases, but a whitelist (allowing only known-safe characters) is generally safer for highly sensitive applications or untrusted user input, because a blacklist can be incomplete.
- gotcha: Sanitizing can produce non-unique names when different unsafe inputs resolve to the same safe filename (e.g., 'file?.txt' and '*file*.txt' both become 'file.txt'). Unless collisions are handled, one file can overwrite another.
- breaking: Before version 1.2.0, sanitization behavior might have been OS-dependent, and long filenames whose non-extension part consisted solely of dots could cause problems. Version 1.2.0 made behavior uniform across operating systems and fixed that long-filename issue.
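The first two gotchas can be handled outside the library. The sketch below uses only the standard library; `whitelist_name` and `create_unique` are hypothetical helpers, and the allowed character set (ASCII letters, digits, `.`, `_`, `-`) is an assumption that you should adapt, since it discards the non-ASCII characters the library tries to preserve.

```python
import os
import re
import tempfile

# Assumed whitelist: anything outside this set is replaced.
_DISALLOWED = re.compile(r"[^A-Za-z0-9._-]")


def whitelist_name(name: str, replacement: str = "_", default: str = "untitled") -> str:
    """Keep only whitelisted characters; never return an empty name."""
    cleaned = _DISALLOWED.sub(replacement, name).lstrip(".")
    return cleaned or default


def create_unique(directory: str, safe_name: str) -> str:
    """Create an empty file without overwriting; return the path used."""
    stem, ext = os.path.splitext(safe_name)
    counter = 0
    while True:
        name = safe_name if counter == 0 else f"{stem}_{counter}{ext}"
        path = os.path.join(directory, name)
        try:
            # Mode "x" fails if the file already exists, so the existence
            # check and the creation happen in a single step.
            with open(path, "x"):
                return path
        except FileExistsError:
            counter += 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = create_unique(tmp, "file.txt")
        second = create_unique(tmp, "file.txt")
        print(os.path.basename(first), os.path.basename(second))
```

Creating the file with exclusive mode, rather than testing `os.path.exists` first, avoids a window in which another process could claim the same name.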
Install
pip install sanitize-filename
Imports
- sanitize
from sanitize_filename import sanitize
Quickstart
from sanitize_filename import sanitize
# Example usage
unsafe_filename = 'My/Document:with"illegal*chars?.txt'
safe_filename = sanitize(unsafe_filename)
print(f"Original: {unsafe_filename}")
print(f"Sanitized: {safe_filename}")
# A path-like input: sanitize() expects a bare filename, so the result is a
# single flattened name, not a safe or usable path.
unsafe_path = '../etc/passwd'
flattened_name = sanitize(unsafe_path)
print(f"Original: {unsafe_path}")
print(f"Sanitized: {flattened_name}")