Apache Airflow Provider for Apache HDFS
raw JSON → 4.11.5 verified Sat May 09 auth: no python
A provider package for Apache Airflow that integrates with Apache HDFS, providing hooks and operators for HDFS file operations. Current version 4.11.5 requires Python >=3.10. Released on Airflow provider schedule.
pip install apache-airflow-providers-apache-hdfs Common errors
error ModuleNotFoundError: No module named 'airflow.hooks.hdfs' ↓
cause Using old import path before provider split.
fix
Change to
from airflow.providers.apache.hdfs.hooks.hdfs import HDFSHook. error ImportError: cannot import name 'HdfsPutFileOperator' from 'airflow.operators.hdfs' ↓
cause Operator moved to provider package.
fix
Use
from airflow.providers.apache.hdfs.operators.hdfs import HdfsPutFileOperator. error airflow.exceptions.AirflowException: Connection 'hdfs_default' not found ↓
cause No Airflow connection configured for HDFS.
fix
Create an Airflow connection with conn_id='hdfs_default', type='HDFS', and host/port details.
Warnings
breaking HDFSHook and operators were moved from `airflow.hooks.hdfs` and `airflow.operators.hdfs` to `airflow.providers.apache.hdfs` in version 2.0.0. Old imports will break. ↓
fix Use new import paths: `from airflow.providers.apache.hdfs.hooks.hdfs import HDFSHook`
gotcha The HDFSHook's get_conn() returns a HDFSClient that may require explicit authentication; default uses 'hdfs' connection string without kerberos if not configured. ↓
fix Ensure your Airflow connection has extra parameters for kerberos if needed.
deprecated HdfsMkdirFileOperator and HdfsPutFileOperator are deprecated as of provider version 4.0.0 in favor of generic FileTransferOperator. ↓
fix Use `FileTransferOperator` from `airflow.providers.apache.hdfs.operators.hdfs` or implement custom logic.
Imports
- HDFSHook wrong
from airflow.hooks.hdfs import HDFSHookcorrectfrom airflow.providers.apache.hdfs.hooks.hdfs import HDFSHook - HdfsOperators
from airflow.providers.apache.hdfs.operators.hdfs import HdfsMkdirFileOperator, HdfsPutFileOperator, HdfsGetFileOperator, HdfsDeleteFileOperator, HdfsListDirectoryOperator, HdfsConcatFileOperator, HdfsMoveFileOperator
Quickstart
from airflow.providers.apache.hdfs.hooks.hdfs import HDFSHook
hook = HDFSHook(conn_id='hdfs_default')
# List files in a directory
files = hook.list_directory('/tmp')
print(files)