Apache Airflow Provider for Apache Pig


Apache Airflow provider for Apache Pig. Version 4.8.4 requires Airflow >=2.9.0 and Python >=3.10, and exposes the PigOperator for running Pig jobs from Airflow. Releases follow Airflow's monthly provider release cycle.

pip install apache-airflow-providers-apache-pig
error airflow.exceptions.AirflowException: The conn_id `pig_default` isn't defined
cause Connection 'pig_default' is not set up in Airflow.
fix
Create a connection in Airflow UI (Admin -> Connections) with Conn Id: pig_default, Conn Type: Pig, Host: localhost (or appropriate).
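The connection can also be created from the command line with the Airflow CLI; the host value below is an assumption for a local setup:

```shell
# Create the default Pig connection (equivalent to adding it
# under Admin -> Connections in the Airflow UI).
airflow connections add 'pig_default' \
    --conn-type 'pig' \
    --conn-host 'localhost'
```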
error TypeError: __init__() got an unexpected keyword argument 'pig_conn_id'
cause The 'pig_conn_id' parameter was renamed to 'pig_cli_conn_id' in provider 4.0.0; passing the old name raises a TypeError.
fix
Replace 'pig_conn_id' with 'pig_cli_conn_id'.
gotcha The PigOperator's 'pig' argument is a templated field: pass inline Pig Latin as a string, or a path ending in '.pig'/'.piglatin' to have Airflow load the script file. If you use a file, ensure it is resolvable on every worker node that may execute the task.
fix For inline Pig Latin, use a multiline string. For a file, set pig='/path/to/script.pig'.
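A sketch of both forms; the paths, script contents, and task ids are placeholder assumptions, and running either task requires the provider and the pig CLI to be installed:

```python
from airflow.providers.apache.pig.operators.pig import PigOperator

# Inline Pig Latin: the multiline string is written to a temp file
# and executed by the `pig` CLI. Input path is a placeholder.
inline = PigOperator(
    task_id='inline_script',
    pig="""
        logs = LOAD '/tmp/access.log' AS (line:chararray);
        DUMP logs;
    """,
    pig_cli_conn_id='pig_default',
)

# Script file: '.pig' is in the operator's template extensions, so the
# path is treated as a template reference and the file contents are
# rendered at runtime. Path under the DAG folder is hypothetical.
from_file = PigOperator(
    task_id='file_script',
    pig='scripts/wordcount.pig',
    pig_cli_conn_id='pig_default',
)
```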
deprecated The 'pig_conn_id' parameter was renamed to 'pig_cli_conn_id' in provider version 4.0.0 (Airflow 2.3+). Using the old name raises a TypeError.
fix Use 'pig_cli_conn_id' instead of 'pig_conn_id', and make sure the connection's Conn Type is 'pig'.
gotcha The PigOperator runs the 'pig' CLI in a subprocess on the worker that executes the task, so the binary must be installed and on PATH on every worker node.
fix Install Apache Pig on all workers and verify with 'which pig'. Extra CLI flags (e.g. '-x local') can be passed via 'pig_opts'.
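A minimal preflight helper for the check above, using only the standard library; the function name is an illustration, not part of the provider's API:

```python
import shutil

def cli_available(name: str) -> bool:
    """Return True if the executable `name` is found on PATH.

    Equivalent to `which name` succeeding; run this on each worker
    (e.g. in a sanity-check task) before relying on PigOperator.
    """
    return shutil.which(name) is not None
```

On a properly provisioned worker, `cli_available("pig")` should return True.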

Basic DAG using the PigOperator. Note: 'pig_cli_conn_id' must name an existing Airflow connection of type 'pig' (here, pig_default).

from datetime import datetime
from airflow import DAG
from airflow.providers.apache.pig.operators.pig import PigOperator

with DAG(
    dag_id='example_pig',
    start_date=datetime(2025, 1, 1),
    schedule='@once',
    catchup=False,
) as dag:
    run_pig = PigOperator(
        task_id='run_pig',
        pig='ls /user/hadoop;',  # inline script passed to the pig CLI
        pig_opts='-x local',
        pig_cli_conn_id='pig_default',
    )