SAS Airflow Provider
The SAS Airflow Provider lets Apache Airflow users create tasks that execute SAS Studio Flows and Jobs on a SAS Viya environment. It provides operators for interacting with SAS assets, enabling orchestration and monitoring of SAS processes within Airflow DAGs. Currently at version 0.0.23, the library is under active development, with frequent updates adding features and improvements.
Common errors
- requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
  cause: Network intermediaries (WAF, firewalls, load balancers) terminating seemingly idle TCP connections after a timeout, even if the SAS job is still running.
  fix: Review and adjust network device idle timeout settings. For long-running SAS jobs, explore options to send periodic keep-alive signals, or refactor the Airflow task to initiate the job and then poll for its completion status in separate, shorter-lived requests. Ensure your Airflow environment's IP is whitelisted.
- SAS connection failed during OAuth token acquisition: 'Failed to establish a new connection: [Errno 111] Connection refused' or network connection reset.
  cause: Firewall rules or network configurations preventing the Airflow worker from reaching the SASLogon service or Viya environment during the OAuth token exchange.
  fix: Verify network connectivity between your Airflow environment and the SAS Viya host, specifically the SASLogon endpoint. Ensure that all necessary ports are open and that firewalls are not blocking the connection. Confirm correct host and certificate configurations.
- DAG not appearing in Airflow UI after deploying 'sas-airflow-provider' operators.
  cause: Common causes include syntax errors in the DAG file, the DAG file not being in the configured DAGs folder, or an unhandled exception during DAG parsing. Airflow providers must also be installed in the Airflow environment.
  fix: Check Airflow scheduler logs for parsing errors related to your DAG file. Ensure the `sas-airflow-provider` package is correctly installed in the Python environment where Airflow (scheduler and workers) is running. Verify the DAG file is in the `dags_folder` specified in `airflow.cfg`.
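The "initiate the job, then poll" refactoring suggested for the connection-reset error can be sketched in plain Python. `check_state` here is a hypothetical callable you would implement against the Viya job REST endpoint; it is not part of the provider:

```python
import time

def poll_until_done(check_state, timeout=3600, interval=30):
    """Poll a job in short-lived requests instead of holding one
    long TCP connection open while the SAS job runs."""
    deadline = time.monotonic() + timeout
    state = "unknown"
    while time.monotonic() < deadline:
        state = check_state()  # e.g. GET the job's state from the Viya REST API
        if state in ("completed", "failed", "canceled"):
            return state
        time.sleep(interval)  # each probe is a fresh, short-lived request
    raise TimeoutError(f"job still '{state}' after {timeout}s")
```

Because each probe opens and closes its own connection, a WAF or load balancer idle timeout between probes no longer aborts the task.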
Warnings
- deprecated The `SASStudioFlowOperator` is deprecated. Users should migrate to `SASStudioOperator` for all SAS Studio flow and program executions. New features will only be added to `SASStudioOperator`.
- gotcha When running Airflow standalone on macOS, you might encounter issues with `urllib` and process forking. This can lead to connection problems with the SAS provider.
- breaking SAS connection issues (e.g., 'Connection reset by peer') for long-running jobs are often caused by WAF/firewall timeouts or network load balancers terminating idle TCP connections. TCP keep-alive settings may not be sufficient for all network configurations.
- gotcha Storing sensitive information (passwords, tokens) directly in Airflow connections carries security risks. Airflow connections are stored in the metadata database, which needs to be secured.
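One way to keep a plaintext password out of the connection is to point the provider at an Airflow Variable that holds the OAuth token, via the `token_variable` extra. A minimal sketch using Airflow's `AIRFLOW_CONN_<ID>` URI convention, assuming Airflow's `__extra__` query parameter for JSON extras; the variable name `sas_oauth_token` and the host are illustrative:

```python
import json
import os
from urllib.parse import quote

# Assumption: the provider reads {"token_variable": ...} from the connection
# extras and resolves the token from the named Airflow Variable at runtime.
extra = json.dumps({"token_variable": "sas_oauth_token"})  # illustrative name

# Airflow packs JSON extras into the __extra__ query parameter of a URI-form
# connection, so no credential appears in the URI itself.
os.environ["AIRFLOW_CONN_SAS_DEFAULT"] = (
    "sas://your-sas-viya-host.com/?__extra__=" + quote(extra)
)
```

The Variable itself can then live in a secrets backend rather than the metadata database.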
Install
pip install sas-airflow-provider
Imports
- SASStudioOperator
from sas_airflow_provider.operators.sas_studio import SASStudioOperator
- SASStudioFlowOperator (deprecated; use SASStudioOperator instead)
from sas_airflow_provider.operators.sas_studio_flow import SASStudioFlowOperator
- SASJobExecutionOperator
from sas_airflow_provider.operators.sas_job_execution import SASJobExecutionOperator
- SASComputeCreateSessionOperator
from sas_airflow_provider.operators.sas_compute_session import SASComputeCreateSessionOperator
Quickstart
import pendulum
from airflow.models.dag import DAG
from sas_airflow_provider.operators.sas_studio import SASStudioOperator
import os
# NOTE: For a real deployment, configure your SAS connection in the Airflow UI.
# (Admin -> Connections, Connection Id: 'sas_default', Connection Type: 'SAS')
# Fill in Host, Login, Password or use 'Extra' JSON for OAuth token.
# Example 'Extra' for OAuth: {"token": "your_oauth_token_here"}
# Or for global variable: {"token_variable": "airflow_variable_name"}
# The connection below is for demonstration if env vars are used for local testing.
# Mock environment variables for connection for local testing (NOT PRODUCTION BEST PRACTICE)
# Note: in URI form the host must not include a scheme; a value like
# 'sas://user:pass@https://host' does not parse as a valid URI.
os.environ['AIRFLOW_CONN_SAS_DEFAULT'] = (
    'sas://'
    + os.environ.get('SAS_DEFAULT_LOGIN', 'user')
    + ':'
    + os.environ.get('SAS_DEFAULT_PASSWORD', 'password')
    + '@'
    + os.environ.get('SAS_DEFAULT_HOST', 'your-sas-viya-host.com')
)
with DAG(
    dag_id="sas_studio_flow_example",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    schedule=None,
    tags=["sas", "studio", "example"],
) as dag:
    run_my_sas_flow = SASStudioOperator(
        task_id="run_sas_studio_flow_task",
        path="/Public/my_airflow_test_flow",  # Replace with your actual SAS Studio Flow path
        connection_id="sas_default",
        exec_type="flow",  # Can also be 'program' for SAS programs
        # Optional: pass macro variables to the flow
        # macro_variables={"input_param": "airflow_value"},
        # Optional: retrieve SAS logs to Airflow
        # job_exec_log=True,
    )
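The quickstart runs its task in a fresh compute session. A common follow-on pattern is to create a session once with `SASComputeCreateSessionOperator` and hand its id to the Studio task so multiple steps share state (librefs, macro variables). This sketch assumes the session id is published to XCom under the key `compute_session_id` and that `SASStudioOperator` accepts a `compute_session_id` parameter, as in recent provider versions; verify both against the version you install. It is a DAG configuration fragment and needs a live SAS Viya connection to actually run:

```python
import pendulum
from airflow.models.dag import DAG
from sas_airflow_provider.operators.sas_compute_session import SASComputeCreateSessionOperator
from sas_airflow_provider.operators.sas_studio import SASStudioOperator

with DAG(
    dag_id="sas_shared_session_example",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    schedule=None,
) as dag:
    # Create one compute session up front; its id lands in XCom.
    create_sess = SASComputeCreateSessionOperator(task_id="create_sess")

    # Reuse that session for the Studio task via a templated XCom pull.
    run_program = SASStudioOperator(
        task_id="run_program",
        path="/Public/my_airflow_test_program",  # illustrative path
        exec_type="program",
        connection_id="sas_default",
        compute_session_id=(
            "{{ ti.xcom_pull(key='compute_session_id', "
            "task_ids=['create_sess'])|first }}"
        ),
    )

    create_sess >> run_program
```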