Complete Guide to Parameter Passing When Manually Triggering DAGs via CLI in Apache Airflow

Dec 08, 2025 · Programming

Keywords: Apache Airflow | Parameter Passing | CLI Trigger | DAG Configuration | Workflow Orchestration

Abstract: This article provides a comprehensive exploration of various methods for passing parameters when manually triggering DAGs via CLI in Apache Airflow. It begins by introducing the core mechanism of using the --conf option to pass JSON configuration parameters, including how to access these parameters in DAG files through dag_run.conf. Through complete code examples, it demonstrates practical applications of parameters in PythonOperator and BashOperator. The article also compares the differences between --conf and --tp parameters, explaining why --conf is the recommended solution for production environments. Finally, it offers best practice recommendations and frequently asked questions to help users efficiently manage parameterized DAG execution in real-world scenarios.

Overview of Parameter Passing Mechanisms in Apache Airflow

Apache Airflow, as a powerful workflow orchestration platform, provides flexible DAG (Directed Acyclic Graph) definition and execution capabilities. In real production environments, there is often a need to pass dynamic parameters when manually triggering DAGs to meet requirements such as data reprocessing and test validation. This article delves deeply into the parameter passing mechanisms when manually triggering DAGs via CLI.

Core Parameter Passing Method: The --conf Option

Airflow's trigger_dag command supports the --conf option, allowing users to pass configuration parameters in JSON format. This is currently the most recommended approach for production environments: the configuration is attached to a real DagRun record and is available to every task in the run through dag_run.conf.

CLI Command Example

The syntax for passing parameters via the command line is as follows (Airflow 1.x syntax):

airflow trigger_dag 'example_dag' -r 'manual_run_001' --conf '{"start_time": "2024-01-01T01:30:00", "end_time": "2024-01-02T01:30:00", "data_source": "backup"}'

In Airflow 2.x, the equivalent command is airflow dags trigger example_dag -r manual_run_001 --conf followed by the same JSON string. In this example, we pass three parameters: start_time, end_time, and data_source. These parameters are encapsulated in the conf attribute of the DagRun object.
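Conceptually, the JSON string given to --conf is deserialized into a plain Python dict that is later exposed as dag_run.conf. A minimal standalone sketch of that round trip (no Airflow installation required):

```python
import json

# The same JSON string that would follow --conf on the command line
conf_json = '{"start_time": "2024-01-01T01:30:00", "end_time": "2024-01-02T01:30:00", "data_source": "backup"}'

# Airflow parses it into a dict; tasks later see it as dag_run.conf
conf = json.loads(conf_json)

print(conf["start_time"])              # the value passed on the CLI
print(conf.get("missing", "default"))  # .get() supplies a fallback for absent keys
```

This is why dag_run.conf.get(key, default) is the natural access pattern in task code: conf is an ordinary dict built from the CLI JSON.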

Parameter Access in DAG Files

In DAG definition files, parameters can be accessed in multiple ways:

Accessing Parameters in PythonOperator

from airflow import DAG
from airflow.operators.python import PythonOperator  # airflow.operators.python_operator in Airflow 1.x
from datetime import datetime

def process_data(**kwargs):
    """Python function for processing data"""
    dag_run = kwargs['dag_run']
    
    # Access passed parameters
    start_time = dag_run.conf.get('start_time')
    end_time = dag_run.conf.get('end_time')
    data_source = dag_run.conf.get('data_source', 'default')
    
    print(f"Processing time range: {start_time} to {end_time}")
    print(f"Data source: {data_source}")
    
    # Actual data processing logic
    # ...

# DAG definition
default_args = {
    'owner': 'data_team',
    'start_date': datetime(2024, 1, 1),
}

dag = DAG(
    'data_processing_dag',
    default_args=default_args,
    schedule_interval='30 1 * * *',  # Execute daily at 01:30
    catchup=False
)

process_task = PythonOperator(
    task_id='process_data_task',
    python_callable=process_data,
    provide_context=True,  # Required in Airflow 1.x; in Airflow 2+ the context is always provided and this flag is ignored
    dag=dag
)

Using Parameters in Template Fields

Airflow supports direct parameter references in template fields, which is particularly useful for scenarios requiring command-line parameters such as BashOperator and SSHOperator:

from airflow.operators.bash import BashOperator  # airflow.operators.bash_operator in Airflow 1.x

bash_task = BashOperator(
    task_id='data_extraction',
    bash_command='''
        echo "Starting data extraction"
        python extract_data.py \
            --start "{{ dag_run.conf["start_time"] }}" \
            --end "{{ dag_run.conf["end_time"] }}" \
            --source "{{ dag_run.conf.get("data_source", "primary") }}"
    ''',
    dag=dag
)

Handling Parameter Default Values

In practical applications, it's important to gracefully handle missing parameters:

from datetime import datetime, timedelta

def process_with_defaults(**kwargs):
    dag_run = kwargs['dag_run']
    
    # Use get() to supply a default value for missing keys
    start_time = dag_run.conf.get('start_time', 
        (datetime.utcnow() - timedelta(days=1)).isoformat())
    end_time = dag_run.conf.get('end_time', 
        datetime.utcnow().isoformat())
    
    # Prefer a local variable over mutating dag_run.conf in place
    data_source = dag_run.conf.get('data_source', 'default_source')
    
    # Processing logic...

Comparison with --tp Parameter

Although the airflow test command supports a -tp/--task_params option, this approach has notable limitations: it executes a single task in isolation without checking dependencies, it does not create a real DagRun (so the values are surfaced through params rather than dag_run.conf), and task state is not recorded in the metadata database. It is intended for local debugging, not for production runs.

Therefore, for manual triggering scenarios in production environments, using the --conf option is strongly recommended.

Best Practice Recommendations

  1. Parameter Validation: Add parameter validation logic in DAGs to ensure the validity of passed parameters
  2. Error Handling: Add appropriate exception handling for parameter access
  3. Documentation: Clearly document supported parameters and their formats in DAG definitions
  4. Security: Avoid passing sensitive information in parameters; use Airflow's Variables or Connections to store confidential data
  5. Version Compatibility: Be aware of differences in parameter passing support across different Airflow versions
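As a sketch of recommendation 1, here is a small standalone validator that could be called at the top of a task callable. The required keys and the ISO-8601 format check are illustrative assumptions, not a fixed Airflow convention:

```python
from datetime import datetime

# Hypothetical set of keys this DAG requires in the trigger conf
REQUIRED_KEYS = ("start_time", "end_time")

def validate_conf(conf: dict) -> dict:
    """Fail fast with a clear error if the trigger conf is incomplete or malformed."""
    missing = [k for k in REQUIRED_KEYS if k not in conf]
    if missing:
        raise ValueError(f"Missing required conf keys: {missing}")
    for key in REQUIRED_KEYS:
        # fromisoformat() raises ValueError for non-ISO-8601 values
        datetime.fromisoformat(conf[key])
    return conf

validated = validate_conf({
    "start_time": "2024-01-01T01:30:00",
    "end_time": "2024-01-02T01:30:00",
})
```

Failing early inside the first task produces a clear error in the task log, instead of a confusing downstream failure caused by a None parameter.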

Frequently Asked Questions

Q: What data types can be used in parameters?
A: Parameters passed via --conf support all data types supported by JSON, including strings, numbers, booleans, arrays, and objects.

Q: How to pass parameters containing special characters?
A: Properly escape special characters in JSON strings, for example, double quotes should be escaped as \".
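Rather than escaping quotes by hand, one option is to build the JSON with json.dumps, which handles escaping for you, and to quote the result with shlex.quote when composing a shell command from Python. A minimal sketch:

```python
import json
import shlex

# Values containing quotes and spaces, which are error-prone to escape manually
params = {"query": 'SELECT * FROM "events"', "note": "it's quoted"}

conf_json = json.dumps(params)    # JSON escaping handled by the serializer
cli_arg = shlex.quote(conf_json)  # safe to splice into a shell command line

print(f"airflow dags trigger example_dag --conf {cli_arg}")
```

The printed command can be pasted into a shell; shlex.quote guarantees the JSON arrives at the Airflow CLI as a single, unmangled argument.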

Q: Is there a size limit for parameters?
A: There is theoretically a limit to JSON string size, but it is rarely reached in practice. It is recommended to store large data in external storage and pass references through parameters.
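A minimal sketch of that reference-passing pattern: the trigger conf carries only a storage path (the key name input_path and the URI are illustrative assumptions), and the task resolves the data itself:

```python
def process_large_input(conf: dict) -> int:
    """Resolve a storage reference from conf instead of receiving the payload inline."""
    path = conf["input_path"]  # e.g. an object-store URI or a shared filesystem path
    # A real task would stream the object from storage here;
    # this sketch only demonstrates that just the short reference travels via --conf.
    return len(path)

result = process_large_input({"input_path": "s3://my-bucket/exports/2024-01-01.parquet"})
```

The conf payload stays a few dozen bytes regardless of how large the referenced dataset is.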

Conclusion

Passing parameters via the --conf option when manually triggering DAGs is a powerful and flexible feature in Apache Airflow. Proper use of this mechanism can significantly enhance the flexibility and maintainability of workflows. The examples and best practices provided in this article can help developers efficiently implement parameterized DAG execution in real-world projects.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.