Keywords: Apache Airflow | execution_date | BashOperator | Jinja2 templates | context variables
Abstract: This article provides a comprehensive exploration of the core concepts and access mechanisms for the execution_date variable in Apache Airflow. Through analysis of a typical use case involving BashOperator calls to REST APIs, the article explains why execution_date cannot be used directly during DAG file parsing and how to correctly access this variable at task execution time using Jinja2 templates. The article systematically introduces Airflow's template system, available default variables (such as ds, ds_nodash), and macro functions, with practical code examples for various scenarios. Additionally, it compares methods for accessing context variables across different operators (BashOperator, PythonOperator), helping readers fully understand Airflow's execution model and variable passing mechanisms.
Airflow Execution Model and Variable Access Timing
In Apache Airflow, understanding the access timing of context variables like execution_date is crucial. Airflow's DAG execution consists of two main phases: parsing phase and execution phase. During the parsing phase, the DAG file is loaded and validated, where all task definitions must be syntactically correct, but context variables such as execution_date are not yet available because they are tied to specific DAG run instances. Only during the execution phase, when task instances are scheduled to run, are these variables injected into the task context.
Template System in BashOperator
The bash_command parameter of BashOperator is actually a Jinja2 template string, not a regular static string. This means you can embed template variables and expressions in the command string using double curly braces {{ }}. For example, to access the execution date, you can directly use {{ execution_date }}, which will automatically be replaced with the current task instance's datetime object.
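This deferred rendering can be illustrated with plain Jinja2, the engine Airflow uses. At parse time the command string keeps its {{ }} placeholders; only when a context is supplied are they substituted. A minimal sketch, independent of Airflow itself:

```python
from jinja2 import Template

# At parse time, bash_command is stored as-is, placeholders included.
cmd = "curl -XPOST 'http://example.com/run?st={{ ds }}'"
assert "{{ ds }}" in cmd

# At execution time, Airflow renders it against the task instance's context.
rendered = Template(cmd).render(ds="2024-01-15")
print(rendered)  # curl -XPOST 'http://example.com/run?st=2024-01-15'
```

Airflow performs this rendering automatically for every templated field just before the task runs, which is why the placeholders must survive parsing untouched.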
Formatting and Transforming execution_date
While execution_date itself is a datetime object, in practical applications we often need to convert it to specific string formats. Airflow provides a series of predefined template variables to simplify this process:
{{ ds }}: Returns date string in YYYY-MM-DD format
{{ ds_nodash }}: Returns date string in YYYYMMDD format
{{ ts }}: Returns full ISO-format timestamp (e.g. 2024-01-15T08:30:00+00:00)
{{ ts_nodash }}: Returns the same timestamp without punctuation
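These shortcuts are preformatted views of the same logical datetime. The sketch below reproduces the formats with the standard library so the patterns are easy to verify (the variable names mirror the template names):

```python
from datetime import datetime, timezone

logical = datetime(2024, 1, 15, 8, 30, 0, tzinfo=timezone.utc)

ds = logical.strftime("%Y-%m-%d")               # like {{ ds }}
ds_nodash = logical.strftime("%Y%m%d")          # like {{ ds_nodash }}
ts = logical.isoformat()                        # like {{ ts }}
ts_nodash = logical.strftime("%Y%m%dT%H%M%S")   # like {{ ts_nodash }}

print(ds, ds_nodash, ts, ts_nodash)
# 2024-01-15 20240115 2024-01-15T08:30:00+00:00 20240115T083000
```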
For the REST API call in the original question, a correct implementation looks like this. Note that hostname is not one of Airflow's default template variables, so it is supplied here through params (the value shown is a placeholder):

curl_cmd = "curl -XPOST 'http://{{ params.hostname }}:8000/run?st={{ ds }}'"
t1 = BashOperator(
    task_id='rest-api-1',
    bash_command=curl_cmd,
    params={'hostname': 'api.example.com'},
    dag=dag
)
Using Macros for Date Calculations
Airflow's template system also provides the macros module, which contains utility functions for date calculations. For example, to get the first day of the current month:
command = "some_script.sh {{ execution_date.replace(day=1) }}"
To get the last day of the previous month:
command = "some_script.sh {{ execution_date.replace(day=1) - macros.timedelta(days=1) }}"
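The replace/timedelta trick works because stepping one day back from the first of the month always lands on the last day of the previous month, leap years included. The same arithmetic in plain Python:

```python
from datetime import datetime, timedelta

execution_date = datetime(2024, 3, 15)           # any day in March 2024
first_of_month = execution_date.replace(day=1)   # 2024-03-01
last_of_prev = first_of_month - timedelta(days=1)

print(last_of_prev.date())  # 2024-02-29 (2024 is a leap year)
```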
Context Access in PythonOperator
While the original question primarily focuses on BashOperator, it is worth understanding context access in other operators as well. In Airflow 1.x, PythonOperator only passes the task context to the callable when provide_context=True is set; in Airflow 2.x the flag has been removed and the context is passed automatically as keyword arguments:
def process_data(**kwargs):
    execution_date = kwargs['execution_date']
    ds = kwargs['ds']
    # Processing logic goes here; return whatever downstream tasks need
    return ds

python_task = PythonOperator(
    task_id='process_data',
    python_callable=process_data,
    provide_context=True,  # needed in Airflow 1.x only; omit in 2.x
    dag=dag
)
Context Access in Custom Operators
When creating custom operators, you can access all context variables through the context parameter in the execute method:
class CustomOperator(BaseOperator):
    def execute(self, context):
        execution_date = context.get('execution_date')
        # Custom logic goes here
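The execute-with-context contract can likewise be tested without a scheduler. This sketch uses a plain class as a stand-in for BaseOperator (so it runs without an Airflow installation) and feeds execute a hand-built context:

```python
class RunDateOperator:
    """Stand-in for an Airflow BaseOperator subclass, for illustration only."""

    def execute(self, context):
        execution_date = context.get("execution_date")
        return f"running for {execution_date}"

result = RunDateOperator().execute({"execution_date": "2024-01-15"})
print(result)  # running for 2024-01-15
```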
Common Errors and Best Practices
1. Avoid using context variables during DAG parsing: Do not attempt to access execution_date in the global scope of the DAG file, as this will cause errors.
2. Use template syntax correctly: Ensure proper Jinja2 template syntax is used in bash_command, including correct variable references and expression formatting.
3. Consider timezone handling: execution_date is timezone-aware and defaults to UTC. If a local timezone is needed, convert it explicitly; because Airflow uses pendulum datetimes, an expression such as {{ execution_date.in_timezone('Europe/Paris') }} performs the conversion inside a template.
4. Test template rendering: during development, use the airflow tasks render command to inspect a task's rendered template fields, or airflow tasks test to run a single task instance and confirm that variables are substituted correctly.
Practical Application Examples
The following is a complete example demonstrating flexible use of execution_date and related template variables in BashOperator:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

with DAG('api_pipeline',
         default_args=default_args,
         schedule_interval=timedelta(hours=1),
         catchup=False) as dag:

    # POST the run date and timestamp as a JSON payload
    api_call = BashOperator(
        task_id='call_api',
        bash_command="""
        curl -X POST \
          -H "Content-Type: application/json" \
          -d '{"date": "{{ ds }}", "timestamp": "{{ ts }}"}' \
          "http://api.example.com/endpoint"
        """
    )

    # Process the previous day's logs
    process_logs = BashOperator(
        task_id='process_logs',
        bash_command="""
        prev_day="{{ (execution_date - macros.timedelta(days=1)).strftime('%Y-%m-%d') }}"
        process_logs.sh "$prev_day"
        """
    )

    api_call >> process_logs
By deeply understanding Airflow's template system and execution model, developers can more effectively utilize context variables like execution_date to build more flexible and powerful data pipelines. Remember that template variables are only available during task execution, which is an important design feature of Airflow that ensures task repeatability and scheduling consistency.