Keywords: AWS temporary credentials | boto3 | ExpiredToken error | credential refresh | CloudWatch metrics
Abstract: This article provides an in-depth analysis of ExpiredToken errors caused by AWS temporary security credential expiration, exploring the working principles of the assume_role method in boto3, credential validity mechanisms, and complete solution implementations. Through code examples, it demonstrates how to properly handle temporary credential refresh and renewal to ensure stability in long-running scripts. Combining AWS official documentation and practical cases, the article offers developers practical technical guidance.
Problem Background and Error Analysis
Large-scale metric collection from AWS CloudWatch often runs into security token expiration errors. Once script execution time exceeds the validity period of the temporary credentials, boto3 raises botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the GetMetricStatistics operation: The security token included in the request is expired.
Temporary Security Credential Mechanism Analysis
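The assume_role method in AWS STS (Security Token Service) returns temporary security credentials with an explicit validity period. According to AWS official documentation, the DurationSeconds parameter can range from 900 seconds (15 minutes) up to the maximum session duration configured on the role, which can itself be set anywhere between 1 hour and 12 hours; the default value is 3600 seconds (1 hour). When roles are chained (a role is assumed using credentials that themselves came from an assumed role), the session is limited to 1 hour.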
In boto3, the process of obtaining temporary credentials is as follows:
import boto3
from botocore.exceptions import ClientError

# Create base session
session = boto3.Session(
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=region_name
)

# Get STS client
sts_client = session.client('sts')

# Assume role to obtain temporary credentials
response = sts_client.assume_role(
    RoleArn=role_arn,
    RoleSessionName='metric-collection-session',
    ExternalId=external_id
)

# Extract temporary credentials
credentials = response['Credentials']
temp_access_key = credentials['AccessKeyId']
temp_secret_key = credentials['SecretAccessKey']
temp_session_token = credentials['SessionToken']

# Create session using temporary credentials
assumed_session = boto3.Session(
    aws_access_key_id=temp_access_key,
    aws_secret_access_key=temp_secret_key,
    aws_session_token=temp_session_token,
    region_name=region_name
)
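The Expiration field of the assume_role response records when the credentials stop working, so a script can check how much time is left before it starts a long batch. The following is a minimal sketch, assuming the response has the shape shown above; seconds_until_expiry is a hypothetical helper name and the sample credentials are made up for illustration:

```python
from datetime import datetime, timedelta, timezone


def seconds_until_expiry(credentials, now=None):
    """Return the seconds left before the temporary credentials expire.

    `credentials` is the 'Credentials' dict of an assume_role response;
    boto3 returns its 'Expiration' value as a timezone-aware datetime.
    """
    now = now or datetime.now(timezone.utc)
    return (credentials['Expiration'] - now).total_seconds()


# Hypothetical response fragment, used only for illustration
fake_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fake_credentials = {
    'AccessKeyId': 'ASIAEXAMPLE',
    'SecretAccessKey': 'example-secret',
    'SessionToken': 'example-token',
    'Expiration': fake_now + timedelta(hours=1),
}

remaining = seconds_until_expiry(fake_credentials, now=fake_now)
print(f"Credentials expire in {remaining:.0f} seconds")
```

Comparing this value with the expected runtime of a job shows early whether a refresh will be needed.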
Root Causes of Credential Expiration Issues
Temporary security credentials are designed to improve security: a short validity period limits how long a leaked credential can be misused. In long-running batch processing tasks, however, this mechanism can cause the following issues:
- Default Validity Limitations: When the DurationSeconds parameter is not explicitly specified, credentials expire after 1 hour by default
- Long-running Task Interruption: Tasks collecting large amounts of CloudWatch metrics may run for several hours, exceeding temporary credential validity
- Complex Error Recovery: After credential expiration, all subsequent API calls fail, requiring reacquisition of credentials and task state recovery
Solutions and Best Practices
To address temporary credential expiration issues, the following strategies can be adopted:
1. Setting Appropriate Credential Validity Period
Explicitly specify the DurationSeconds parameter when calling assume_role, setting a reasonable validity period based on expected task runtime:
response = sts_client.assume_role(
    RoleArn=role_arn,
    RoleSessionName='long-running-session',
    ExternalId=external_id,
    # 1 hour; larger values are accepted only up to the role's configured
    # maximum session duration (which can be raised to 12 hours)
    DurationSeconds=3600
)
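STS rejects a DurationSeconds value that is below 900 or above the role's maximum session duration, so it can help to clamp the requested value before calling assume_role. The sketch below assumes the caller knows the role's maximum session duration; pick_duration_seconds and its parameters are hypothetical names, not part of boto3:

```python
MIN_DURATION = 900               # STS minimum: 15 minutes
MIN_ROLE_MAX_DURATION = 3600     # a role's maximum session duration is at least 1 hour
MAX_ROLE_MAX_DURATION = 43200    # and at most 12 hours


def pick_duration_seconds(expected_runtime, role_max_duration=3600, margin=300):
    """Choose a DurationSeconds value covering the expected runtime plus a
    safety margin, clamped to what STS accepts for this role."""
    if not MIN_ROLE_MAX_DURATION <= role_max_duration <= MAX_ROLE_MAX_DURATION:
        raise ValueError("role_max_duration must be between 3600 and 43200 seconds")
    wanted = expected_runtime + margin
    return max(MIN_DURATION, min(wanted, role_max_duration))


short_job = pick_duration_seconds(600)
two_hour_job = pick_duration_seconds(2 * 3600, role_max_duration=4 * 3600)
too_long_job = pick_duration_seconds(10 * 3600)
print(short_job, two_hour_job, too_long_job)
```

When the returned value is smaller than the expected runtime, as in the last call, a single set of credentials cannot cover the task and refresh logic is still required.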
2. Implementing Automatic Credential Refresh Mechanism
For tasks that run longer than the credential validity period, implement credential refresh logic. Below is a credential management class that re-assumes the role shortly before the current credentials expire:
import time
from datetime import datetime, timedelta, timezone

class AWSCredentialManager:
    def __init__(self, base_session, role_arn, external_id, region_name):
        self.base_session = base_session
        self.role_arn = role_arn
        self.external_id = external_id
        self.region_name = region_name
        self.credentials = None
        self.credentials_expiry = None
        self.sts_client = base_session.client('sts')

    def get_current_session(self):
        """Get currently valid session, refresh credentials if necessary"""
        if self._need_refresh():
            self._refresh_credentials()
        return boto3.Session(
            aws_access_key_id=self.credentials['AccessKeyId'],
            aws_secret_access_key=self.credentials['SecretAccessKey'],
            aws_session_token=self.credentials['SessionToken'],
            region_name=self.region_name
        )

    def _need_refresh(self):
        """Check if credential refresh is needed"""
        if not self.credentials:
            return True
        # Start refresh 5 minutes before credential expiration.
        # 'Expiration' is a timezone-aware UTC datetime, so compare it with an
        # aware "now"; a naive datetime.utcnow() would raise TypeError here.
        refresh_time = self.credentials_expiry - timedelta(minutes=5)
        return datetime.now(timezone.utc) >= refresh_time

    def _refresh_credentials(self):
        """Refresh temporary security credentials"""
        response = self.sts_client.assume_role(
            RoleArn=self.role_arn,
            RoleSessionName=f"refreshed-session-{int(time.time())}",
            ExternalId=self.external_id,
            DurationSeconds=3600
        )
        self.credentials = response['Credentials']
        self.credentials_expiry = self.credentials['Expiration']
        print(f"Credentials refreshed, valid until: {self.credentials_expiry}")
# Usage example
credential_manager = AWSCredentialManager(
    base_session=session,
    role_arn=role_arn,
    external_id=external_id,
    region_name=region_name
)

# Periodically get new session during long-running tasks
for instance in ec2_instances:
    current_session = credential_manager.get_current_session()
    cloudwatch_client = current_session.client('cloudwatch')

    # Collect metric data
    metrics = cloudwatch_client.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'InstanceId', 'Value': instance.id}],
        StartTime=start_time,
        EndTime=end_time,
        Period=300,
        Statistics=['Average', 'Maximum', 'Minimum']
    )
3. Environment Variable Conflict Handling
Credentials left in environment variables, for example a stale AWS_SESSION_TOKEN from an earlier session, may conflict with the session configuration. Make sure the intended credential source is the one actually used:
# Clean up potentially conflicting environment variables
import os

for var in ('AWS_SESSION_TOKEN', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'):
    os.environ.pop(var, None)  # no error if the variable is not set
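Removing variables from os.environ affects the rest of the process. Where that is undesirable, the removal can be limited to the block that creates the sessions. The following sketch uses only the standard library; without_aws_env_credentials is a hypothetical helper, not a boto3 feature:

```python
import os
from contextlib import contextmanager

AWS_CREDENTIAL_VARS = ('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_SESSION_TOKEN')


@contextmanager
def without_aws_env_credentials():
    """Temporarily remove AWS credential variables, restoring them on exit."""
    saved = {name: os.environ.pop(name)
             for name in AWS_CREDENTIAL_VARS if name in os.environ}
    try:
        yield
    finally:
        os.environ.update(saved)


os.environ['AWS_SESSION_TOKEN'] = 'stale-token'  # simulate a leftover token
with without_aws_env_credentials():
    # Sessions created here do not see the environment credentials
    inside = 'AWS_SESSION_TOKEN' in os.environ
after = os.environ.get('AWS_SESSION_TOKEN')
print(inside, after)
```

The variables are restored even if the block raises, so other code in the same process that relies on them keeps working.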
Error Handling and Monitoring
Implement robust error handling mechanisms to catch and handle credential expiration exceptions:
def safe_cloudwatch_call(cloudwatch_client, call_args):
    """Safe CloudWatch API call with automatic credential expiration handling"""
    max_retries = 3
    for attempt in range(max_retries):
        try:
            return cloudwatch_client.get_metric_statistics(**call_args)
        except ClientError as e:
            # Anything other than an expired token is not handled here
            if e.response['Error']['Code'] != 'ExpiredToken':
                raise
            if attempt == max_retries - 1:
                raise RuntimeError("Max retries exceeded for token refresh") from e
            print(f"Token expired, refreshing credentials (attempt {attempt + 1})")
            # Force a refresh through the module-level credential_manager created
            # earlier, then rebuild the client so it uses the new credentials
            credential_manager._refresh_credentials()
            cloudwatch_client = credential_manager.get_current_session().client('cloudwatch')

# Use safe API calls
metrics_data = safe_cloudwatch_call(cloudwatch_client, {
    'Namespace': 'AWS/EC2',
    'MetricName': 'NetworkIn',
    'Dimensions': [{'Name': 'InstanceId', 'Value': 'i-1234567890abcdef0'}],
    'StartTime': start_time,
    'EndTime': end_time,
    'Period': 300,
    'Statistics': ['Average', 'Sum']
})
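Depending on the service, credential expiration may be reported under more than one error code, so the check can be factored into a small predicate. In the sketch below, ExpiredToken is the code from the error message discussed in this article; treating ExpiredTokenException and RequestExpired as equivalents is an assumption that should be verified against the services in use. FakeClientError only stands in for botocore's ClientError so that the example runs without AWS access:

```python
EXPIRED_CODES = frozenset({
    'ExpiredToken',           # code seen in the CloudWatch error above
    'ExpiredTokenException',  # assumption: variant reported by some services
    'RequestExpired',         # assumption: variant reported by some services
})


def is_expired_token_error(error):
    """Return True if a ClientError-like exception reports expired credentials."""
    response = getattr(error, 'response', None) or {}
    return response.get('Error', {}).get('Code') in EXPIRED_CODES


class FakeClientError(Exception):
    """Stand-in exposing the same `response` attribute as botocore's ClientError."""

    def __init__(self, code):
        super().__init__(code)
        self.response = {'Error': {'Code': code}}


expired = is_expired_token_error(FakeClientError('ExpiredToken'))
throttled = is_expired_token_error(FakeClientError('Throttling'))
unrelated = is_expired_token_error(ValueError('unrelated'))
print(expired, throttled, unrelated)
```

A retry wrapper can call this predicate instead of comparing against a single string, which keeps the list of codes in one place.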
Performance Optimization Considerations
When implementing credential refresh mechanisms, consider the following performance factors:
- Refresh Frequency: Avoid refreshing credentials too often; starting the refresh 5-10 minutes before expiration is usually sufficient
- Session Reuse: Reuse session and client instances as much as possible during credential validity period
- Concurrency Safety: Ensure thread safety of credential refresh in multi-threaded environments
- Error Fallback: Implement appropriate error fallback mechanisms, such as using backup credentials or degraded services
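The reuse and concurrency points above can be combined in a small cache that hands out the same credentials until they approach expiry and lets only one thread perform the refresh. This is a sketch under stated assumptions: ThreadSafeCredentialCache and fetch_credentials are hypothetical names, and fake_fetch replaces the real assume_role call so the example runs without AWS access:

```python
import threading
from datetime import datetime, timedelta, timezone


class ThreadSafeCredentialCache:
    """Cache credentials and refresh them at most once per expiry window."""

    def __init__(self, fetch_credentials, refresh_margin=timedelta(minutes=5)):
        # fetch_credentials: hypothetical callable returning a dict with an
        # 'Expiration' key holding a timezone-aware datetime, for example a
        # wrapper around sts_client.assume_role(...)['Credentials']
        self._fetch = fetch_credentials
        self._margin = refresh_margin
        self._lock = threading.Lock()
        self._credentials = None

    def _is_fresh(self):
        if self._credentials is None:
            return False
        deadline = self._credentials['Expiration'] - self._margin
        return datetime.now(timezone.utc) < deadline

    def get(self):
        # The freshness check happens while holding the lock, so threads that
        # arrive during a refresh wait and then reuse its result.
        with self._lock:
            if not self._is_fresh():
                self._credentials = self._fetch()
            return self._credentials


calls = []


def fake_fetch():
    """Stand-in for an assume_role call; records how often it is invoked."""
    calls.append(1)
    return {'AccessKeyId': 'ASIAEXAMPLE',
            'Expiration': datetime.now(timezone.utc) + timedelta(hours=1)}


cache = ThreadSafeCredentialCache(fake_fetch)
threads = [threading.Thread(target=cache.get) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"fetch called {len(calls)} time(s) for {len(threads)} threads")
```

Holding the lock during the fetch serializes callers for the duration of one STS call, which is a deliberate trade-off: it is simpler than double-checked locking and refreshes are rare.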
Conclusion
The validity period mechanism of AWS temporary security credentials is an important security feature, but it can present challenges in long-running tasks. By understanding the working principles of assume_role, implementing automatic refresh mechanisms, handling environment variable conflicts, and establishing robust error handling, ExpiredToken issues can be effectively resolved. The solutions provided in this article are not only applicable to CloudWatch metric collection scenarios but can also be extended to other application scenarios requiring long-term AWS API access.
In practical applications, choose a credential management strategy that fits the specific business and security requirements. For critical business systems, consider more advanced features of the AWS SDK, such as custom credential providers, or alternatives such as attaching IAM roles directly to EC2 instances.
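As one illustration of the custom credential provider approach, botocore ships a RefreshableCredentials class that calls a user-supplied function whenever the credentials are close to expiring. The sketch below is not a definitive implementation: to_refreshable_metadata and build_auto_refreshing_session are hypothetical names, the ExternalId parameter is omitted for brevity, and assigning to the private _credentials attribute of a botocore session is a commonly used workaround rather than a documented public API:

```python
from datetime import datetime, timezone


def to_refreshable_metadata(sts_credentials):
    """Convert the 'Credentials' dict of an assume_role response into the
    metadata dict expected by botocore's RefreshableCredentials."""
    return {
        'access_key': sts_credentials['AccessKeyId'],
        'secret_key': sts_credentials['SecretAccessKey'],
        'token': sts_credentials['SessionToken'],
        'expiry_time': sts_credentials['Expiration'].isoformat(),
    }


def build_auto_refreshing_session(sts_client, role_arn, session_name, region_name):
    """Return a boto3 session whose credentials botocore refreshes on demand."""
    # Imported here so the conversion helper above runs without boto3 installed
    import boto3
    from botocore.credentials import RefreshableCredentials
    from botocore.session import get_session

    def refresh():
        response = sts_client.assume_role(
            RoleArn=role_arn, RoleSessionName=session_name)
        return to_refreshable_metadata(response['Credentials'])

    credentials = RefreshableCredentials.create_from_metadata(
        metadata=refresh(),
        refresh_using=refresh,
        method='sts-assume-role',
    )
    botocore_session = get_session()
    # Assumption: private attribute, may change between botocore versions
    botocore_session._credentials = credentials
    return boto3.Session(botocore_session=botocore_session, region_name=region_name)


# Illustration of the conversion step with made-up values
sample = {
    'AccessKeyId': 'ASIAEXAMPLE',
    'SecretAccessKey': 'example-secret',
    'SessionToken': 'example-token',
    'Expiration': datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc),
}
metadata = to_refreshable_metadata(sample)
print(metadata['expiry_time'])
```

Clients created from such a session can be kept for the lifetime of the task, since the refresh happens inside botocore when a request is signed.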