Keywords: Boto3 | Amazon S3 | File Upload | Python SDK | AWS
Abstract: This article provides a comprehensive guide on migrating from Boto2 to Boto3 for writing files and data to Amazon S3 objects. It compares Boto2's set_contents_from methods with Boto3's put(), put_object(), upload_file(), and upload_fileobj() methods, offering complete code examples and best practices including error handling, metadata configuration, and progress monitoring capabilities.
Migration Overview from Boto2 to Boto3
With the evolution of AWS Python SDK from Boto2 to Boto3, significant changes have occurred in how S3 object storage operations are performed. In Boto2, developers used methods like Key.set_contents_from_string(), Key.set_contents_from_file(), Key.set_contents_from_filename(), and Key.set_contents_from_stream() to write to S3 objects. These methods have been replaced by more modern and flexible APIs in Boto3.
Core Writing Methods in Boto3
Object.put() Method
The Object.put() method is one of the primary ways to write to S3 objects in Boto3. It operates through the S3 resource interface and offers intuitive usage:
```python
import boto3

# Prepare binary data
some_binary_data = b'Here we have some data'

# Write the data using Object.put()
s3 = boto3.resource('s3')
obj = s3.Object('my_bucket_name', 'my/key/including/filename.txt')
obj.put(Body=some_binary_data)
```

This approach is particularly suitable for binary data that is already in memory, or for data streams obtained directly from other sources.
Client.put_object() Method
The Client.put_object() provides similar functionality but is accessed through the client interface:
```python
import boto3

more_binary_data = b'Here we have some more data'

# Write the data using Client.put_object()
client = boto3.client('s3')
client.put_object(
    Body=more_binary_data,
    Bucket='my_bucket_name',
    Key='my/key/including/anotherfilename.txt',
)
```

Both methods are functionally equivalent and call the same underlying PutObject API, with the choice depending primarily on whether the application uses the resource or the client interface.
Advanced File Upload Methods
upload_file Method
For scenarios involving direct upload from local file systems, Boto3 provides the specialized upload_file method:
```python
import logging
import os

import boto3
from botocore.exceptions import ClientError

def upload_file(file_name, bucket, object_name=None):
    """Upload a file to an S3 bucket

    :param file_name: File to upload
    :param bucket: Bucket to upload to
    :param object_name: S3 object name. If not specified then file_name is used
    :return: True if file was uploaded, else False
    """
    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = os.path.basename(file_name)

    # Upload the file (upload_file returns None; errors are raised as exceptions)
    s3_client = boto3.client('s3')
    try:
        s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True
```

This method automatically handles multipart uploads for large files, improving upload efficiency and reliability.
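A quick illustration of the helper's key-defaulting behaviour; the local path below is hypothetical:

```python
import os

# Hypothetical local path, for illustration only
file_name = '/tmp/reports/summary.csv'

# When object_name is omitted, the helper above uploads to this key:
default_key = os.path.basename(file_name)
print(default_key)  # summary.csv
```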
upload_fileobj Method
For situations requiring data upload from file objects, the upload_fileobj method is available:
```python
import boto3

s3 = boto3.client('s3')
with open("FILE_NAME", "rb") as f:
    s3.upload_fileobj(f, "amzn-s3-demo-bucket", "OBJECT_NAME")
```

It's important to note that the file object must be opened in binary mode ("rb"), not text mode.
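Because upload_fileobj accepts any binary file-like object, in-memory data can be streamed without first writing a temporary file. A sketch: the bucket and key are the documentation placeholders, and the upload call itself is shown commented out since it requires real credentials.

```python
import io

# Any object with a binary read() works, not just files opened from disk
buffer = io.BytesIO(b'generated in memory, never written to disk')

# s3 = boto3.client('s3')
# s3.upload_fileobj(buffer, 'amzn-s3-demo-bucket', 'OBJECT_NAME')  # actual upload

# Binary mode is the key requirement: read() must return bytes, not str
chunk = buffer.read()
print(type(chunk).__name__)  # bytes
```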
Advanced Configuration Options
Using the ExtraArgs Parameter
Boto3 upload methods support the ExtraArgs parameter for configuring various advanced options:
```python
# Setting metadata
s3.upload_file(
    'FILE_NAME', 'amzn-s3-demo-bucket', 'OBJECT_NAME',
    ExtraArgs={'Metadata': {'mykey': 'myvalue'}}
)

# Setting a canned access control list
s3.upload_file(
    'FILE_NAME', 'amzn-s3-demo-bucket', 'OBJECT_NAME',
    ExtraArgs={'ACL': 'public-read'}
)

# Setting custom permission grants
s3.upload_file(
    'FILE_NAME', 'amzn-s3-demo-bucket', 'OBJECT_NAME',
    ExtraArgs={
        'GrantRead': 'uri="http://acs.amazonaws.com/groups/global/AllUsers"',
        'GrantFullControl': 'id="01234567890abcdefg"',
    }
)
```

Progress Monitoring Capabilities
Upload progress monitoring can be implemented using the Callback parameter:
```python
import os
import sys
import threading

class ProgressPercentage(object):
    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # To simplify, assume this is hooked up to a single filename
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = (self._seen_so_far / self._size) * 100
            sys.stdout.write(
                "\r%s  %s / %s  (%.2f%%)" % (
                    self._filename, self._seen_so_far, self._size,
                    percentage))
            sys.stdout.flush()

# Using progress monitoring
s3.upload_file(
    'FILE_NAME', 'amzn-s3-demo-bucket', 'OBJECT_NAME',
    Callback=ProgressPercentage('FILE_NAME')
)
```

Method Selection Guidelines
When choosing which method to use, consider the following factors:
- For data already in memory, use Object.put() or Client.put_object()
- For local file uploads, use upload_file or upload_fileobj
- When progress monitoring is needed, use the methods supporting the Callback parameter
- For complex metadata or permission settings, use the methods supporting ExtraArgs
All of these methods reliably write data to S3; selection is primarily a matter of where the data lives (memory versus disk) and the application's requirements for progress reporting, metadata, and permissions.
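The ProgressPercentage callback shown earlier can be exercised locally, without any S3 calls, by feeding it the kind of chunk-size notifications boto3's transfer manager would deliver during an upload:

```python
import os
import sys
import tempfile
import threading

class ProgressPercentage(object):
    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = (self._seen_so_far / self._size) * 100
            sys.stdout.write("\r%s  %s / %s  (%.2f%%)" % (
                self._filename, self._seen_so_far, self._size, percentage))
            sys.stdout.flush()

# Create a small throwaway file so getsize() has something to measure
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b'x' * 100)
    path = f.name

# Simulate the per-chunk callbacks a real transfer would trigger
progress = ProgressPercentage(path)
for chunk_size in (25, 25, 50):
    progress(chunk_size)
print()  # finish the carriage-return line

os.remove(path)
```

After the three simulated chunks, the callback has accumulated all 100 bytes, and the console shows the running percentage exactly as it would during a real upload.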