Keywords: pandas | JSON processing | data conversion | Google Maps API | elevation data
Abstract: This article provides a comprehensive guide on using pandas.json_normalize function to convert nested JSON elevation data from Google Maps API into structured DataFrames. Through practical code examples, it demonstrates the complete workflow from API data retrieval to final data processing, including data acquisition, JSON parsing, and data flattening. The article also compares traditional manual parsing methods with the json_normalize approach, helping readers understand best practices for handling complex nested JSON data.
Introduction
In modern data science and geographic information system applications, processing JSON data from various APIs has become a common task. The Google Maps Elevation API provides powerful functionality for obtaining elevation information at specific geographic locations, but its returned data structure typically contains multiple levels of nesting, posing challenges for data processing. This article delves into efficient methods for handling such complex JSON data using the pandas library.
Data Acquisition and Initial Processing
First, we need to retrieve elevation data from the Google Maps Elevation API. The following code demonstrates the complete request process:
from urllib.request import Request, urlopen
import json
import pandas as pd
# Define path coordinates
path_coordinates = '42.974049,-81.205203|42.974298,-81.195755'
# Construct API request (current versions of the Elevation API require an API key;
# YOUR_API_KEY below is a placeholder)
api_url = f'https://maps.googleapis.com/maps/api/elevation/json?locations={path_coordinates}&key=YOUR_API_KEY'
request = Request(api_url)
# Send request and get response
response = urlopen(request)
elevation_data = response.read()
# Parse JSON data
parsed_data = json.loads(elevation_data)
The API typically returns data in the following structure, containing a results array where each element is an object with elevation, location coordinates, and resolution information:
{
"results": [
{
"elevation": 243.3462677001953,
"location": {
"lat": 42.974049,
"lng": -81.205203
},
"resolution": 19.08790397644043
},
{
"elevation": 244.1318664550781,
"location": {
"lat": 42.974298,
"lng": -81.19575500000001
},
"resolution": 19.08790397644043
}
],
"status": "OK"
}
Traditional Manual Parsing Approach
Without the json_normalize function, developers typically need to manually extract and restructure nested data:
# Manual data extraction
latitude_list = []
longitude_list = []
elevation_list = []
for result_item in parsed_data['results']:
    latitude_list.append(result_item['location']['lat'])
    longitude_list.append(result_item['location']['lng'])
    elevation_list.append(result_item['elevation'])
# Create DataFrame
manual_df = pd.DataFrame({
    'latitude': latitude_list,
    'longitude': longitude_list,
    'elevation': elevation_list
})
While this approach works, it results in verbose code that is prone to errors, especially when dealing with more complex nested structures.
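Readers without Google Maps credentials can still follow along: the sample response shown earlier can be embedded as a string and parsed locally. This is a self-contained sketch; the `sample_json` name is ours, but the structure mirrors the API payload above.

```python
import json
import pandas as pd

# Sample payload mirroring the Elevation API response shown earlier
sample_json = '''
{
  "results": [
    {"elevation": 243.3462677001953,
     "location": {"lat": 42.974049, "lng": -81.205203},
     "resolution": 19.08790397644043},
    {"elevation": 244.1318664550781,
     "location": {"lat": 42.974298, "lng": -81.19575500000001},
     "resolution": 19.08790397644043}
  ],
  "status": "OK"
}
'''
parsed_data = json.loads(sample_json)

# Manual extraction, equivalent to the loop above
rows = [
    {
        'latitude': item['location']['lat'],
        'longitude': item['location']['lng'],
        'elevation': item['elevation'],
    }
    for item in parsed_data['results']
]
manual_df = pd.DataFrame(rows)
print(manual_df.shape)  # (2, 3)
```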
Optimized Solution Using json_normalize
The json_normalize function, promoted to the top-level pandas namespace in version 1.0 (it was previously available as pandas.io.json.json_normalize), significantly simplifies the processing of nested JSON data:
# Use json_normalize for direct data flattening
normalized_df = pd.json_normalize(parsed_data['results'])
print(normalized_df)
After executing this code, we obtain a well-structured DataFrame:
    elevation  location.lat  location.lng  resolution
0  243.346268     42.974049    -81.205203   19.087904
1  244.131866     42.974298    -81.195755   19.087904
Data Post-Processing and Optimization
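One behavior worth knowing before post-processing: if a record is missing a nested field, json_normalize simply emits NaN for that cell rather than raising an error. A small sketch with made-up records:

```python
import pandas as pd

# Hypothetical records: the second one lacks a 'resolution' value
records = [
    {'elevation': 243.3, 'location': {'lat': 42.97, 'lng': -81.20}, 'resolution': 19.1},
    {'elevation': 244.1, 'location': {'lat': 42.98, 'lng': -81.19}},  # no 'resolution'
]
df = pd.json_normalize(records)
print(df['resolution'].isna().tolist())  # [False, True]
```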
While json_normalize provides a good foundational structure, we may need additional processing to optimize the data format:
# Rename columns for better readability
normalized_df = normalized_df.rename(columns={
    'location.lat': 'latitude',
    'location.lng': 'longitude'
})
# Select required columns
final_df = normalized_df[['latitude', 'longitude', 'elevation', 'resolution']]
# Set appropriate data types
final_df = final_df.astype({
    'latitude': 'float64',
    'longitude': 'float64',
    'elevation': 'float64',
    'resolution': 'float64'
})
Advanced Features and Customization Options
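The elevation payload nests only dictionaries, but the record_path and meta parameters become essential when each record contains a nested list. A sketch with hypothetical survey data (the field names here are invented for illustration):

```python
import pandas as pd

# Hypothetical payload: each station carries a list of readings
stations = [
    {
        'station': 'A',
        'resolution': 19.1,
        'readings': [{'elevation': 243.3}, {'elevation': 243.5}],
    },
    {
        'station': 'B',
        'resolution': 19.1,
        'readings': [{'elevation': 244.1}],
    },
]

# record_path selects the nested list; meta carries parent fields into each row
flat = pd.json_normalize(stations, record_path='readings', meta=['station', 'resolution'])
print(flat)
```

Each element of the nested list becomes its own row, with the parent-level fields repeated alongside it.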
json_normalize offers several parameters to handle more complex scenarios:
# Limit flattening depth: max_level=0 keeps 'location' as a raw dict column
shallow_normalized = pd.json_normalize(
    parsed_data['results'],
    max_level=0
)
# (record_path and meta apply when records contain nested lists rather than dicts)
# Specify a custom separator for flattened column names
custom_separator_df = pd.json_normalize(
    parsed_data['results'],
    sep='_'  # yields 'location_lat' instead of 'location.lat'
)
Error Handling and Data Validation
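A small, testable way to organize such checks is a validation helper that inspects the payload before normalization. The function name and messages below are our own, not part of any API:

```python
import pandas as pd

def validate_elevation_payload(payload):
    """Return (ok, message) for a parsed Elevation API payload."""
    if not isinstance(payload, dict):
        return False, 'payload is not a JSON object'
    if payload.get('status') != 'OK':
        return False, f"API status: {payload.get('status')}"
    results = payload.get('results')
    if not results:
        return False, 'empty results array'
    required = {'elevation', 'location.lat', 'location.lng'}
    df = pd.json_normalize(results)
    missing = required - set(df.columns)
    if missing:
        return False, f'missing columns: {sorted(missing)}'
    return True, 'ok'

# Usage with a minimal well-formed payload
ok, msg = validate_elevation_payload({
    'status': 'OK',
    'results': [{'elevation': 243.3, 'location': {'lat': 42.97, 'lng': -81.20}}],
})
print(ok, msg)  # True ok
```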
Robust error handling is crucial in practical applications:
try:
    parsed_data = json.loads(elevation_data)  # JSONDecodeError can occur here
    if parsed_data.get('status') == 'OK':
        normalized_df = pd.json_normalize(parsed_data['results'])
        # Validate data completeness
        required_columns = ['elevation', 'location.lat', 'location.lng']
        missing_columns = [col for col in required_columns if col not in normalized_df.columns]
        if not missing_columns:
            print("Data conversion successful")
        else:
            print(f"Missing columns: {missing_columns}")
    else:
        print(f"API request failed: {parsed_data.get('status')}")
except json.JSONDecodeError as e:
    print(f"JSON parsing error: {e}")
except KeyError as e:
    print(f"Data key error: {e}")
except Exception as e:
    print(f"Other error: {e}")
Performance Comparison and Analysis
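Before timing anything, it is worth confirming that both approaches actually agree on the output. A quick sketch on in-memory sample data:

```python
import pandas as pd

results = [
    {'elevation': 243.3, 'location': {'lat': 42.974049, 'lng': -81.205203}},
    {'elevation': 244.1, 'location': {'lat': 42.974298, 'lng': -81.195755}},
]

# Manual extraction
manual_df = pd.DataFrame({
    'elevation': [r['elevation'] for r in results],
    'location.lat': [r['location']['lat'] for r in results],
    'location.lng': [r['location']['lng'] for r in results],
})

# json_normalize, with columns ordered to match the manual frame
normalized_df = pd.json_normalize(results)[manual_df.columns.tolist()]
print(manual_df.equals(normalized_df))  # True
```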
Timing both methods shows how they compare. Note that for small payloads the manual loop can actually be faster, since json_normalize carries general-purpose overhead; its advantage lies in conciseness and in handling deep or irregular nesting:
import time
# Test manual method performance
start_time = time.perf_counter()
for _ in range(1000):
    latitudes = [r['location']['lat'] for r in parsed_data['results']]
    longitudes = [r['location']['lng'] for r in parsed_data['results']]
    elevations = [r['elevation'] for r in parsed_data['results']]
    pd.DataFrame({'latitude': latitudes, 'longitude': longitudes, 'elevation': elevations})
manual_time = time.perf_counter() - start_time
# Test json_normalize performance
start_time = time.perf_counter()
for _ in range(1000):
    pd.json_normalize(parsed_data['results'])
normalize_time = time.perf_counter() - start_time
print(f"Manual method time: {manual_time:.4f} seconds")
print(f"json_normalize time: {normalize_time:.4f} seconds")
print(f"Relative difference: {((manual_time - normalize_time) / manual_time * 100):.1f}%")
Extended Practical Application Scenarios
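When batch-processing, a small helper that splits the pipe-delimited path strings into coordinate pairs makes validation easier. The helper below is illustrative, not part of any API client:

```python
def split_path(path):
    """Split '<lat>,<lng>|<lat>,<lng>' into a list of (lat, lng) float tuples."""
    pairs = []
    for token in path.split('|'):
        lat_str, lng_str = token.split(',')
        pairs.append((float(lat_str), float(lng_str)))
    return pairs

print(split_path('42.974049,-81.205203|42.974298,-81.195755'))
# [(42.974049, -81.205203), (42.974298, -81.195755)]
```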
This approach can be extended to handle more complex geospatial data processing tasks:
# Batch process multiple paths
multiple_paths = [
    '42.974049,-81.205203|42.974298,-81.195755',
    '43.6532,-79.3832|43.6516,-79.3817'
]
all_dataframes = []
for path in multiple_paths:
    api_url = f'https://maps.googleapis.com/maps/api/elevation/json?locations={path}&key=YOUR_API_KEY'
    response = urlopen(Request(api_url))
    data = json.loads(response.read())
    if data['status'] == 'OK':
        df = pd.json_normalize(data['results'])
        df['path_id'] = path  # Add path identifier
        all_dataframes.append(df)
# Combine all data
combined_df = pd.concat(all_dataframes, ignore_index=True)
Conclusion and Best Practices
The pandas.json_normalize function provides a powerful and convenient solution for processing nested JSON data. Through the examples and analysis in this article, we can see that this method yields markedly more concise code than manual extraction, with performance that is generally comparable for typical payloads, making it particularly suitable for handling complex data structures from various APIs. In practical applications, it's recommended to combine this approach with appropriate data validation and error handling to ensure reliable and stable data processing. For applications requiring processing of large amounts of geospatial data, this method can significantly improve development efficiency and code maintainability.