A Comprehensive Guide to Plotting Multiple Groups of Time Series Data Using Pandas and Matplotlib

Keywords: Time Series Analysis | Data Visualization | Pandas Data Processing | Matplotlib Plotting | Temperature Data Analysis

Abstract: This article provides a detailed explanation of how to process time series data containing temperature records from different years using Python's Pandas and Matplotlib libraries and plot them in a single figure for comparison. The article first covers key data preprocessing steps, including datetime parsing and extraction of year and month information, then delves into data grouping and reshaping using groupby and unstack methods, and finally demonstrates how to create clear multi-line plots using Matplotlib. Through complete code examples and step-by-step explanations, readers will master the core techniques for handling irregular time series data and performing visual analysis.

Data Preprocessing and Parsing

When working with time series data, proper data parsing is fundamental to ensuring the accuracy of subsequent analysis. For the given temperature record file containing three columns of date, time, and temperature values, we first need to read and parse the data using Pandas' read_csv function.

import pandas as pd

# Read CSV file and parse date column
df = pd.read_csv('temp.csv', names=['date', 'time', 'value'], parse_dates=['date'])

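The key parameter parse_dates=['date'] converts the date strings into Pandas' datetime64 type (datetime64[ns] in most releases), the standard representation for time series data. Because names is supplied, Pandas treats the first line of the file as data rather than as a header row. If some values cannot be parsed as dates, the column is returned unchanged with object dtype, so it is worth checking df.dtypes before continuing: the .dt accessor used in the next step only works on datetime columns.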
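The parsing step can be tried without the original file. The following is a minimal sketch, assuming a three-row in-memory sample in place of temp.csv; the sample rows and the is_datetime variable are illustrative and not part of the original data.

```python
import io

import pandas as pd

# Hypothetical sample standing in for temp.csv: no header row,
# columns are date, time, temperature.
sample = io.StringIO(
    "2019-01-15,08:00,3.2\n"
    "2019-07-20,14:30,24.8\n"
    "2020-01-10,09:15,1.9\n"
)

df = pd.read_csv(sample, names=['date', 'time', 'value'], parse_dates=['date'])

# Confirm that parsing succeeded; an object dtype here would mean
# the date strings were left unparsed.
is_datetime = pd.api.types.is_datetime64_any_dtype(df['date'])

print(df.dtypes)
print(is_datetime)
```

Checking the dtype right after loading tends to surface malformed dates early, before they cause errors in later steps.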

Time Feature Extraction

To group data by year and month, we need to extract corresponding year and month information from the date column. Pandas provides a convenient .dt accessor to achieve this functionality.

# Extract year and month information
df['Year'] = df.date.dt.year
df['Month'] = df.date.dt.month

With these two columns in place, each record carries its year and month as plain integers, so the data can be grouped and aggregated at either granularity.
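As a small illustration of the .dt accessor, the sketch below builds a hypothetical three-row DataFrame (the dates and values are invented for the example) and derives the same two columns.

```python
import pandas as pd

# Hypothetical readings; in the article these come from temp.csv.
df = pd.DataFrame({
    'date': pd.to_datetime(['2019-01-15', '2019-07-20', '2020-01-10']),
    'value': [3.2, 24.8, 1.9],
})

# .dt exposes datetime components element-wise.
df['Year'] = df['date'].dt.year
df['Month'] = df['date'].dt.month

print(df[['date', 'Year', 'Month']])
```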

Data Grouping and Reshaping

For time series containing data from multiple years, we need to reorganize the data into a format suitable for plotting. The combination of groupby and unstack methods is an ideal choice for achieving this goal.

# Group by month and year, calculate mean temperature
dfg = df.groupby(['Month', 'Year'])['value'].mean().unstack()

This operation first groups the temperature values by month and year and calculates the mean of each group, producing a Series indexed by (Month, Year) pairs. The unstack() method then moves the innermost index level, Year, into the columns, yielding a DataFrame with months as the row index and one column per year. Month and year combinations that have no data appear as NaN. Because Pandas draws one line per column, this layout maps directly onto a multi-line plot.
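To make the reshaping concrete, here is a minimal sketch on hypothetical data that splits the one-line expression into its two steps. The values are invented; the intermediate name means is introduced only for illustration.

```python
import pandas as pd

# Hypothetical long-format data: two readings in January 2019,
# one each in February 2019 and January 2020, none in February 2020.
df = pd.DataFrame({
    'Year':  [2019, 2019, 2019, 2020],
    'Month': [1, 1, 2, 1],
    'value': [2.0, 4.0, 5.0, 1.0],
})

# Step 1: one mean per (Month, Year) pair, as a Series with a MultiIndex.
means = df.groupby(['Month', 'Year'])['value'].mean()

# Step 2: move the inner index level (Year) into the columns.
dfg = means.unstack()

print(means)
print(dfg)
```

In the resulting table the cell for February 2020 is NaN, since no group exists for that combination.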

Data Visualization

Pandas' plotting interface is built on Matplotlib, so we can call DataFrame.plot directly instead of drawing each line by hand. It draws one line per column and returns a Matplotlib Axes object that can be customized further.

import matplotlib.pyplot as plt

# Plot one line per year; DataFrame.plot returns a Matplotlib Axes
ax = dfg.plot(figsize=(9, 7), marker='.', xticks=dfg.index)

# Set axis labels and title
ax.set_xlabel('Month')
ax.set_ylabel('Temperature (°C)')
ax.set_title('Comparison of Temperature Trends Across Different Years')

# Place the legend outside the plot area, to the right
ax.legend(title='Year', bbox_to_anchor=(1.05, 1), loc='upper left')

# Display figure
plt.tight_layout()
plt.show()

In this visualization example, we used the marker='.' parameter to display markers at each data point, which is particularly useful when there are few data points. Meanwhile, xticks=dfg.index ensures that x-axis ticks exactly correspond to month indices.
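An alternative is to create the figure explicitly and pass the Axes to Pandas, which helps when several plots share one figure. The following is a sketch under stated assumptions: the monthly means are hypothetical, the Agg backend is selected only so the example runs without a display, and the figure is written to an in-memory buffer rather than shown.

```python
import io

import matplotlib
matplotlib.use("Agg")  # Assumption: headless run; drop this line for interactive use.
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly means: rows are months, columns are years.
dfg = pd.DataFrame(
    {2019: [3.0, 5.0, 9.5], 2020: [1.0, 4.5, 10.0]},
    index=pd.Index([1, 2, 3], name='Month'),
)
dfg.columns.name = 'Year'

# Create the Axes explicitly and hand it to pandas.
fig, ax = plt.subplots(figsize=(9, 7))
dfg.plot(ax=ax, marker='.', xticks=dfg.index)

ax.set_xlabel('Month')
ax.set_ylabel('Temperature (°C)')
ax.legend(title='Year')

# pandas draws one line per column of the DataFrame.
n_lines = len(ax.get_lines())

# Render to an in-memory PNG; pass a filename instead to write to disk.
buf = io.BytesIO()
fig.savefig(buf, format='png')
```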

Handling Data Sparsity Issues

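In practice, temperature records are often sparse, and some month and year combinations have no readings. The groupby operation simply produces no group for such a combination; it is unstack() that fills the corresponding cell with NaN when it builds the month-by-year table. A month with no data in any year does not appear in the index at all. During plotting, the line is not drawn through NaN values: it is broken at each missing point, so gaps remain visible rather than being bridged. With marker='.', a value isolated between two gaps still shows up as a single point.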

For more complex data sparsity scenarios, consider using interpolation methods to fill missing values or employing more advanced time series processing techniques such as resampling.
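One possible way to fill such gaps is sketched below: reindex the table so every month has a row, then interpolate linearly between neighboring months. The table values are hypothetical, and linear interpolation is only one choice among several; the names full and filled are introduced for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical month-by-year table with gaps: March is absent entirely,
# and 2020 has no February reading.
dfg = pd.DataFrame(
    {2019: [3.0, 5.0, 11.0], 2020: [1.0, np.nan, 12.0]},
    index=pd.Index([1, 2, 4], name='Month'),
)

# Make missing months explicit so every month has a row.
# Use range(1, 13) to cover a full year.
full = dfg.reindex(pd.Index(range(1, 5), name='Month'))

# Linear interpolation down each year column, filling interior gaps only.
filled = full.interpolate(method='linear', limit_area='inside')

print(filled)
```

Interpolated values are estimates rather than measurements, so it is usually worth marking them differently in the plot or noting them in the caption.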

Extended Applications and Best Practices

Beyond basic line plotting, the same workflow can be extended in several directions:

Data Smoothing: For temperature data with significant noise, apply moving averages or Savitzky-Golay filters for smoothing.
Anomaly Detection: Combine statistical methods to identify temperature outliers and highlight them in the plot.
Seasonal Analysis: Decompose time series to separate trend, seasonal, and residual components.
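The first two ideas can be sketched with Pandas alone. The example below is a rough illustration on hypothetical monthly means: it smooths with a centered 3-month moving average and flags points whose residual from the smoothed curve has a z-score above 2. The window size and threshold are arbitrary illustrative choices. Savitzky-Golay filtering and seasonal decomposition need additional libraries and are not shown.

```python
import pandas as pd

# Hypothetical monthly means for one year, with a suspicious spike in month 6.
temps = pd.Series(
    [2.0, 3.0, 7.0, 11.0, 15.0, 35.0, 21.0, 20.0, 16.0, 11.0, 6.0, 3.0],
    index=pd.Index(range(1, 13), name='Month'),
)

# Data smoothing: centered 3-month moving average.
# min_periods=1 lets the first and last months use a shorter window.
smoothed = temps.rolling(window=3, center=True, min_periods=1).mean()

# Anomaly detection: flag points far from the smoothed curve.
residual = temps - smoothed
z = (residual - residual.mean()) / residual.std()
outliers = temps[z.abs() > 2]  # threshold of 2 is an illustrative choice

print(smoothed.round(2))
print(outliers)
```

Flagged points could then be highlighted by plotting outliers on the same Axes with a distinct marker.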

In practical applications, we recommend following these best practices:

Always verify data quality and handle outliers and missing values
Choose appropriate graph types and color schemes to ensure readability
Add proper labels and legends to enhance information communication
Consider using interactive visualization libraries (like Plotly) for deeper data exploration

With these techniques, researchers and data analysts can parse, reshape, and plot multi-year time series data and compare seasonal patterns across years in a single figure.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.