Keywords: Python | Pandas | Matplotlib | Data Visualization | Subplots
Abstract: This article provides a comprehensive guide on how to plot multiple pandas DataFrames in subplots within a single figure using Python's Pandas and Matplotlib libraries. Starting from fundamental concepts, it systematically explains key techniques including subplot creation, DataFrame positioning, and axis sharing. Complete code examples demonstrate implementations for both 2×2 and 4×1 layouts. The article also explores how to achieve axis consistency through sharex and sharey parameters, ensuring accurate multi-plot comparisons. Based on high-scoring Stack Overflow answers and official documentation, this guide offers practical, easily understandable solutions for data visualization tasks.
Introduction
In data analysis and visualization, comparing several datasets side by side is a common requirement. When DataFrame.plot() is called without an ax argument, Pandas creates a new figure for each call, so plotting several DataFrames produces several separate figures, which makes direct comparison awkward. Matplotlib's subplot functionality lets multiple DataFrames share a single figure, each in its own panel, so they can be compared directly.
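The default behavior can be checked with a short sketch. It assumes a headless environment, so it switches Matplotlib to the non-interactive Agg backend (drop that line to get on-screen windows), and it uses two small illustrative DataFrames built from the first values of the article's sample data. Two plain df.plot() calls end up in two different figures:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Small illustrative DataFrames (first values of the article's df1 and df2)
df_a = pd.DataFrame({"sales": [2, 5, 5, 7], "returns": [1, 2, 3, 4]})
df_b = pd.DataFrame({"sales": [2, 5, 11, 18], "returns": [1, 2, 0, 2]})

plt.close("all")    # start from a state with no open figures
ax_a = df_a.plot()  # no ax argument: Pandas creates a new figure
ax_b = df_b.plot()  # ...and another new figure for this call

print(len(plt.get_fignums()))      # 2
print(ax_a.figure is ax_b.figure)  # False
```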
Fundamental Concepts of Subplots
Matplotlib's subplot system creates multiple Axes (plotting regions) within a single figure, each of which can be drawn on independently. The plt.subplots() function creates a grid with the specified number of rows and columns and returns a (fig, axes) tuple. With the default squeeze=True, axes is a two-dimensional NumPy array of Axes objects when both nrows and ncols are greater than 1, a one-dimensional array when either of them is 1, and a single Axes object for a 1×1 grid.
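The shape of the returned axes object depends on the grid size. The following sketch, again assuming the headless Agg backend, creates three grids and reports what comes back:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

# 2 x 2 grid: both dimensions are > 1, so axes stays a 2-D NumPy array
fig1, axes_grid = plt.subplots(nrows=2, ncols=2)
# 4 x 1 grid: the length-1 dimension is squeezed away, leaving a 1-D array
fig2, axes_column = plt.subplots(nrows=4, ncols=1)
# 1 x 1 grid: a single Axes object is returned instead of an array
fig3, single_ax = plt.subplots()

print(axes_grid.shape)              # (2, 2)
print(axes_column.shape)            # (4,)
print(isinstance(single_ax, Axes))  # True
```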
Creating a 2×2 Subplot Layout
The following code demonstrates how to create a 2-row by 2-column subplot layout and plot four DataFrames in their respective subplots:
import matplotlib.pyplot as plt
import pandas as pd
# Create sample DataFrames
df1 = pd.DataFrame({'sales': [2, 5, 5, 7, 9, 13, 15, 17, 22, 24],
'returns': [1, 2, 3, 4, 5, 6, 7, 8, 7, 5]})
df2 = pd.DataFrame({'sales': [2, 5, 11, 18, 15, 15, 14, 9, 6, 7],
'returns': [1, 2, 0, 2, 2, 4, 5, 4, 2, 1]})
df3 = pd.DataFrame({'sales': [6, 8, 8, 7, 8, 9, 10, 7, 8, 12],
'returns': [1, 0, 1, 1, 1, 2, 3, 2, 1, 3]})
df4 = pd.DataFrame({'sales': [10, 7, 7, 6, 7, 6, 4, 3, 3, 2],
'returns': [4, 4, 3, 3, 2, 3, 2, 1, 1, 0]})
# Create 2×2 subplot layout
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 8))
# Plot DataFrames to specified subplots
df1.plot(ax=axes[0, 0])
df2.plot(ax=axes[0, 1])
df3.plot(ax=axes[1, 0])
df4.plot(ax=axes[1, 1])
plt.tight_layout()
plt.show()
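In this example, axes[0, 0] is the top-left subplot, axes[0, 1] the top-right, axes[1, 0] the bottom-left, and axes[1, 1] the bottom-right. Passing an Axes object through the ax parameter of DataFrame.plot() is the crucial step: it tells Pandas to draw into that subplot instead of creating a new figure.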
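With more DataFrames, writing one df.plot(ax=...) call per position becomes repetitive. A minimal sketch, assuming the DataFrames are collected in a dict whose keys (the hypothetical labels "df1" to "df4") double as subplot titles, pairs each DataFrame with an Axes by iterating over axes.flat:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The article's four sample DataFrames, keyed by a hypothetical label used as the title
frames = {
    "df1": pd.DataFrame({"sales": [2, 5, 5, 7, 9, 13, 15, 17, 22, 24],
                         "returns": [1, 2, 3, 4, 5, 6, 7, 8, 7, 5]}),
    "df2": pd.DataFrame({"sales": [2, 5, 11, 18, 15, 15, 14, 9, 6, 7],
                         "returns": [1, 2, 0, 2, 2, 4, 5, 4, 2, 1]}),
    "df3": pd.DataFrame({"sales": [6, 8, 8, 7, 8, 9, 10, 7, 8, 12],
                         "returns": [1, 0, 1, 1, 1, 2, 3, 2, 1, 3]}),
    "df4": pd.DataFrame({"sales": [10, 7, 7, 6, 7, 6, 4, 3, 3, 2],
                         "returns": [4, 4, 3, 3, 2, 3, 2, 1, 1, 0]}),
}

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 8))

# axes.flat visits the grid row by row: [0, 0], [0, 1], [1, 0], [1, 1]
for ax, (name, df) in zip(axes.flat, frames.items()):
    df.plot(ax=ax, title=name)  # draw into this subplot and set its title

fig.tight_layout()
```

Because dicts preserve insertion order, the subplots are filled in the order the DataFrames were added.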
Creating a 4×1 Vertical Layout
For comparisons requiring vertical arrangement, a single-column, multi-row subplot layout can be created:
# Create 4×1 subplot layout
fig, axes = plt.subplots(nrows=4, ncols=1, figsize=(8, 12))
# Plot DataFrames
df1.plot(ax=axes[0])
df2.plot(ax=axes[1])
df3.plot(ax=axes[2])
df4.plot(ax=axes[3])
plt.tight_layout()
plt.show()
Because one of the grid dimensions is 1, plt.subplots() squeezes the result, so axes is a one-dimensional array accessed as axes[0] through axes[3].
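If the same plotting code has to handle both single-column and multi-column grids, passing squeeze=False to plt.subplots() keeps axes two-dimensional in every case. A sketch of the 4×1 layout indexed as [row, col], assuming the headless Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The article's four sample DataFrames
frames = [
    pd.DataFrame({"sales": [2, 5, 5, 7, 9, 13, 15, 17, 22, 24],
                  "returns": [1, 2, 3, 4, 5, 6, 7, 8, 7, 5]}),
    pd.DataFrame({"sales": [2, 5, 11, 18, 15, 15, 14, 9, 6, 7],
                  "returns": [1, 2, 0, 2, 2, 4, 5, 4, 2, 1]}),
    pd.DataFrame({"sales": [6, 8, 8, 7, 8, 9, 10, 7, 8, 12],
                  "returns": [1, 0, 1, 1, 1, 2, 3, 2, 1, 3]}),
    pd.DataFrame({"sales": [10, 7, 7, 6, 7, 6, 4, 3, 3, 2],
                  "returns": [4, 4, 3, 3, 2, 3, 2, 1, 1, 0]}),
]

# squeeze=False keeps axes two-dimensional even though ncols == 1
fig, axes = plt.subplots(nrows=4, ncols=1, squeeze=False, figsize=(8, 12))
print(axes.shape)  # (4, 1)

for row, df in enumerate(frames):
    df.plot(ax=axes[row, 0])  # same [row, col] indexing as the 2×2 example

fig.tight_layout()
```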
Axis Sharing Configuration
When DataFrames should be compared on a common scale, the sharex and sharey parameters of plt.subplots() link the axes so that all subplots use the same limits:
# Create 4×1 layout with shared y-axis
fig, axes = plt.subplots(nrows=4, ncols=1, sharey=True, figsize=(8, 12))
df1.plot(ax=axes[0])
df2.plot(ax=axes[1])
df3.plot(ax=axes[2])
df4.plot(ax=axes[3])
plt.tight_layout()
plt.show()
Setting sharey=True causes all subplots to use the same y-axis range, facilitating direct value comparison. Similarly, sharex=True shares the x-axis.
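One way to confirm that the y-axes are really linked is to compare their limits after plotting. In this sketch (headless Agg backend assumed), every subplot reports the same y-range, and that range is wide enough for the largest value in any DataFrame (24, in df1's sales column):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The article's four sample DataFrames
frames = [
    pd.DataFrame({"sales": [2, 5, 5, 7, 9, 13, 15, 17, 22, 24],
                  "returns": [1, 2, 3, 4, 5, 6, 7, 8, 7, 5]}),
    pd.DataFrame({"sales": [2, 5, 11, 18, 15, 15, 14, 9, 6, 7],
                  "returns": [1, 2, 0, 2, 2, 4, 5, 4, 2, 1]}),
    pd.DataFrame({"sales": [6, 8, 8, 7, 8, 9, 10, 7, 8, 12],
                  "returns": [1, 0, 1, 1, 1, 2, 3, 2, 1, 3]}),
    pd.DataFrame({"sales": [10, 7, 7, 6, 7, 6, 4, 3, 3, 2],
                  "returns": [4, 4, 3, 3, 2, 3, 2, 1, 1, 0]}),
]

fig, axes = plt.subplots(nrows=4, ncols=1, sharey=True, figsize=(8, 12))
for ax, df in zip(axes, frames):
    df.plot(ax=ax)

# Linked y-axes report identical limits, autoscaled over the data of all subplots
ylims = [ax.get_ylim() for ax in axes]
print(all(lim == ylims[0] for lim in ylims))  # True
print(ylims[0][1] >= 24)                      # True: covers df1's largest value
```

In a larger grid, sharex and sharey also accept "row" or "col" to link axes only within each row or each column.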
Alternative Methods
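Beyond manual subplot creation, Pandas offers built-in subplot support. With subplots=True, DataFrame.plot() draws each column of a single DataFrame in its own subplot, and the layout parameter arranges those subplots in a (rows, columns) grid, which must have at least as many cells as the DataFrame has columns: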
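# Pandas built-in subplots (one per column); add_suffix keeps the 8 column names unique
df_multi = pd.concat([df1.add_suffix('_1'), df2.add_suffix('_2'),
                      df3.add_suffix('_3'), df4.add_suffix('_4')], axis=1)
df_multi.plot(subplots=True, layout=(4, 2), figsize=(10, 10))
plt.tight_layout()
plt.show()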
This approach suits data that is already combined in one DataFrame, but it always produces one subplot per column; for independent DataFrames, manually specifying subplot positions offers greater flexibility.
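If the built-in approach should still yield one panel per original DataFrame, one option, assuming only the sales column is of interest, is to concatenate that column from each DataFrame with keys so that each becomes a uniquely named column. DataFrame.plot() then returns the array of Axes reshaped to the requested layout, and its own sharey parameter links the y-axes (headless Agg backend assumed):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The article's four sample DataFrames
frames = [
    pd.DataFrame({"sales": [2, 5, 5, 7, 9, 13, 15, 17, 22, 24],
                  "returns": [1, 2, 3, 4, 5, 6, 7, 8, 7, 5]}),
    pd.DataFrame({"sales": [2, 5, 11, 18, 15, 15, 14, 9, 6, 7],
                  "returns": [1, 2, 0, 2, 2, 4, 5, 4, 2, 1]}),
    pd.DataFrame({"sales": [6, 8, 8, 7, 8, 9, 10, 7, 8, 12],
                  "returns": [1, 0, 1, 1, 1, 2, 3, 2, 1, 3]}),
    pd.DataFrame({"sales": [10, 7, 7, 6, 7, 6, 4, 3, 3, 2],
                  "returns": [4, 4, 3, 3, 2, 3, 2, 1, 1, 0]}),
]

# One column per original DataFrame: its 'sales' Series, named df1..df4 via keys
sales = pd.concat([df["sales"] for df in frames], axis=1,
                  keys=["df1", "df2", "df3", "df4"])

# subplots=True draws each column in its own subplot; layout=(2, 2) fits the 4 columns
axes = sales.plot(subplots=True, layout=(2, 2), sharey=True, figsize=(10, 8))
print(axes.shape)  # (2, 2)
```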
Practical Recommendations
In practice, choose the subplot layout based on the characteristics of the data and the goal of the analysis. For time series comparisons, sharing the x-axis is often helpful; for distribution comparisons, sharing the y-axis may be more important. Calling plt.tight_layout() adjusts the spacing between subplots to reduce overlap between tick labels, axis labels, and titles.
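For the time series case, a sketch assuming hypothetical daily dates attached to the article's df1 and df2 data (and the headless Agg backend) stacks the two series with a shared x-axis, so the same date falls at the same horizontal position in both panels:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily dates; the values are the article's df1 and df2 data
dates = pd.date_range("2024-01-01", periods=10, freq="D")
ts1 = pd.DataFrame({"sales": [2, 5, 5, 7, 9, 13, 15, 17, 22, 24],
                    "returns": [1, 2, 3, 4, 5, 6, 7, 8, 7, 5]}, index=dates)
ts2 = pd.DataFrame({"sales": [2, 5, 11, 18, 15, 15, 14, 9, 6, 7],
                    "returns": [1, 2, 0, 2, 2, 4, 5, 4, 2, 1]}, index=dates)

# sharex=True links the date axes; only the bottom subplot keeps its tick labels
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(8, 6))
ts1.plot(ax=axes[0])
ts2.plot(ax=axes[1])
fig.tight_layout()  # recompute spacing so date labels and legends do not collide

print(axes[0].get_xlim() == axes[1].get_xlim())  # True
```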
Conclusion
Matplotlib's subplot system effectively displays multiple DataFrame datasets within a single figure. Key steps include: creating a subplot grid with plt.subplots(), specifying plot positions via the ax parameter, and utilizing axis sharing as needed. This method significantly enhances the visualization of multi-dataset comparisons and is an essential technique in data analysis and reporting.