Keywords: matplotlib | scatter_plot | marker_size | data_visualization | Python
Abstract: This article explores the s parameter of the matplotlib.pyplot.scatter function in depth. By analyzing the definition of points squared units, the relationship between marker area and visual perception, and the effect of different scaling strategies on scatter plot readability, it shows how to control scatter plot marker sizes effectively. Code examples illustrate the mathematical principles and practical applications of marker sizing.
Fundamental Concepts of Scatter Plot Marker Size
In matplotlib's scatter plot function, the s parameter controls marker size, measured in points squared (points^2). This design is based on visual perception principles, where human perception of marker size correlates more closely with area than linear dimensions. Understanding this concept is crucial for creating effective scatter plots.
Mathematical Meaning of Points Squared Units
Points squared units originate from the point system in typography, where 1 point equals 1/72 inch. When s=100, it indicates a marker area of 100 points squared. This area-based definition means that the marker's area is proportional to the s value, while its width and height grow only with the square root of s.
Consider circular markers: the area formula is A = πr². If we double the radius, the area increases fourfold. Therefore, to make a circular marker appear twice as large visually, we need to double the s value, which increases the radius by a factor of √2, rather than double the radius.
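To make the area-versus-width relationship concrete, here is a minimal sketch in plain Python. It assumes that the diameter of a circular marker (marker='o') is the square root of s, which matches the circular-marker area given later in this article. The helper names circle_diameter_points and width_ratio are illustrative only and are not part of matplotlib.

```python
import math


def circle_diameter_points(s):
    """Diameter, in points, of a circular marker drawn with size s (points^2)."""
    return math.sqrt(s)


def width_ratio(s_old, s_new):
    """Factor by which marker width changes when s goes from s_old to s_new."""
    return math.sqrt(s_new / s_old)


print(circle_diameter_points(100))  # diameter for s=100
print(width_ratio(100, 200))        # doubling s: width grows by about 1.41
print(width_ratio(100, 400))        # quadrupling s: width doubles
```

Doubling s therefore doubles the area but widens the marker by only about 41 percent, while doubling the width requires four times the s value.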
Scaling Strategies for Marker Sizes
Different scaling strategies produce distinctly different visual effects. Here are three common scaling approaches:
import matplotlib.pyplot as plt
x = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
# Exponential scaling: area doubles each time
s_exponential = [20 * 2**n for n in range(len(x))]
# Square scaling: area proportional to position squared
s_square = [20 * n**2 for n in range(len(x))]
# Linear scaling: area proportional to position
s_linear = [20 * n for n in range(len(x))]
plt.figure(figsize=(12, 4))
plt.scatter(x, [1]*len(x), s=s_exponential, label='Exponential (s=20×2ⁿ)')
plt.scatter(x, [0]*len(x), s=s_square, label='Square (s=20×n²)')
plt.scatter(x, [-1]*len(x), s=s_linear, label='Linear (s=20×n)')
plt.ylim(-1.5, 1.5)
plt.legend()
plt.show()
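Exponential scaling makes markers grow so quickly that the last few dominate the figure. Square scaling grows more moderately: marker width increases in equal steps, because width scales with the square root of s. Linear scaling increases marker area in equal steps and yields the most natural visual progression. Note that n starts at 0, so the first marker in the square and linear rows has s=0 and is not visible.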
Relationship Between Marker Size and Visual Perception
The human visual system perceives area changes approximately linearly. This means when marker area doubles, we perceive the size as roughly doubling. This principle explains why defining marker size using area rather than linear dimensions is more appropriate for scatter plots.
In practical applications, if visual linear growth is desired, the s values should increase linearly:
# Marker sizes for visually linear growth
x = [0, 2, 4, 6, 8, 10]
y = [0] * len(x)
# s grows in equal steps (20, 40, 60, ...), so marker area grows linearly
s_linear_visual = [20 * (n + 1) for n in range(len(x))]
plt.scatter(x, y, s=s_linear_visual)
plt.show()
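The opposite case can be sketched as well: if the marker width, rather than the area, should grow in equal steps, the desired widths can be squared to obtain s. The function name sizes_for_linear_width and the 5-point step are assumptions chosen for illustration.

```python
def sizes_for_linear_width(count, step_points=5.0):
    """Return s values whose marker widths are step, 2*step, 3*step, ... points."""
    return [(step_points * k) ** 2 for k in range(1, count + 1)]


sizes = sizes_for_linear_width(4)
print(sizes)  # [25.0, 100.0, 225.0, 400.0]
```

The resulting list can be passed directly as the s argument of plt.scatter. Because area grows with the square of width, the markers will look as though they grow faster than linearly.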
Size Behavior Across Different Marker Shapes
Although the s parameter uses points squared units uniformly, different marker shapes exhibit varying behaviors:
- For square markers (marker='s'), the s value directly corresponds to marker area
- For circular markers (marker='o'), the actual area is π/4 × s
- For other marker shapes, the area-to-s ratio may differ
However, for any given marker shape the area remains proportional to the s parameter, so relative size differences within a plot are preserved. Different shapes drawn with the same s can still cover somewhat different areas.
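The two ratios stated above can be checked with a short calculation. This is only a sketch covering the square and circular cases from the list; the function name covered_area is hypothetical, and other marker shapes are deliberately left unsupported because their ratios are not given here.

```python
import math


def covered_area(s, marker):
    """Approximate filled area in points^2 for a marker drawn with size s."""
    if marker == "s":
        # Square marker: the filled area equals s.
        return float(s)
    if marker == "o":
        # Circle of diameter sqrt(s): area is pi/4 * s.
        return math.pi / 4 * s
    raise ValueError(f"no area ratio known for marker {marker!r}")


print(covered_area(100, "s"))
print(round(covered_area(100, "o"), 2))
```

At s=100 the circle covers roughly 78.5 points squared against 100 for the square, which is why circles can look slightly smaller than squares at the same s.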
Practical Application Guidelines
When selecting marker sizes, consider the following factors:
- Data Density: Use smaller markers for high-density data to avoid overlap
- Visualization Purpose: Use larger markers to emphasize specific data points
- Figure Size: Larger figures can accommodate bigger markers without appearing crowded
- Color Combinations: Darker markers may appear larger at the same size
Here's a practical application example:
import numpy as np
import matplotlib.pyplot as plt
# Generate sample data
np.random.seed(42)
x = np.random.randn(100)
y = np.random.randn(100)
values = np.random.rand(100) # Values for determining marker size
# Set marker sizes based on value magnitude
sizes = 50 + 200 * values # Base size 50, maximum size 250
plt.figure(figsize=(10, 8))
scatter = plt.scatter(x, y, s=sizes, c=values, cmap='viridis', alpha=0.7)
plt.colorbar(scatter, label='Value Magnitude')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.title('Scatter Plot with Size-Encoded Values')
plt.show()
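The example above works because values already lies between 0 and 1. For data on an arbitrary scale, a small normalization step can map the data range onto a chosen size range first. The sketch below uses plain Python lists; the function name scale_to_sizes and the 50 to 250 range are assumptions carried over from the example, not matplotlib features.

```python
def scale_to_sizes(values, s_min=50.0, s_max=250.0):
    """Linearly map values onto marker sizes between s_min and s_max."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # All values are equal: use the midpoint size for every marker.
        return [(s_min + s_max) / 2] * len(values)
    return [s_min + (v - low) / span * (s_max - s_min) for v in values]


print(scale_to_sizes([0, 5, 10]))  # [50.0, 150.0, 250.0]
```

Since s is an area, this mapping makes marker area, not marker width, proportional to the data value.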
Comparison with Line Plot Marker Sizes
The scatter plot's s parameter is the square of the line plot's markersize parameter: markersize is a width in points, while s is an area in points squared.
fig, ax = plt.subplots()
# Line plot marker size of 10 points
ax.plot([0], [0], marker="o", markersize=10)
# Scatter plot requires s=100 for equivalent visual size
ax.scatter([1], [0], s=100)
plt.show()
This relationship makes it possible to match marker sizes between plot and scatter when both are used in the same figure.
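When both functions appear in the same figure, two small conversion helpers keep the sizes in sync. These are illustrative functions, not part of matplotlib; they only encode the squared relationship shown above.

```python
import math


def markersize_to_s(markersize):
    """Convert a plot() markersize (points) to a scatter() s value (points^2)."""
    return markersize ** 2


def s_to_markersize(s):
    """Convert a scatter() s value (points^2) to a plot() markersize (points)."""
    return math.sqrt(s)


print(markersize_to_s(10))   # 100
print(s_to_markersize(100))  # 10.0
```

For instance, ax.scatter(x, y, s=markersize_to_s(6)) would be sized to match ax.plot(x, y, marker="o", markersize=6).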
Impact of Resolution and Display Size
Because s is defined in points, the marker's size in pixels depends on the figure DPI (dots per inch):
for dpi in [72, 100, 144]:
    fig, ax = plt.subplots(figsize=(4, 3), dpi=dpi)
    ax.scatter([0, 1], [0, 1], s=100)
    ax.set_title(f'DPI = {dpi}')
plt.show()
At 72 DPI, 1 point equals 1 pixel. At other DPI values, a length in points converts to pixels as pixels = points × DPI / 72, so a marker that is 10 points wide spans 20 pixels at 144 DPI.
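The point-to-pixel conversion follows from the definition of a point as 1/72 inch. The sketch below applies it to the s=100 marker used above, whose width is 10 points; the function name points_to_pixels is an assumption for illustration.

```python
def points_to_pixels(points, dpi):
    """Convert a length in points (1/72 inch) to pixels at the given DPI."""
    return points * dpi / 72.0


marker_width_points = 100 ** 0.5  # s=100 corresponds to a width of 10 points
for dpi in (72, 100, 144):
    width_px = points_to_pixels(marker_width_points, dpi)
    print(f"DPI {dpi}: about {width_px:.2f} pixels wide")
```

The physical size of the marker stays the same across these figures. Only the number of pixels used to draw it changes.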
Conclusion
Matplotlib scatter plot marker sizing is designed based on visual perception principles, using points squared units to define marker area. Understanding this design philosophy helps create more effective and aesthetically pleasing data visualizations. By appropriately selecting scaling strategies and considering practical application scenarios, scatter plots can fully realize their potential in data exploration and analysis.