Understanding Marker Size in Matplotlib Scatter Plots: From Points Squared to Visual Perception

Keywords: matplotlib | scatter_plot | marker_size | data_visualization | Python

Abstract: This article explores the s parameter of the matplotlib.pyplot.scatter function. It covers the definition of points squared units, the relationship between marker area and visual perception, and the effect of different scaling strategies on scatter plot readability, so that readers can control scatter plot marker sizes effectively. Code examples illustrate the mathematical principles and practical applications of marker sizing.

Fundamental Concepts of Scatter Plot Marker Size

In matplotlib's scatter plot function, the s parameter controls marker size, measured in points squared (points^2). This design is based on visual perception principles, where human perception of marker size correlates more closely with area than linear dimensions. Understanding this concept is crucial for creating effective scatter plots.

Mathematical Meaning of Points Squared Units

Points squared units originate from the point system in typography, where 1 point equals 1/72 inch. Setting s=100 requests a marker area of 100 points squared, which corresponds to a marker about 10 points wide. This area-based definition means that the marker's area is proportional to s, while its width and height grow only with the square root of s.

Consider circular markers: the area formula is A = πr². If we double the radius, the area increases fourfold. Therefore, to make a circular marker cover twice the area, we need to double the s value, which increases the radius by a factor of √2 rather than 2.
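As a rough numerical check of this reasoning, the sketch below treats the marker width as the square root of s. This is an assumption that ignores the marker's edge line width, and width_points is an illustrative helper name, not a matplotlib function.

```python
import math

def width_points(s):
    """Approximate marker width in points for a given s (area in points^2)."""
    return math.sqrt(s)

base = 100
for factor in (1, 2, 4):
    s = base * factor
    # Doubling s scales the width by sqrt(2); quadrupling s doubles the width.
    print(f"s={s:4d}  width={width_points(s):.2f} pt")
```

Under this assumption, s must be quadrupled to double the width, and doubled to double the area.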

Scaling Strategies for Marker Sizes

Different scaling strategies produce distinctly different visual effects. Here are three common scaling approaches:

import matplotlib.pyplot as plt

x = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

# Exponential scaling: area doubles each time
s_exponential = [20 * 2**n for n in range(len(x))]

# Square scaling: area proportional to position squared
s_square = [20 * n**2 for n in range(len(x))]

# Linear scaling: area proportional to position
s_linear = [20 * n for n in range(len(x))]

plt.figure(figsize=(12, 4))
plt.scatter(x, [1]*len(x), s=s_exponential, label='Exponential (s=20×2ⁿ)')
plt.scatter(x, [0]*len(x), s=s_square, label='Square (s=20×n²)')
plt.scatter(x, [-1]*len(x), s=s_linear, label='Linear (s=20×n)')
plt.ylim(-1.5, 1.5)
plt.legend()
plt.show()

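Exponential scaling makes the markers grow very rapidly. With square scaling the marker width grows linearly, so the area still grows faster than the underlying step. With linear scaling the marker area grows by a constant amount per step, which tends to give the most even visual progression.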
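To compare the three strategies numerically, the following sketch computes approximate marker widths (the square root of s) for a few steps of each sequence. The names strategies and widths are illustrative, and the width model ignores edge line width.

```python
import math

steps = range(1, 6)

# The same three area sequences as in the plot, restricted to a few steps.
strategies = {
    "exponential": [20 * 2**n for n in steps],
    "square": [20 * n**2 for n in steps],
    "linear": [20 * n for n in steps],
}

def widths(areas):
    """Approximate marker widths in points for a list of s values."""
    return [math.sqrt(a) for a in areas]

for name, areas in strategies.items():
    formatted = ", ".join(f"{w:5.1f}" for w in widths(areas))
    print(f"{name:<12} {formatted}")
```

In this model, square scaling adds a constant amount to the width at each step, exponential scaling multiplies the width by √2 at each step, and linear scaling adds a constant amount to the area.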

Relationship Between Marker Size and Visual Perception

The human visual system perceives area changes approximately linearly. This means when marker area doubles, we perceive the size as roughly doubling. This principle explains why defining marker size using area rather than linear dimensions is more appropriate for scatter plots.

In practical applications, if visual linear growth is desired, the s values should increase linearly:

import matplotlib.pyplot as plt

# Marker sizes for visually linear growth: the area increases by a constant step
x = [0, 2, 4, 6, 8, 10]
y = [0] * len(x)
s_linear_visual = [20 * (n + 1) for n in range(len(x))]  # 20, 40, ..., 120

plt.scatter(x, y, s=s_linear_visual)
plt.show()
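When marker sizes come from data rather than a fixed sequence, one possible approach is to map the values linearly onto a range of areas. The helper below is a hypothetical sketch: values_to_sizes, min_size, and max_size are illustrative names and defaults, not part of matplotlib.

```python
def values_to_sizes(values, min_size=20.0, max_size=200.0):
    """Map data values linearly onto marker areas (points^2).

    The smallest value gets min_size and the largest gets max_size,
    so equal steps in the data give equal steps in marker area.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: use the midpoint of the size range.
        return [(min_size + max_size) / 2] * len(values)
    scale = (max_size - min_size) / (hi - lo)
    return [min_size + (v - lo) * scale for v in values]

sizes = values_to_sizes([0, 5, 10])
print(sizes)
```

The returned list could then be passed as the s argument of plt.scatter. Note that this mapping is affine rather than strictly proportional: the smallest value still receives a visible marker of area min_size.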

Size Behavior Across Different Marker Shapes

Although the s parameter uses points squared units uniformly, different marker shapes exhibit varying behaviors:

For square markers (marker='s'), the s value directly corresponds to marker area
For circular markers (marker='o'), the actual area is π/4 × s
For other marker shapes, the area-to-s ratio may differ

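However, for any given shape the marker area remains proportional to the s parameter, so scaling s changes every marker type by the same factor. Markers of different shapes drawn with the same s value do not necessarily cover exactly the same area.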
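The ratios listed above can be checked with a short calculation. This sketch assumes the geometric model described in this section (a square marker filling a square of area s, and a circular marker inscribed in that square) and ignores edge line width. marker_areas is an illustrative name.

```python
import math

def marker_areas(s):
    """Approximate filled areas (points^2) for square and circular markers."""
    side = math.sqrt(s)  # side of the square with area s, in points
    return {
        "square": side**2,
        "circle": math.pi * (side / 2) ** 2,  # circle inscribed in that square
    }

areas = marker_areas(100)
print(areas)
print("circle / square ratio:", areas["circle"] / areas["square"])
```

Under these assumptions the circle-to-square ratio is π/4 for every s, which is why both areas stay proportional to s even though they differ from each other.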

Practical Application Guidelines

When selecting marker sizes, consider the following factors:

Data Density: Use smaller markers for high-density data to avoid overlap
Visualization Purpose: Use larger markers to emphasize specific data points
Figure Size: Larger figures can accommodate bigger markers without appearing crowded
Color Combinations: Darker markers may appear larger at the same size

Here's a practical application example:

import numpy as np
import matplotlib.pyplot as plt

# Generate sample data
np.random.seed(42)
x = np.random.randn(100)
y = np.random.randn(100)
values = np.random.rand(100)  # Values for determining marker size

# Set marker sizes based on value magnitude
sizes = 50 + 200 * values  # Base size 50, maximum size 250

plt.figure(figsize=(10, 8))
scatter = plt.scatter(x, y, s=sizes, c=values, cmap='viridis', alpha=0.7)
plt.colorbar(scatter, label='Value Magnitude')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.title('Scatter Plot with Size-Encoded Values')
plt.show()

Comparison with Line Plot Marker Sizes

The s parameter of scatter is the square of the markersize parameter used by plot. markersize specifies a width in points, while s specifies an area in points squared, so s = markersize²:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Line plot marker size of 10 points
ax.plot([0], [0], marker="o", markersize=10)

# Scatter plot requires s=100 for equivalent visual size
ax.scatter([1], [0], s=100)

plt.show()

With this conversion, markers drawn by plot and scatter can be matched in size within the same figure.

Impact of Resolution and Display Size

Because s is defined in points, the physical size of a marker does not depend on resolution, but the number of pixels it covers depends on the figure DPI (dots per inch):

import matplotlib.pyplot as plt

for dpi in [72, 100, 144]:
    fig, ax = plt.subplots(figsize=(4, 3), dpi=dpi)
    ax.scatter([0, 1], [0, 1], s=100)
    ax.set_title(f'DPI = {dpi}')
    plt.show()

Because 1 point is 1/72 inch, 1 point equals 1 pixel at 72 DPI. At other DPI values, a length in points corresponds to points × DPI / 72 pixels.
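The conversion between points and pixels can be sketched as follows. The helper names points_to_pixels and marker_width_pixels are illustrative, and the marker width is again approximated as the square root of s.

```python
import math

POINTS_PER_INCH = 72.0

def points_to_pixels(points, dpi):
    """Convert a length in points to pixels at the given DPI."""
    return points * dpi / POINTS_PER_INCH

def marker_width_pixels(s, dpi):
    """Approximate on-screen width in pixels of a marker with area s."""
    return points_to_pixels(math.sqrt(s), dpi)

for dpi in (72, 100, 144):
    print(f"dpi={dpi:3d}  s=100 -> {marker_width_pixels(100, dpi):.1f} px wide")
```

A marker with s=100 is therefore about 10 pixels wide at 72 DPI and about 20 pixels wide at 144 DPI, while its physical width stays at 10 points.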

Conclusion

Matplotlib scatter plot marker sizing is designed based on visual perception principles, using points squared units to define marker area. Understanding this design philosophy helps create more effective and aesthetically pleasing data visualizations. By appropriately selecting scaling strategies and considering practical application scenarios, scatter plots can fully realize their potential in data exploration and analysis.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.