Keywords: scatter plot | Python | Seaborn | Matplotlib | data visualization
Abstract: This article explores methods to color scatter plots based on column values in Python using pandas, Matplotlib, and Seaborn, inspired by ggplot2's aesthetics. It covers Seaborn's relplot function, FacetGrid, and custom Matplotlib implementations, with detailed code examples and a comparative analysis.
Introduction
In data visualization with R, the ggplot2 library allows easy specification of aesthetics, such as coloring scatter plots by column values. This article addresses how to achieve similar functionality in Python using pandas, Matplotlib, and Seaborn, providing step-by-step explanations from basic concepts to advanced techniques.
Overview of Solutions
In Python, several methods exist to color scatter plots based on categorical or numerical columns. The primary approaches involve using Seaborn for high-level plotting or Matplotlib for finer control. This section introduces three core methods: Seaborn's relplot, FacetGrid, and custom Matplotlib implementation.
Using Seaborn's relplot
In Seaborn 0.11.0 and later, the figure-level relplot function (available since Seaborn 0.9.0) is the recommended way to draw relational plots. It handles the color mapping automatically through its hue parameter.
```python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Synthetic example data: 37 rows of height, weight and a gender label
np.random.seed(0)
df = pd.DataFrame({
    'Height': np.random.uniform(130, 200, 37),
    'Weight': np.random.uniform(30, 100, 37),
    'Gender': np.random.choice(['Female', 'Male', 'Non-binary', 'No Response'], 37)
})

# hue assigns one color per Gender value and builds the legend
sns.relplot(data=df, x='Weight', y='Height', hue='Gender')
plt.show()
```

This code creates a scatter plot colored by the 'Gender' column, similar to ggplot2's aes(color=col3) syntax. The hue parameter automatically assigns colors and adds a legend. Note that relplot returns a FacetGrid object and, by default, draws the legend outside the axes.
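In ggplot2, aes(color=...) is often paired with scale_color_manual to pin specific colors to specific categories. A comparable approach in Seaborn is to pass a dictionary to relplot's palette parameter. The sketch below assumes the hand-picked colors in gender_palette (a hypothetical name) suit your data; hue_order fixes the legend order to match the dictionary.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Same synthetic data as the relplot example above
np.random.seed(0)
df = pd.DataFrame({
    'Height': np.random.uniform(130, 200, 37),
    'Weight': np.random.uniform(30, 100, 37),
    'Gender': np.random.choice(['Female', 'Male', 'Non-binary', 'No Response'], 37)
})

# Hypothetical manual color choices: every hue level must appear as a key
gender_palette = {
    'Female': 'tab:orange',
    'Male': 'tab:blue',
    'Non-binary': 'tab:green',
    'No Response': 'tab:gray',
}

g = sns.relplot(
    data=df, x='Weight', y='Height',
    hue='Gender',
    palette=gender_palette,          # category -> color
    hue_order=list(gender_palette),  # legend order follows the dict
)
```

Call plt.show() afterwards to display the figure. Using a dictionary rather than a palette name keeps each category's color stable even if some categories are missing from a particular subset of the data.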
Using Seaborn's FacetGrid
For Seaborn versions that predate relplot, or when you want to map an arbitrary plotting function onto a grid of axes, FacetGrid can be used directly. FacetGrid is the grid machinery that relplot itself returns, and it supports faceted layouts and custom mappings.
```python
# Reuses df, sns and plt from the relplot example
fg = sns.FacetGrid(data=df, hue='Gender')
# map calls plt.scatter once per Gender subset; add_legend labels the colors
fg.map(plt.scatter, 'Weight', 'Height').add_legend()
plt.show()
```

The hue parameter specifies the coloring column, map applies Matplotlib's scatter function to each hue subset, and add_legend adds a legend that distinguishes the categories.
Custom Implementation with Matplotlib
For full control over the plotting process, Matplotlib can be used directly by building a dictionary that maps each category to a color. This approach suits custom color schemes or data that needs special handling.
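```python
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def color_scatter_by_column(df, x_col, y_col, cat_col):
    """Scatter x_col against y_col, using one color per value of cat_col."""
    fig, ax = plt.subplots()
    categories = np.unique(df[cat_col])  # sorted unique category labels
    # Sample one evenly spaced color per category from the viridis colormap
    colors = plt.cm.viridis(np.linspace(0, 1, len(categories)))
    colordict = dict(zip(categories, colors))
    # Plot each category separately so it gets its own legend entry
    for cat in categories:
        subset = df[df[cat_col] == cat]
        ax.scatter(subset[x_col], subset[y_col], color=colordict[cat], label=cat)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.legend(title=cat_col)
    return fig

# Example usage: five synthetic "Male" and five "Female" samples
np.random.seed(250)
df = pd.DataFrame({
    'Height': np.append(np.random.normal(6, 0.25, 5), np.random.normal(5.4, 0.25, 5)),
    'Weight': np.append(np.random.normal(180, 20, 5), np.random.normal(140, 20, 5)),
    'Gender': ["Male"] * 5 + ["Female"] * 5,
})
fig = color_scatter_by_column(df, 'Height', 'Weight', 'Gender')
plt.show()
```

This function uses np.unique to get the sorted unique categories, samples one evenly spaced color per category from the viridis colormap, and plots each subset in a loop so that every category receives its own legend label. Replacing the colormap or the color dictionary gives complete control over colors and labels.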
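The loop above issues one scatter call per category. An alternative, sketched here with a hypothetical helper scatter_single_call and a hand-picked palette dictionary, maps every row's category to a color and draws all points in one call. Because a single call produces a single artist, the legend is assembled from proxy Line2D handles instead of from per-call labels.

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np
import pandas as pd

def scatter_single_call(df, x_col, y_col, cat_col, palette):
    """Hypothetical helper: color points by category with one ax.scatter call.

    palette must map every value of cat_col to a Matplotlib color;
    a missing key would produce NaN colors and an error.
    """
    fig, ax = plt.subplots()
    # Translate each row's category into a color string
    point_colors = df[cat_col].map(palette).tolist()
    ax.scatter(df[x_col], df[y_col], c=point_colors)
    # One artist for all points, so build legend entries by hand
    handles = [Line2D([], [], marker='o', linestyle='', color=color, label=cat)
               for cat, color in palette.items()]
    ax.legend(handles=handles, title=cat_col)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    return fig, ax

# Same synthetic data as the loop-based example
np.random.seed(250)
df = pd.DataFrame({
    'Height': np.append(np.random.normal(6, 0.25, 5), np.random.normal(5.4, 0.25, 5)),
    'Weight': np.append(np.random.normal(180, 20, 5), np.random.normal(140, 20, 5)),
    'Gender': ["Male"] * 5 + ["Female"] * 5,
})
fig, ax = scatter_single_call(df, 'Height', 'Weight', 'Gender',
                              palette={'Male': 'tab:blue', 'Female': 'tab:orange'})
```

The single call keeps the original row order, which matters when points overlap; the loop version draws later categories on top of earlier ones.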
Comparison and Best Practices
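Seaborn's relplot is the most convenient option for quick plots, and it also supports faceting through its col and row parameters. FacetGrid exposes the same grid machinery directly when you need to map arbitrary plotting functions. A custom Matplotlib implementation provides maximum control at the cost of more code. For categorical columns, Seaborn's hue parameter handles the color mapping and the legend automatically. For numeric columns, Seaborn's hue switches to a continuous color mapping, and in plain Matplotlib the values can be passed to scatter's c parameter together with a colormap. In practice, choose the method based on the type of the coloring column and how much customization you need.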
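For a numeric column, Matplotlib can map values straight onto a colormap. The sketch below reuses the synthetic height and weight data and adds an illustrative BMI column, assuming Height is in centimetres and Weight in kilograms. Passing that column to scatter's c parameter with cmap='viridis' colors each point by value, and fig.colorbar adds the scale.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    'Height': np.random.uniform(130, 200, 37),  # assumed centimetres
    'Weight': np.random.uniform(30, 100, 37),   # assumed kilograms
})
# Illustrative numeric column: body-mass index, kg / m^2
df['BMI'] = df['Weight'] / (df['Height'] / 100) ** 2

fig, ax = plt.subplots()
# Numeric values passed to c are normalized and mapped through the colormap
points = ax.scatter(df['Weight'], df['Height'], c=df['BMI'], cmap='viridis')
fig.colorbar(points, ax=ax, label='BMI')
ax.set_xlabel('Weight')
ax.set_ylabel('Height')
```

In Seaborn, passing the same numeric column to relplot's hue parameter (hue='BMI') yields a continuous color mapping rather than discrete categories.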
Conclusion
By leveraging Seaborn and Matplotlib, Python users can effectively color scatter plots by column values, bridging the gap from ggplot2 aesthetics to the Python ecosystem. The methods covered in this article range from high-level convenience functions to low-level custom implementations, providing a comprehensive toolkit for data visualization.