Adding Labels to Scatter Plots in ggplot2: Comparative Analysis of geom_text and ggrepel

Nov 14, 2025 · Programming

Keywords: ggplot2 | Data Visualization | Label Addition | Scatter Plot | R Language

Abstract: This article provides a comprehensive exploration of various methods for adding data point labels to scatter plots using R's ggplot2 package. Through analysis of NBA player data visualization cases, it systematically compares the advantages and limitations of basic geom_text functions versus the specialized ggrepel package in label handling. The paper delves into key technical aspects including label position adjustment, overlap management, conditional label display, and offers complete code implementations along with best practice recommendations.

Introduction

Data visualization is an indispensable component of modern data analysis, with scatter plots serving as a classic chart type for displaying relationships between two variables, widely used in both scientific research and business analytics. However, when adding identification labels to each data point in scatter plots, technical challenges such as label overlap and layout confusion often arise. This paper systematically examines solutions for label addition in the ggplot2 package based on real NBA player data.

Data Preparation and Basic Visualization

First, we load the NBA player dataset, which contains player statistics from the 2008 season, including a Name column, minutes played (MIN), and points scored (PTS):

library(ggplot2)
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv", sep = ",")
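
Before plotting, it can help to confirm that the columns used in the rest of the article (Name, MIN, PTS) are actually present. The following is a minimal sketch, not part of the original workflow: the toy data frame and the helper missing_columns() are hypothetical, and the values are illustrative only.

```r
# Hypothetical stand-in for the downloaded table; values are illustrative only.
toy <- data.frame(
  Name = c("Player A", "Player B", "Player C"),
  MIN  = c(38.6, 36.2, 33.1),
  PTS  = c(30.2, 25.4, 17.9)
)

# Hypothetical helper: return the required column names that are missing.
missing_columns <- function(df, required = c("Name", "MIN", "PTS")) {
  setdiff(required, names(df))
}

missing_columns(toy)                      # character(0): nothing is missing
missing_columns(toy[, c("Name", "MIN")])  # "PTS"
```

Calling missing_columns(nba) right after read.csv() would flag a renamed or absent column before it surfaces as a less obvious ggplot2 error.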

Create a basic scatter plot comparing player minutes played (MIN) versus points scored (PTS):

base_plot <- ggplot(nba, aes(x = MIN, y = PTS)) + 
  geom_point(color = "green", size = 2)
print(base_plot)

Basic Label Addition: geom_text Function

The ggplot2 package provides the geom_text() function to add text labels to data points. This function requires specifying the label aesthetic mapping to determine label content:

labeled_plot <- base_plot + 
  geom_text(aes(label = Name), hjust = 0, vjust = 0, size = 3)
print(labeled_plot)

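Here, the hjust and vjust parameters control the horizontal and vertical justification of each label relative to its data point; they do not move the label by a fixed distance. A value of 0 indicates left/bottom alignment, 1 indicates right/top alignment, and 0.5 indicates center alignment. With hjust = 0 and vjust = 0, the lower-left corner of the text sits on the point, so the label extends up and to the right. To shift labels by a set amount, use the nudge_x and nudge_y parameters instead.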
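
As a hedged illustration of justification combined with an explicit offset, the sketch below uses a small hypothetical data frame (illustrative values, not the NBA data). It relies on two standard geom_text() arguments: nudge_x, which shifts the label in data units, and check_overlap, which skips labels that would overlap one drawn earlier.

```r
library(ggplot2)

# Hypothetical toy data standing in for the NBA table; values are illustrative.
toy <- data.frame(
  Name = c("Player A", "Player B", "Player C", "Player D"),
  MIN  = c(38.6, 38.4, 36.2, 33.1),
  PTS  = c(30.2, 29.9, 25.4, 17.9)
)

nudged_plot <- ggplot(toy, aes(x = MIN, y = PTS)) +
  geom_point() +
  geom_text(
    aes(label = Name),
    hjust = 0,             # left edge of the text sits at the anchor position
    nudge_x = 0.1,         # move the anchor right, in x-axis data units
    check_overlap = TRUE,  # skip labels that would overlap an earlier one
    size = 3
  )
print(nudged_plot)
```

Note that check_overlap drops colliding labels rather than moving them, so it trades completeness for legibility.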

Conditional Label Display Strategy

When dealing with numerous data points, displaying all labels can cause severe visual clutter. In such cases, a conditional labeling strategy can be employed, showing labels only for data points meeting specific criteria:

conditional_labels <- ggplot(nba, aes(x = MIN, y = PTS)) + 
  geom_point(color = "blue") + 
  geom_text(aes(label = ifelse(PTS > 24, as.character(Name), '')), 
            hjust = 0, vjust = 0, size = 3)
print(conditional_labels)

This strategy uses the ifelse() function for conditional filtering, displaying name labels only for players scoring more than 24 points, effectively reducing visual noise.
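
An alternative to ifelse() is to pass a filtered data frame to the label layer, which avoids drawing empty strings altogether. This is a sketch on a hypothetical toy data frame (illustrative values), not the article's original code.

```r
library(ggplot2)

# Hypothetical toy data standing in for the NBA table; values are illustrative.
toy <- data.frame(
  Name = c("Player A", "Player B", "Player C", "Player D"),
  MIN  = c(38.6, 38.4, 36.2, 33.1),
  PTS  = c(30.2, 29.9, 25.4, 17.9)
)

# Keep only the rows that should receive a label.
top_scorers <- subset(toy, PTS > 24)

subset_labels <- ggplot(toy, aes(x = MIN, y = PTS)) +
  geom_point(color = "blue") +              # all points are still drawn
  geom_text(data = top_scorers,             # only the filtered rows get labels
            aes(label = Name),
            hjust = 0, vjust = 0, size = 3)
print(subset_labels)
```

Both approaches produce the same picture; the data-argument form keeps the filtering rule in one named object that can be reused by other layers.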

Advanced Label Handling: ggrepel Package

For dense plots where labels collide, the ggrepel package is a better fit. Its geoms repel labels away from each other and away from the data points, so label positions are adjusted automatically to avoid overlaps:

library(ggrepel)
repel_plot <- ggplot(nba, aes(x = MIN, y = PTS)) + 
  geom_point(color = "red", size = 2) + 
  geom_label_repel(aes(label = Name), 
                   box.padding = 0.35, 
                   point.padding = 0.5,
                   segment.color = 'grey50')
print(repel_plot)

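The geom_label_repel() function draws a background box around each label, while geom_text_repel() draws plain text without a box. The key parameters used above are box.padding (the padding around each label), point.padding (the padding around each labeled data point), and segment.color (the color of the line segment connecting a label to its point).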
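
For comparison, here is a minimal sketch of the geom_text_repel() variant on a hypothetical toy data frame (illustrative values). Beyond the padding arguments, it sets three further ggrepel arguments: min.segment.length, max.overlaps, and seed. Their values here are assumptions chosen for illustration.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical toy data standing in for the NBA table; values are illustrative.
toy <- data.frame(
  Name = c("Player A", "Player B", "Player C", "Player D"),
  MIN  = c(38.6, 38.4, 36.2, 33.1),
  PTS  = c(30.2, 29.9, 25.4, 17.9)
)

text_repel_plot <- ggplot(toy, aes(x = MIN, y = PTS, label = Name)) +
  geom_point(color = "red", size = 2) +
  geom_text_repel(
    box.padding = 0.35,
    point.padding = 0.5,
    segment.color = "grey50",
    min.segment.length = 0,  # draw a connector even when the label is close
    max.overlaps = Inf,      # never drop a label because of crowding
    seed = 42                # fix the random start for a reproducible layout
  )
print(text_repel_plot)
```

Setting seed matters when a figure is regenerated for publication, since the repulsion layout otherwise starts from a random state.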

Label Optimization in Complex Scenarios

In practical applications, differentiated labeling strategies are often required based on varying data characteristics:

advanced_repel <- ggplot(nba, aes(x = MIN, y = PTS, label = Name)) + 
  geom_point(aes(color = ifelse(PTS > 25, "high", 
                               ifelse(PTS < 18, "low", "medium"))), 
             size = 3, alpha = 0.8) + 
  geom_text_repel(data = subset(nba, PTS > 25),
                  nudge_y = 32 - subset(nba, PTS > 25)$PTS,
                  size = 4,
                  direction = "x") + 
  geom_label_repel(data = subset(nba, PTS < 18),
                   nudge_y = 16 - subset(nba, PTS < 18)$PTS,
                   size = 4,
                   direction = "x")
print(advanced_repel)

This hierarchical labeling strategy combines color coding, conditional filtering, and position adjustment to provide differentiated visual presentation for different categories of data points.
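
The nested ifelse() inside aes() works, but it yields an unwieldy legend title. One possible refinement, sketched here on hypothetical toy data, is to compute the category once as an ordered factor column and map color to it. The column name tier and the color choices are assumptions made for illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for the NBA table; values are illustrative.
toy <- data.frame(
  Name = c("Player A", "Player B", "Player C", "Player D"),
  MIN  = c(38.6, 37.5, 36.2, 33.1),
  PTS  = c(30.2, 25.0, 21.3, 17.9)
)

# Same thresholds as the nested ifelse(): > 25 is high, < 18 is low.
toy$tier <- factor(
  ifelse(toy$PTS > 25, "high", ifelse(toy$PTS < 18, "low", "medium")),
  levels = c("low", "medium", "high")   # fixes the legend order
)

tier_plot <- ggplot(toy, aes(x = MIN, y = PTS, color = tier)) +
  geom_point(size = 3, alpha = 0.8) +
  scale_color_manual(
    name = "Scoring tier",
    values = c(low = "steelblue", medium = "grey50", high = "firebrick")
  )
print(tier_plot)
```

A player with exactly 25.0 points falls into the medium tier, because the high tier uses a strict greater-than comparison.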

Performance Considerations and Best Practices

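When selecting a labeling method, consider data scale, visualization objectives, and computational efficiency. geom_text() performs no layout computation and suits a small number of well-separated points, whereas ggrepel runs an iterative layout, so filtering the labels first keeps both the plot and the rendering time manageable on larger datasets.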

Additionally, visual attributes such as font size, color contrast, and background transparency need optimization based on specific scenarios.
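
One way to keep the label count bounded as the data grows is to label only the top few rows by the variable of interest. The sketch below assumes a hypothetical toy data frame (illustrative values) and a hypothetical cutoff n_labels; neither comes from the original analysis.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical toy data standing in for the NBA table; values are illustrative.
toy <- data.frame(
  Name = c("Player A", "Player B", "Player C", "Player D", "Player E"),
  MIN  = c(38.6, 38.4, 36.2, 33.1, 31.0),
  PTS  = c(30.2, 29.9, 25.4, 17.9, 16.5)
)

n_labels <- 2  # hypothetical cap on the number of labels

# Sort by points (descending) and keep the first n_labels rows.
top_rows <- head(toy[order(-toy$PTS), ], n_labels)

capped_plot <- ggplot(toy, aes(x = MIN, y = PTS)) +
  geom_point(size = 2) +
  geom_text_repel(data = top_rows, aes(label = Name), size = 3, seed = 1)
print(capped_plot)
```

Because the repulsion layout only sees the filtered rows, its cost depends on n_labels rather than on the full number of points.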

Conclusion

ggplot2 and its extension package ggrepel together cover the range from basic to advanced labeling of scatter plots. The geom_text() function suits simple labeling needs, while ggrepel offers automatic layout for crowded plots. In practice, choose the method based on data characteristics and visualization goals, and combine it with conditional display and tiered labeling to keep the final plot readable.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.