Adding Significance Stars to ggplot Barplots and Boxplots: Automated Annotation Based on p-Values

Keywords: ggplot2 | significance annotation | p-value | barplot | boxplot

Abstract: This article shows how to add significance star annotations to barplots and boxplots in R's ggplot2. It first walks through manual annotation, where custom coordinates are combined with geom_text and geom_line layers for precise placement, and then covers automated alternatives from the extension packages ggsignif and ggpubr. The scenarios covered are basic per-group annotation, arcs for subgroup comparisons, and labels for inter-group comparisons, each with reproducible code and parameter tuning guidance.

Introduction and Background

In scientific research and data analysis visualization, barplots and boxplots are commonly used chart types for displaying differences between groups. To intuitively convey statistical test results, researchers often need to add star (*) annotations to indicate significance levels (p-values), such as three stars (***) for p < 0.001, two stars (**) for p < 0.01, and one star (*) for p < 0.05. This annotation method effectively enhances chart readability and information density, but how to achieve precise, automated annotation in ggplot2 remains a frequent practical challenge.

Basic Annotation Method: Adding Stars to Single Groups

The most basic annotation scenario involves adding stars above individual groups. Consider the following example data:

dat <- data.frame(Group = c("S1", "S1", "S2", "S2"),
                  Sub   = c("A", "B", "A", "B"),
                  Value = c(3,5,7,8))

First, create a basic barplot:

library(ggplot2)
p <- ggplot(dat, aes(Group, Value)) +
    geom_bar(aes(fill = Sub), stat="identity", position="dodge", width=.5) +
    theme_bw() + theme(panel.grid = element_blank()) +
    coord_cartesian(ylim = c(0, 15)) +
    scale_fill_manual(values = c("grey80", "grey20"))

Add stars by creating an annotation data frame with specified coordinates and using geom_text:

label.df <- data.frame(Group = c("S1", "S2"),
                       Value = c(6, 9))
p + geom_text(data = label.df, label = "***")

The key to this method is choosing Value coordinates in label.df that sit slightly above the tops of the corresponding bars. The same principle applies to boxplots: replace the geom_bar layer with geom_boxplot (which needs several observations per group rather than the single values in dat) and place the labels above the upper whiskers or outliers.
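
The label heights do not have to be typed by hand. As a minimal sketch, assuming the dat data frame above and an arbitrary gap of 1 y-axis unit (the names offset, label.auto, and p.auto are illustrative and not part of the original example), the tallest bar in each group can be found with aggregate():

```r
library(ggplot2)

dat <- data.frame(Group = c("S1", "S1", "S2", "S2"),
                  Sub   = c("A", "B", "A", "B"),
                  Value = c(3, 5, 7, 8))

# Tallest bar in each group, shifted upward by a fixed offset
offset <- 1  # assumed gap between bar top and label, in y-axis units
label.auto <- aggregate(Value ~ Group, data = dat, FUN = max)
label.auto$Value <- label.auto$Value + offset

# Same plot as before, with the computed label positions
p.auto <- ggplot(dat, aes(Group, Value) ) +
    geom_bar(aes(fill = Sub), stat = "identity", position = "dodge", width = .5) +
    geom_text(data = label.auto, label = "***")
```

For this data the computed heights match the hand-written ones (6 for S1 and 9 for S2). With real data, the offset usually needs to scale with the y-axis range.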

Subgroup Comparison Annotation: Combining Arcs and Stars

When comparing different subcategories within the same group (e.g., A vs. B in group S1), dashed arcs connecting subcategories with star annotations are often used. Implementing this requires calculating arc coordinates:

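# Define semicircle parameters
r <- 0.15                             # horizontal radius, in x-axis units
t <- seq(0, 180, by = 1) * pi / 180   # angles from 0 to 180 degrees, in radians
x <- r * cos(t)
y <- r*5 * sin(t)                     # factor 5 stretches the height to y-axis units
arc.df <- data.frame(Group = x, Value = y)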

# Create subgroup star labels
label.df <- data.frame(Group = c(1,1,1, 2,2,2),
                       Value = c(6.5,6.8,7.1, 9.5,9.8,10.1))

# Combine plots
p2 <- p + geom_text(data = label.df, label = "*") +
    geom_line(data = arc.df, aes(Group+1, Value+5.5), lty = 2) +
    geom_line(data = arc.df, aes(Group+2, Value+8.5), lty = 2)

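Here, r sets the horizontal radius of the arc, t holds the angles from 0° to 180° in radians, and x and y are the resulting Cartesian coordinates. The y values are multiplied by 5 because the y axis spans many more units than the x axis, so an unscaled semicircle would look almost flat. The offsets in aes(Group+1, Value+5.5) shift the arc horizontally to the first group (position 1 on the discrete axis) and vertically to just above its bars. lty = 2 sets the dashed line style.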
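
When several arcs are needed, the coordinate arithmetic can be wrapped in a small function. The following is only a sketch: make_arc and its arguments (center, base, height, n) are hypothetical names introduced here, and the default height of 0.75 simply reproduces r*5 from the example above.

```r
# Hypothetical helper: half-ellipse centred at x = center, resting on y = base
make_arc <- function(center, base, r = 0.15, height = 0.75, n = 181) {
  theta <- seq(0, pi, length.out = n)   # 0 to 180 degrees, in radians
  data.frame(Group = center + r * cos(theta),
             Value = base + height * sin(theta))
}

arc.s1 <- make_arc(center = 1, base = 5.5)   # arc above group S1
arc.s2 <- make_arc(center = 2, base = 8.5)   # arc above group S2
```

Because the columns are already named Group and Value, each arc can be added with geom_line(data = arc.s1, lty = 2) and inherits the plot's x and y mappings without a separate aes() call.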

Inter-Group Comparison Annotation: Applying Flattened Arcs

For cross-group comparisons (e.g., overall difference between S1 and S2), larger arcs flattened at the top to accommodate stars are typically used:

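# Reuses t, the angle sequence defined in the previous section
r <- .5
x <- r * cos(t)
y <- r*4 * sin(t)
y[20:162] <- y[20]  # Flatten the top (elements 20 to 162 are 19 to 161 degrees)
arc.df <- data.frame(Group = x, Value = y)

# annotate() draws the label once; geom_text() with fixed x and y would
# repeat it for every row of dat
p2 + geom_line(data = arc.df, aes(Group+1.5, Value+11), lty = 2) +
     annotate("text", x = 1.5, y = 12, label = "***")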

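Because t starts at 0°, element i of y corresponds to an angle of (i - 1)°. The assignment y[20:162] <- y[20] therefore holds y constant from 19° to 161°, which flattens the top of the arc. The coordinates (1.5, 12) centre the stars just above the flat section.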
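
The flattening can also be written in terms of a height cap rather than element positions, which avoids the off-by-one between index and angle. This is a sketch of an equivalent formulation, not the original author's code; deg, cap, y.flat, and y.idx are illustrative names.

```r
r <- 0.5
deg <- 0:180                        # angle in degrees; element i holds angle i - 1
y <- r * 4 * sin(deg * pi / 180)

cap <- y[deg == 19]                 # arc height at 19 degrees
y.flat <- pmin(y, cap)              # clip every point above that height

# Index-based version from the text, kept for comparison
y.idx <- y
y.idx[20:162] <- y.idx[20]

all.equal(y.flat, y.idx)            # the two approaches agree
```

Changing the angle in deg == 19 raises or lowers the flat section without having to recount indices.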

Automated Extension Solutions: ggsignif and ggpubr

While the manual methods above are flexible, calculations can be complex. Community-developed extension packages offer automated solutions:

The ggsignif package simplifies annotation via the geom_signif layer:

library(ggsignif)
ggplot(iris, aes(x=Species, y=Sepal.Length)) + 
  geom_boxplot() +
  geom_signif(comparisons = list(c("versicolor", "virginica")), 
              map_signif_level=TRUE)

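With map_signif_level=TRUE, the computed p-value is replaced by stars using the default thresholds (*** for p < 0.001, ** for p < 0.01, * for p < 0.05). Unless the test argument specifies otherwise, geom_signif compares the two groups with a Wilcoxon rank-sum test (wilcox.test).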
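
geom_signif can also draw brackets at hand-specified positions, which covers the within-group case from the earlier sections. The sketch below assumes the dat data frame from the first example. The star labels are placeholders typed by hand, not the result of any test (dat has only one value per bar, so no test is possible), and the x positions assume two dodged bars with a total width of 0.5.

```r
library(ggplot2)
library(ggsignif)

dat <- data.frame(Group = c("S1", "S1", "S2", "S2"),
                  Sub   = c("A", "B", "A", "B"),
                  Value = c(3, 5, 7, 8))

p.signif <- ggplot(dat, aes(Group, Value)) +
    geom_bar(aes(fill = Sub), stat = "identity", position = "dodge", width = .5) +
    # One bracket per group; xmin/xmax are the centres of the dodged bars
    geom_signif(y_position = c(5.5, 8.5),
                xmin = c(0.875, 1.875), xmax = c(1.125, 2.125),
                annotations = c("*", "*"),   # placeholder labels
                tip_length = 0)
```

Compared with the arc approach, the bracket geometry is handled by the package, so only the endpoints and the height need to be supplied.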

The ggpubr package further extends functionality, supporting multi-group comparisons and complex tests:

library(ggpubr)
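# Pairs of dose levels to compare
my_comparisons <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2"))
ggboxplot(ToothGrowth, x = "dose", y = "len",
          color = "dose", palette = "jco") +
  # Pairwise comparisons, one bracket per pair at the given heights
  stat_compare_means(comparisons = my_comparisons, label.y = c(29, 35, 40)) +
  # Global test across all three doses
  stat_compare_means(label.y = 45)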

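stat_compare_means runs the test and annotates the result. By default it uses a Wilcoxon test for two-group comparisons and a Kruskal-Wallis test for the global comparison of more than two groups; pass method = "t.test" or method = "anova" for the parametric versions. Numeric p-values are shown unless label = "p.signif" is set, which switches the labels to stars.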
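
To get stars rather than numeric p-values from ggpubr, the test method and label type can be set explicitly. This is a sketch under the assumption that parametric tests suit the data; p.stars is an illustrative name.

```r
library(ggpubr)

my_comparisons <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2"))

p.stars <- ggboxplot(ToothGrowth, x = "dose", y = "len",
                     color = "dose", palette = "jco") +
  # Pairwise t-tests, shown as stars instead of numeric p-values
  stat_compare_means(comparisons = my_comparisons, method = "t.test",
                     label = "p.signif", label.y = c(29, 35, 40)) +
  # Global one-way ANOVA p-value above the brackets
  stat_compare_means(method = "anova", label.y = 45)
```

Note that ggpubr's star scale may differ from the three-level convention used elsewhere in this article, so it is worth checking the package documentation before mixing the two in one figure.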

Practical Recommendations and Parameter Tuning

1. Coordinate Calculation: For manual annotation, it is advisable to first plot the base chart, inspect coordinate ranges via ggplot_build, then compute annotation positions.

2. Aesthetic Adjustments: Use vjust, hjust for fine-tuning text alignment, size to control star size, and color for coloring.

3. Dynamic Annotation: Write functions to automatically generate star labels and coordinates based on p-value vectors, for example:

map_stars <- function(pvals) {
  # Map each p-value to a star label; vapply guarantees a character result
  vapply(pvals, function(p) {
    if (is.na(p)) NA_character_
    else if (p < 0.001) "***"
    else if (p < 0.01) "**"
    else if (p < 0.05) "*"
    else "NS"
  }, character(1))
}

4. Scalability: For complex experimental designs, combine facet_wrap with annotation logic to batch-process faceted charts.
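
The recommendations above can be combined into one workflow: compute p-values, map them to stars, derive label heights from the data, and pass the result to geom_text. The sketch below uses the built-in iris data and compares each species with setosa by t-test. The reference group, the 0.2 offset, and the names stars_from_p, label.dyn, and p.dyn are assumptions made for illustration, and stars_from_p is a vectorised variant of the threshold mapping built on cut().

```r
library(ggplot2)

# Vectorised star mapping: [0, 0.001) "***", [0.001, 0.01) "**", [0.01, 0.05) "*"
stars_from_p <- function(pvals) {
  as.character(cut(pvals,
                   breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                   labels = c("***", "**", "*", "NS"),
                   right = FALSE))
}

ref <- "setosa"                                  # assumed reference group
others <- setdiff(levels(iris$Species), ref)

# Welch t-test of each remaining species against the reference
pvals <- sapply(others, function(s) {
  t.test(iris$Sepal.Length[iris$Species == s],
         iris$Sepal.Length[iris$Species == ref])$p.value
})

# One label per compared group, placed just above its largest observation
label.dyn <- data.frame(
  Species      = others,
  Sepal.Length = sapply(others, function(s) {
    max(iris$Sepal.Length[iris$Species == s])
  }) + 0.2,
  label        = stars_from_p(pvals)
)

p.dyn <- ggplot(iris, aes(Species, Sepal.Length)) +
  geom_boxplot() +
  geom_text(data = label.dyn, aes(label = label))
```

If the label data frame also carries the faceting variable, adding facet_wrap to the plot routes each label to its own panel. When many comparisons are made, adjusting the p-values first (for example with p.adjust) is advisable before mapping them to stars.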

Conclusion

Adding significance star annotations in ggplot2 can be achieved either through manual implementation with underlying geometric objects for precise control, or rapidly automated via extension packages like ggsignif and ggpubr. The choice depends on balancing flexibility, efficiency, and complexity. The methods introduced in this article cover mainstream scenarios from basic annotation to complex comparisons, with code directly adaptable to real data, providing reliable technical support for scientific visualization.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.