A Comprehensive Guide to Creating Percentage Stacked Bar Charts with ggplot2

Dec 08, 2025 · Programming

Keywords: ggplot2 | percentage stacked bar chart | data visualization

Abstract: This article describes how to create percentage stacked bar charts with the ggplot2 package in R. The data is reshaped from wide to long format, and position = "fill" (that is, position_fill()) normalizes the stacks so that each bar's height sums to 100%. The article covers the complete data processing workflow, code examples, and notes on the resulting visualization, and is aimed at researchers and developers working in data analysis and visualization.

Data Preparation and Format Transformation

Before creating a percentage stacked bar chart, it helps to understand the data layout ggplot2 expects. Raw data is often stored in wide format, where each series occupies its own column. Because ggplot2 maps each aesthetic (x, y, fill) to a single column, it works best with long format data, where each row holds one value, and the series name and the value are stored in two separate columns.

The following example demonstrates the data transformation process:

library(dplyr)
library(tidyr)

dat <- read.table(text = "    ONE TWO THREE
1   23  234 324
2   34  534 12
3   56  324 124
4   34  234 124
5   123 534 654", sep = "", header = TRUE)

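# Add a row identifier, then reshape from wide to long format
datm <- dat %>%
  mutate(ind = factor(row_number())) %>%  # one factor level per original row
  gather(variable, value, -ind)           # stack ONE/TWO/THREE into key-value pairs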

This code adds a row identifier with mutate(), then uses gather() to collapse the ONE, TWO, and THREE columns into key-value pairs. The resulting data frame has three columns: ind (row identifier), variable (original column name), and value (the corresponding number). Note that gather() still works but is superseded in current tidyr releases; pivot_longer() is the recommended replacement.
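For readers on a recent tidyr, the same reshaping can be sketched with pivot_longer(). This is a minimal sketch, not the article's original method: the data frame is rebuilt inline so the snippet runs on its own, and the name datm2 is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same toy values as the article's example, built inline
dat <- data.frame(
  ONE   = c(23, 34, 56, 34, 123),
  TWO   = c(234, 534, 324, 234, 534),
  THREE = c(324, 12, 124, 124, 654)
)

datm2 <- dat %>%
  mutate(ind = factor(row_number())) %>%
  pivot_longer(cols = -ind, names_to = "variable", values_to = "value")

# 5 rows x 3 value columns give 15 rows in long format
nrow(datm2)
```

The result has the same three columns as the gather() version. Only the row order differs (pivot_longer() walks the data row by row), which does not affect the chart.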

Implementation Principle of Percentage Stacking

ggplot2 controls how geometric objects are positioned through the position argument. For bar charts, position_fill() stacks the values at each x position and rescales them so that every bar has a total height of 1 (100%). Each segment's height is its value divided by the sum of all values in that bar.
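To make the normalization concrete, the shares that position_fill() draws can be reproduced by hand with dplyr. This is an illustrative sketch for non-negative values; the names props and prop are assumptions introduced here, and the long-format data is rebuilt inline.

```r
library(dplyr)

# Long-format version of the article's toy data
datm <- data.frame(
  ind      = factor(rep(1:5, times = 3)),
  variable = rep(c("ONE", "TWO", "THREE"), each = 5),
  value    = c(23, 34, 56, 34, 123,
               234, 534, 324, 234, 534,
               324, 12, 124, 124, 654)
)

# Share of each value within its bar (one bar per level of variable)
props <- datm %>%
  group_by(variable) %>%
  mutate(prop = value / sum(value)) %>%
  ungroup()

# Every bar's shares should add up to 1, i.e. 100%
totals <- props %>%
  group_by(variable) %>%
  summarise(total = sum(prop))
totals
```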

The core plotting code is as follows:

library(ggplot2)

# position = "fill" rescales every bar to a total height of 1
ggplot(datm, aes(x = variable, y = value, fill = ind)) + 
    geom_bar(position = "fill", stat = "identity") + 
    scale_y_continuous(labels = scales::percent_format())  # show 0-1 as 0%-100%

Here, the aes() mapping places the variable names on the x-axis and the values on the y-axis, and colors each segment by row identifier. geom_bar(stat = "identity") can also be written as geom_col(). position = "fill" is equivalent to position = position_fill(); the function form additionally accepts the vjust and reverse arguments for fine-tuning.
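One common use of the vjust argument is centering a text label inside each segment. The following is a sketch under stated assumptions rather than part of the original article: the share column and the label formatting are choices made here, and geom_col() stands in for geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

datm <- data.frame(
  ind      = factor(rep(1:5, times = 3)),
  variable = rep(c("ONE", "TWO", "THREE"), each = 5),
  value    = c(23, 34, 56, 34, 123,
               234, 534, 324, 234, 534,
               324, 12, 124, 124, 654)
)

# Pre-compute the share of each segment so it can be used as label text
labelled <- datm %>%
  group_by(variable) %>%
  mutate(share = value / sum(value)) %>%
  ungroup()

p <- ggplot(labelled, aes(x = variable, y = value, fill = ind)) +
  geom_col(position = position_fill()) +
  geom_text(
    aes(label = scales::percent(share, accuracy = 1), group = ind),
    position = position_fill(vjust = 0.5),  # 0.5 places text mid-segment
    size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format())
```

Printing p draws the chart. The text layer must use the same position adjustment and grouping as the bars, otherwise the labels will not line up with the segments.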

Visualization Effects and Customization

In the generated chart, each bar represents a column from the original data, with stacked portions corresponding to rows from the original data. After normalization, all bars have consistent heights, facilitating comparison of composition proportions across different variables.

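The y-axis tick labels are formatted as percentages by scales::percent_format(). With the default accuracy = NULL, the function picks the smallest number of decimal places needed to tell adjacent breaks apart, so the usual 0%, 25%, 50% breaks are shown without decimals. To fix the precision explicitly, set the accuracy argument (1 rounds to whole percentages, 0.1 to one decimal place):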

scale_y_continuous(labels = scales::percent_format(accuracy = 1))
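The effect of accuracy is easiest to see by calling the formatter directly on a few proportions. A small sketch; the breaks vector is just an illustrative set of axis positions.

```r
library(scales)

breaks <- c(0, 0.25, 0.5, 0.75, 1)

# accuracy = 1 rounds to whole percentages
whole <- percent_format(accuracy = 1)(breaks)

# accuracy = 0.1 keeps one decimal place
decimal <- percent_format(accuracy = 0.1)(breaks)

whole    # "0%" "25%" "50%" "75%" "100%"
decimal  # "0.0%" "25.0%" "50.0%" "75.0%" "100.0%"
```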

Color schemes can be customized using scale_fill_brewer() or scale_fill_manual() to meet various publication requirements or visual preferences.
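As a rough illustration of both options, the sketch below applies a ColorBrewer palette and a hand-picked palette to the same chart. The palette name, hex colors, and legend title are arbitrary choices made for this example.

```r
library(ggplot2)

datm <- data.frame(
  ind      = factor(rep(1:5, times = 3)),
  variable = rep(c("ONE", "TWO", "THREE"), each = 5),
  value    = c(23, 34, 56, 34, 123,
               234, 534, 324, 234, 534,
               324, 12, 124, 124, 654)
)

base_plot <- ggplot(datm, aes(x = variable, y = value, fill = ind)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1))

# Option 1: a qualitative ColorBrewer palette
p_brewer <- base_plot + scale_fill_brewer(palette = "Set2", name = "Row")

# Option 2: explicit colors, one per level of ind
p_manual <- base_plot +
  scale_fill_manual(
    values = c("1" = "#1b9e77", "2" = "#d95f02", "3" = "#7570b3",
               "4" = "#e7298a", "5" = "#66a61e"),
    name = "Row"
  )
```

Naming the values in scale_fill_manual() ties each color to a specific level, so the mapping stays stable if the level order changes.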

Advanced Applications and Considerations

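For large datasets, consider using data.table or dtplyr for the reshaping step to improve performance. Regarding missing values, note that na.rm is an argument of the layer (for example geom_bar()), not of position_fill(), which only takes vjust and reverse. Rows with a missing value are dropped from the stack, usually with a warning, so it is more predictable to filter out or impute missing values before plotting.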
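A minimal sketch of handling missing values explicitly before plotting is shown below. The NA is injected artificially for demonstration, and the names datm_na and clean are illustrative. Whether dropping or imputing is appropriate depends on the data.

```r
library(dplyr)
library(ggplot2)

datm_na <- data.frame(
  ind      = factor(rep(1:5, times = 3)),
  variable = rep(c("ONE", "TWO", "THREE"), each = 5),
  value    = c(23, NA, 56, 34, 123,      # one missing value, for demonstration
               234, 534, 324, 234, 534,
               324, 12, 124, 124, 654)
)

# Drop incomplete rows up front so the plot layer has nothing to remove
clean <- datm_na %>% filter(!is.na(value))

p <- ggplot(clean, aes(x = variable, y = value, fill = ind)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent_format())
```

After filtering, the bar for ONE is normalized over its four remaining values, so the shares still sum to 100% but describe a smaller total.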

When the data contains negative values, percentage stacking can be misleading: ggplot2 stacks positive and negative values separately, in opposite directions from the axis, so a bar no longer reads as parts of a single 100% total. In such cases, consider faceted or grouped bar charts instead.
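One way to guard against this is to check for negative values and fall back to a grouped (dodged) bar chart. This is only a sketch of that idea: the negative value is injected for demonstration, and has_negative is a name introduced here.

```r
library(ggplot2)

datm_neg <- data.frame(
  ind      = factor(rep(1:5, times = 3)),
  variable = rep(c("ONE", "TWO", "THREE"), each = 5),
  value    = c(-23, 34, 56, 34, 123,     # one negative value, for demonstration
               234, 534, 324, 234, 534,
               324, 12, 124, 124, 654)
)

has_negative <- any(datm_neg$value < 0, na.rm = TRUE)

base <- ggplot(datm_neg, aes(x = variable, y = value, fill = ind))

p <- if (has_negative) {
  # Grouped bars keep the sign and magnitude of every value visible
  base + geom_col(position = "dodge")
} else {
  base +
    geom_col(position = "fill") +
    scale_y_continuous(labels = scales::percent_format())
}
```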

Compared to other visualization packages, ggplot2's advantages lie in its consistent syntax and powerful extensibility. The theme() function allows fine-grained control over chart appearance, meeting the needs of academic publications or business reports.
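As an example of such appearance control, the sketch below combines a built-in theme with a few theme() overrides. The specific settings (legend position, removed grid lines, axis labels) are illustrative choices, not recommendations from the original article.

```r
library(ggplot2)

datm <- data.frame(
  ind      = factor(rep(1:5, times = 3)),
  variable = rep(c("ONE", "TWO", "THREE"), each = 5),
  value    = c(23, 34, 56, 34, 123,
               234, 534, 324, 234, 534,
               324, 12, 124, 124, 654)
)

p_pub <- ggplot(datm, aes(x = variable, y = value, fill = ind)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(x = NULL, y = "Share of column total", fill = "Row") +
  theme_minimal(base_size = 11) +
  theme(
    legend.position    = "bottom",
    panel.grid.major.x = element_blank(),  # vertical grid lines add little for bars
    axis.text          = element_text(colour = "black")
  )
```

The theme() call comes after theme_minimal() because a complete theme added later would overwrite the individual overrides.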
