Adding Legends to ggplot2 Line Plots: A Best Practice Guide

Nov 20, 2025 · Programming

Keywords: ggplot2 | legend | data_reshaping | R | visualization

Abstract: This article shows how to add a legend to a ggplot2 line plot that draws multiple lines. It recommends reshaping the data to long format with the tidyr package, which simplifies the plotting code and lets ggplot2 generate the legend automatically. Step-by-step code examples are provided, along with explanations of common pitfalls and an alternative approach that avoids reshaping.

Introduction

In data visualization with R, ggplot2 is a powerful package for creating complex plots. However, beginners often struggle to add a legend when plotting multiple lines, because ggplot2 draws a legend only for aesthetics that are mapped to data. This article addresses the issue by demonstrating the most effective method: reshaping the data.

The Problem with Hard-Coded Colors

The original code in the question uses several geom_line calls with hard-coded colors, and it produces no legend. The fixed colour = "..." argument in each geom_line overrides the colour mapping for that layer, so no layer actually maps color to a variable. For example:

library(ggplot2)

# Anti-pattern: each fixed colour = "..." overrides the mapping, so no legend is drawn
ggplot(data = datos, aes(x = fecha, y = TempMax, colour = "1")) +
  geom_line(colour = "red") +
  geom_line(aes(x = fecha, y = TempMedia, colour = "2"), colour = "green") +
  geom_line(aes(x = fecha, y = TempMin, colour = "2"), colour = "blue") +  # label "2" reused
  scale_colour_manual(values = c("red", "green", "blue"))

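The mapping is also inconsistent: TempMedia and TempMin share the label "2", and the unnamed values in scale_colour_manual are not tied to the labels "1" and "2". Even with the fixed colors removed, this code would produce only two legend entries for three lines.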
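The examples in this article assume a data frame datos with a Date column fecha and three numeric columns. The sketch below builds a small made-up version (the values are invented purely for illustration) and uses ggplot2's layer_data() to confirm that a fixed colour = "red" inside geom_line() wins over the mapped colour, so that layer contributes nothing to a legend.

```r
library(ggplot2)

# Hypothetical sample data: five days of invented temperatures, for illustration only
datos <- data.frame(
  fecha     = seq(as.Date("2024-01-01"), by = "day", length.out = 5),
  TempMax   = c(14, 16, 15, 17, 18),
  TempMedia = c(9, 11, 10, 12, 13),
  TempMin   = c(4, 6, 5, 7, 8)
)

# colour is mapped in aes(), but the layer also sets a fixed colour:
# the fixed value takes precedence and the mapping is dropped for this layer
p_fixed <- ggplot(datos, aes(x = fecha, y = TempMax, colour = "1")) +
  geom_line(colour = "red")

# Inspect the data ggplot2 actually draws for layer 1
drawn <- layer_data(p_fixed, 1)
print(unique(drawn$colour))  # [1] "red"
```

Every drawn row carries the literal value "red", which is why no color legend appears for this layer.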

Solution: Reshaping Data with tidyr

The best practice is to reshape the data into long format with tidyr::pivot_longer. It stacks several columns into a pair of columns, one holding the original column names and one holding the values, so the color aesthetic can be mapped to the column of names.

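library(ggplot2)
library(tidyr)

# Stack every column except fecha into two columns:
# Temperature (the former column names) and value (the former cell values)
datos_long <- pivot_longer(datos, cols = -fecha, names_to = "Temperature")

# One layer draws all three lines; mapping colour to Temperature creates the legend
ggplot(datos_long) +
  geom_line(aes(x = fecha, y = value, colour = Temperature)) +
  # Unnamed values follow the alphabetical level order: TempMax, TempMedia, TempMin
  scale_colour_manual(values = c("red", "green", "blue"))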

This code first reshapes the data so that all temperature readings sit in one column, value, with a new column, Temperature, recording which series each reading belongs to. A single geom_line with colour = Temperature inside aes() then draws all three lines, and ggplot2 creates the legend automatically.
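To see what the reshaping step produces, here is a minimal sketch on a small invented version of datos (the values are made up). With five dates and three temperature columns, the long table has 5 × 3 = 15 rows and three columns.

```r
library(tidyr)

# Hypothetical sample data: five days of invented temperatures
datos <- data.frame(
  fecha     = seq(as.Date("2024-01-01"), by = "day", length.out = 5),
  TempMax   = c(14, 16, 15, 17, 18),
  TempMedia = c(9, 11, 10, 12, 13),
  TempMin   = c(4, 6, 5, 7, 8)
)

datos_long <- pivot_longer(datos, cols = -fecha, names_to = "Temperature")

print(dim(datos_long))    # 15 rows, 3 columns
print(names(datos_long))  # "fecha" "Temperature" "value"

# Each original row becomes three rows, one per former column
print(head(datos_long, 3))
```

The id column fecha comes first, followed by the names column and the values column, and rows are emitted one original row at a time.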

Step-by-Step Code Explanation

1. Data Reshaping: pivot_longer(datos, cols = -fecha, names_to = "Temperature") transforms the wide data into long format. The argument cols = -fecha selects every column except fecha for pivoting. names_to creates a column "Temperature" holding the original column names (TempMax, TempMedia, TempMin), and the values go into a column named "value", the default of the values_to argument.

2. Plotting: After reshaping, a single geom_line layer on ggplot(datos_long) can draw all the lines. aes(x = fecha, y = value, colour = Temperature) maps the date to the x-axis, the temperature readings to the y-axis, and the temperature type to color; ggplot2 builds the legend from this color mapping.

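3. Customizing Colors: scale_colour_manual(values = c("red", "green", "blue")) sets one color per level of the Temperature variable. Unnamed values are assigned in level order, which for a character column is alphabetical (TempMax, TempMedia, TempMin). Naming the values, as in c(TempMax = "red", TempMedia = "green", TempMin = "blue"), removes this dependence on order.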
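Relying on the alphabetical level order works for these three column names but breaks silently if the names change. A sketch of a more robust variant, using invented sample data: the colors are matched by name, and a factor fixes the legend order explicitly (the order TempMin, TempMedia, TempMax is chosen only for illustration).

```r
library(ggplot2)
library(tidyr)

# Hypothetical sample data: five days of invented temperatures
datos <- data.frame(
  fecha     = seq(as.Date("2024-01-01"), by = "day", length.out = 5),
  TempMax   = c(14, 16, 15, 17, 18),
  TempMedia = c(9, 11, 10, 12, 13),
  TempMin   = c(4, 6, 5, 7, 8)
)
datos_long <- pivot_longer(datos, cols = -fecha, names_to = "Temperature")

# Default order for a character column is alphabetical
print(sort(unique(datos_long$Temperature)))  # "TempMax" "TempMedia" "TempMin"

# Fix the legend order explicitly with factor levels
datos_long$Temperature <- factor(datos_long$Temperature,
                                 levels = c("TempMin", "TempMedia", "TempMax"))

p <- ggplot(datos_long, aes(x = fecha, y = value, colour = Temperature)) +
  geom_line() +
  # Named values are matched by level name, so their order here does not matter
  scale_colour_manual(values = c(TempMax = "red", TempMedia = "green", TempMin = "blue"))

drawn <- layer_data(p, 1)
print(table(drawn$colour))  # five points per line, one color per series
```

Reordering the factor levels changes only the legend order; the named values keep each series on its intended color.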

Advantages of This Method

This approach is cleaner and scales better. Because cols = -fecha selects every column except the date, new temperature series are picked up by the reshaping step automatically; the plotting code needs no changes unless a manual color scale is used, in which case it needs one more color. It also avoids the pitfalls of specifying colors by hand in several geoms, as seen in the original question and Answer 1.
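A quick sketch of the scalability claim, assuming a fourth, hypothetical series named TempExtra (both the column and its values are invented). The reshaping call is unchanged, and with ggplot2's default discrete color scale the plotting code is unchanged too.

```r
library(ggplot2)
library(tidyr)

# Hypothetical data: the three original series plus an invented fourth one
datos <- data.frame(
  fecha     = seq(as.Date("2024-01-01"), by = "day", length.out = 5),
  TempMax   = c(14, 16, 15, 17, 18),
  TempMedia = c(9, 11, 10, 12, 13),
  TempMin   = c(4, 6, 5, 7, 8),
  TempExtra = c(2, 3, 3, 4, 5)  # hypothetical extra series
)

# Same call as before: cols = -fecha picks up TempExtra automatically
datos_long <- pivot_longer(datos, cols = -fecha, names_to = "Temperature")

# Default discrete scale: no edits needed for the new series.
# A scale_colour_manual() with only three values would need a fourth entry.
p <- ggplot(datos_long, aes(x = fecha, y = value, colour = Temperature)) +
  geom_line()

print(length(unique(layer_data(p, 1)$colour)))  # four distinct line colors
```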

Common Mistakes and How to Avoid Them

Answer 1 to the original question shows an alternative without reshaping, but it requires mapping colors carefully in every layer and is error-prone: if the colour aesthetic is not mapped consistently, the legend may be missing or show the wrong entries. The auxiliary article highlights the same issue: hard-coded colors in multiple geom_line calls do not produce a legend unless color is mapped inside aes().
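A common form of the no-reshape alternative maps color to a constant label inside aes() in each layer and then assigns colors to those labels by name. The sketch below assumes that form (it may differ in detail from the answer referred to above) and uses invented sample data.

```r
library(ggplot2)

# Hypothetical sample data: five days of invented temperatures
datos <- data.frame(
  fecha     = seq(as.Date("2024-01-01"), by = "day", length.out = 5),
  TempMax   = c(14, 16, 15, 17, 18),
  TempMedia = c(9, 11, 10, 12, 13),
  TempMin   = c(4, 6, 5, 7, 8)
)

# Map colour to a constant label inside aes() in every layer, and never set
# colour outside aes(); the labels become the legend entries
p <- ggplot(datos, aes(x = fecha)) +
  geom_line(aes(y = TempMax,   colour = "TempMax")) +
  geom_line(aes(y = TempMedia, colour = "TempMedia")) +
  geom_line(aes(y = TempMin,   colour = "TempMin")) +
  # The names must match the labels used in aes() exactly
  scale_colour_manual(values = c(TempMax = "red", TempMedia = "green", TempMin = "blue")) +
  labs(colour = "Temperature")
```

This works, but each new series means another geom_line call and another label to keep in sync with the scale, which is the maintenance cost the reshaping approach avoids.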

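In the reference article, the code sets color = "blue" outside aes(). That fixes the color for the layer but creates no mapping, so the layer gets no legend entry. To get a legend, color must be specified inside aes(), mapped to a variable (or to a constant label that a manual scale then translates into a color).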
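The difference between setting and mapping is easy to check. In this sketch (invented sample data), color = "blue" outside aes() draws a blue line with no legend, while aes(color = "blue") treats "blue" as a one-level category: ggplot2 adds a legend entry for it and colors the line from the default palette, not blue.

```r
library(ggplot2)

# Hypothetical sample data: five days of invented maximum temperatures
datos <- data.frame(
  fecha   = seq(as.Date("2024-01-01"), by = "day", length.out = 5),
  TempMax = c(14, 16, 15, 17, 18)
)

# Setting: a fixed color outside aes(), no mapping, no legend
p_set <- ggplot(datos, aes(x = fecha, y = TempMax)) +
  geom_line(color = "blue")

# Mapping: "blue" is just a data value here, so it gets a legend entry
# and a color from the default discrete scale
p_mapped <- ggplot(datos, aes(x = fecha, y = TempMax)) +
  geom_line(aes(color = "blue"))

print(unique(layer_data(p_set, 1)$colour))     # "blue"
print(unique(layer_data(p_mapped, 1)$colour))  # a default palette color, not "blue"
```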

Conclusion

Using data reshaping with tidyr is the recommended method for adding legends to ggplot2 line plots with multiple lines. It simplifies code, reduces errors, and makes plots more maintainable. By mapping aesthetics to variables, ggplot2 automatically handles legends, allowing for easy customization.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.