Keywords: R programming | normal distribution | data visualization | statistical plotting | dnorm function
Abstract: This article provides a comprehensive guide to plotting standard normal distribution graphs in R. Starting with the dnorm() and plot() functions for basic distribution curves, it progressively adds mean labeling, standard deviation markers, axis labels, and titles. The article also compares alternative methods using the curve() function and discusses parameter optimization for enhanced visualizations. Through practical code examples and step-by-step explanations, readers will master the core techniques for creating professional statistical charts.
Introduction and Problem Context
In statistics and data visualization, the normal (Gaussian) distribution is one of the most fundamental probability distributions. The standard normal distribution is the special case with mean 0 and standard deviation 1, whose probability density function is f(x) = (1/√(2π)) * e^(-x²/2). In R, plotting the standard normal distribution is both a common teaching exercise and a practical skill in data analysis. Drawing on Q&A from Stack Overflow, this article explains step by step how to plot a fully annotated standard normal distribution in R.
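As a quick sanity check on the density formula above, the following sketch codes the formula by hand and compares it with R's built-in dnorm(). The function name std_normal_pdf is an illustrative choice, not part of base R.

```r
# Hand-coded standard normal density: f(x) = (1/sqrt(2*pi)) * exp(-x^2/2).
# std_normal_pdf is a hypothetical helper name used only for this check.
std_normal_pdf <- function(x) {
  (1 / sqrt(2 * pi)) * exp(-x^2 / 2)
}

x_check <- c(-2, -1, 0, 1, 2)
manual  <- std_normal_pdf(x_check)
builtin <- dnorm(x_check, mean = 0, sd = 1)

# TRUE when both versions agree up to floating-point tolerance
formula_matches <- isTRUE(all.equal(manual, builtin))
print(formula_matches)
```

If the two disagree, the hand-written formula is the more likely culprit, since dnorm() is the reference implementation used throughout this article.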
Basic Plotting Method
Plotting the standard normal distribution comes down to two steps: computing probability density values with R's dnorm() function, then drawing them with a plotting function. Here is the basic implementation code:
# 1000 equally spaced points covering ±4 standard deviations
x <- seq(-4, 4, length.out=1000)
# Density of the standard normal distribution at each point
y <- dnorm(x, mean=0, sd=1)
# Draw the density as a line
plot(x, y, type="l", lwd=2, col="blue")
This code first uses seq() to generate 1000 equally spaced points from -4 to 4, which covers nearly all of the standard normal distribution (approximately 99.99% of the probability mass lies within ±4 standard deviations of the mean). The dnorm() function then computes the probability density at each x value; mean=0 and sd=1 select the standard normal distribution and are also dnorm()'s defaults, so dnorm(x) gives the same result. Finally, plot() draws the curve, where type="l" requests a line plot, lwd=2 sets the line width, and col="blue" sets the color.
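The coverage figure quoted above can be checked with pnorm(), the cumulative distribution function that accompanies dnorm(). This is a minimal sketch; the variable names k and coverage are illustrative.

```r
# Probability that a standard normal variable falls within +/- k
# standard deviations of the mean, for k = 1, 2, 3, 4.
k <- 1:4
coverage <- pnorm(k) - pnorm(-k)
names(coverage) <- paste0("within ", k, " sd")

# Print as percentages to compare with the range chosen for seq()
print(round(100 * coverage, 3))
```

The last entry is the one that justifies plotting from -4 to 4; a narrower range such as ±2 would cut off a visible part of both tails.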
Adding Annotations and Enhancements
A complete distribution graph requires clear annotations. The following code demonstrates how to add mean lines, standard deviation markers, axis labels, and titles:
# Plot basic curve
x <- seq(-4, 4, length.out=1000)
y <- dnorm(x, mean=0, sd=1)
plot(x, y, type="l", lwd=2, col="darkgreen",
     xlab="Standard Score (z)", ylab="Probability Density",
     main="Standard Normal Distribution Plot")
# Add mean vertical line
abline(v=0, col="red", lwd=2, lty=2)
text(0, max(y)*0.9, "Mean (μ=0)", col="red", pos=4)
# Mark ±1, ±2, ±3 standard deviations
sd_positions <- c(-3, -2, -1, 1, 2, 3)
for(pos in sd_positions) {
  abline(v=pos, col="gray", lty=3)
  # Label each line, e.g. "-2σ" or "+2σ"
  text(pos, dnorm(pos)*0.8,
       paste0(ifelse(pos>0, "+", ""), pos, "σ"),
       col="darkblue", cex=0.8)
}
# Add legend
legend("topright", legend=c("Normal Curve", "Mean Line", "SD Lines"),
       col=c("darkgreen", "red", "gray"),
       lty=c(1, 2, 3), lwd=c(2, 2, 1), cex=0.8)
Key points: abline() adds reference lines, and v=0 places a vertical line at x=0. text() adds text annotations; its pos argument sets the label's position relative to the given coordinates (1 = below, 2 = left, 3 = above, 4 = right). legend() creates a legend explaining the graphical elements. Looping over the standard deviation positions avoids writing the same abline() and text() calls six times.
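Because abline() and text() both accept vectors, the loop over standard deviation positions can also be written without for(). The sketch below is one possible variant rather than the original answer's approach; sd_labels is an illustrative name, and sprintf("%+d", ...) is used to print an explicit sign.

```r
# Base curve, as in the annotated example
x <- seq(-4, 4, length.out = 1000)
plot(x, dnorm(x), type = "l", lwd = 2, col = "darkgreen",
     xlab = "Standard Score (z)", ylab = "Probability Density")

sd_positions <- c(-3, -2, -1, 1, 2, 3)

# "%+d" always prints the sign, giving "-3", ..., "+3"; then append sigma
sd_labels <- paste0(sprintf("%+d", as.integer(sd_positions)), "\u03c3")

# One vectorized call each for the reference lines and their labels
abline(v = sd_positions, col = "gray", lty = 3)
text(sd_positions, dnorm(sd_positions) * 0.8,
     labels = sd_labels, col = "darkblue", cex = 0.8)
```

Both versions draw the same picture; the vectorized form is shorter, while the loop may be easier to extend if each line needs different styling.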
Alternative Method: Using curve() Function
Besides manually calculating x and y values, R provides the curve() function to directly plot function curves:
curve(dnorm(x, mean=0, sd=1),
      from=-4, to=4,
      n=1000,
      col="purple", lwd=2,
      xlab="z-value", ylab="Density",
      main="Standard Normal Distribution Using curve()")
The advantage of curve() is its more compact syntax: the first argument is an expression written in terms of x, the from and to parameters specify the x-axis range directly, and n sets the number of points at which the expression is evaluated. This method is well suited to plotting known mathematical functions, since it avoids explicitly generating an x sequence.
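To see that the two approaches draw the same curve, note that curve() invisibly returns the points it evaluated as a list with components x and y. The following sketch captures that return value and compares it with the manual seq()/dnorm() computation; pts, same_grid, and same_density are illustrative names.

```r
# curve() draws the plot and invisibly returns the evaluated points
pts <- curve(dnorm(x), from = -4, to = 4, n = 1000,
             col = "purple", lwd = 2,
             xlab = "z-value", ylab = "Density")

# Manual equivalent from the basic plotting method
x_manual <- seq(-4, 4, length.out = 1000)
y_manual <- dnorm(x_manual)

# Both comparisons should be TRUE up to floating-point tolerance
same_grid    <- isTRUE(all.equal(pts$x, x_manual))
same_density <- isTRUE(all.equal(pts$y, y_manual))
print(c(same_grid = same_grid, same_density = same_density))
```

In other words, the choice between plot() and curve() here is mainly one of convenience: plot() keeps x and y available for later annotation, whereas curve() saves a couple of lines.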
Advanced Customization and Considerations
In practical applications, further customization may be needed:
# Adjust graphical parameters
par(mar=c(5, 5, 4, 2) + 0.1) # Set margins
# Plot distribution with shaded area
x <- seq(-4, 4, length.out=1000)
y <- dnorm(x)
plot(x, y, type="l", lwd=3, col="navy",
     xlab=expression(paste("Standard Normal Variable ", italic(Z))),
     ylab=expression(paste("Probability Density ", italic(f(z)))),
     cex.lab=1.2, cex.axis=1.1)
# Add shading within ±1 standard deviation
x_shade <- seq(-1, 1, length.out=200)
y_shade <- dnorm(x_shade)
polygon(c(-1, x_shade, 1), c(0, y_shade, 0),
        col=rgb(0, 0, 1, 0.3), border=NA)
text(0, 0.1, "68.27%", cex=1.1, font=2) # Label probability area
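Here, expression() creates mathematical annotations, polygon() fills the area under the curve, and rgb(0, 0, 1, 0.3) specifies a blue fill with 30% opacity (the fourth argument is the alpha value). The label 68.27% is the probability that a standard normal variable falls within ±1 standard deviation of the mean. The par(mar=...) call widens the left margin from the default of 4.1 lines to 5.1 so that the enlarged axis labels are not truncated. These techniques make statistical charts more professional and readable.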
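The shading step can be generalized so that the percentage label is computed rather than typed in by hand. The sketch below wraps polygon() and pnorm() in a small helper; shade_normal and its arguments are hypothetical names introduced for illustration, not an existing R function.

```r
# Hypothetical helper: shade the area under the standard normal curve
# between lower and upper on the current plot, and return that area.
shade_normal <- function(lower, upper, col = rgb(0, 0, 1, 0.3), n = 200) {
  xs <- seq(lower, upper, length.out = n)
  # Close the polygon along the x-axis at both ends
  polygon(c(lower, xs, upper), c(0, dnorm(xs), 0), col = col, border = NA)
  invisible(pnorm(upper) - pnorm(lower))
}

x <- seq(-4, 4, length.out = 1000)
plot(x, dnorm(x), type = "l", lwd = 3, col = "navy",
     xlab = "z", ylab = "Density")

# Shade within +/- 1 standard deviation and label it with the computed area
area  <- shade_normal(-1, 1)
label <- sprintf("%.2f%%", 100 * area)
text(0, 0.1, label, cex = 1.1, font = 2)
```

Calling shade_normal(-2, 2) on a fresh plot would shade the wider interval and return its probability, so the label stays consistent with the shaded region.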
Teaching Suggestions and Common Issues
For beginners, a step-by-step approach is recommended: 1) Plot the basic curve first; 2) Add axes and titles; 3) Annotate key statistics; 4) Enhance the graph visually. Common issues include: setting the x-axis range too narrow resulting in incomplete curves, overlapping annotation text, and insufficient graph margins. These can be resolved by adjusting seq() parameters, using text()'s pos and cex parameters, and properly setting par(mar).
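One further margin-related pitfall is that par() settings persist for later plots on the same device. A common pattern, sketched here under the assumption that only mar is being changed, is to store the value returned by par() and restore it afterwards; old_par is an illustrative name.

```r
# par() returns the previous values of the parameters it changes
old_par <- par(mar = c(5, 5, 4, 2) + 0.1)

x <- seq(-4, 4, length.out = 1000)
plot(x, dnorm(x), type = "l", lwd = 2, col = "navy",
     xlab = "Standard Score (z)", ylab = "Probability Density",
     main = "Standard Normal Distribution")

# Restore the earlier margins so later plots are unaffected
par(old_par)
```

Inside a function, the same effect is often achieved with on.exit(par(old_par)), which restores the settings even if plotting fails partway through.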
Conclusion
Plotting standard normal distribution graphs in R involves the coordinated use of multiple functions: dnorm() calculates density values, plot() or curve() draws curves, and abline(), text(), and legend() add annotations. After mastering these basics, further functions like polygon() and expression() can be used to create more complex visualizations. Whether for teaching demonstrations or research reports, clear statistical charts effectively communicate data characteristics and statistical concepts.