Creating a normal distribution plot using ggplot2 in R

The normal distribution is a probability distribution that is often used to model real-world phenomena, such as test scores or the heights of people in a population. Its density is a bell-shaped curve that is symmetric around the mean, and the standard deviation determines how widely it spreads. In this article, we will walk through the steps of creating a normal distribution plot using the ggplot2 package in R.


Step 1: Generate a dataset

To create a normal distribution plot, we first need to generate a dataset that follows a normal distribution. We can use the rnorm function in R to generate a random sample of numbers that follow a normal distribution with a specified mean and standard deviation. For example, let’s generate a sample of 1000 numbers with a mean of 50 and a standard deviation of 10:

set.seed(123)  # for reproducibility
data <- data.frame(x = rnorm(1000, mean = 50, sd = 10))

This will create a data frame with one column, “x”, that contains our randomly generated numbers.
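Before plotting, it can help to confirm that the sample looks the way we expect. The following is a minimal sketch using base R only; the names sample_mean and sample_sd are our own and are not part of the original steps.

```r
# Recreate the sample so this snippet runs on its own
set.seed(123)
data <- data.frame(x = rnorm(1000, mean = 50, sd = 10))

# Inspect the structure: one numeric column with 1000 rows
str(data)

# The sample mean and standard deviation should be close to,
# but not exactly, the requested 50 and 10
sample_mean <- mean(data$x)
sample_sd <- sd(data$x)
print(c(mean = sample_mean, sd = sample_sd))
```

Because the numbers are random draws, the sample statistics will differ slightly from the parameters passed to rnorm.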

Step 2: Create a histogram

Next, we can create a histogram of our data using the ggplot2 package. A histogram is a graphical representation of the distribution of a dataset, and it can help us visualize the shape of our normal distribution.

library(ggplot2)
ggplot(data, aes(x = x)) +
  geom_histogram(binwidth = 1, color = "black", fill = "white") +
  labs(x = "Values", y = "Frequency", title = "Histogram of Normal Distribution")

This code will create a histogram with a binwidth of 1, a black border, and white fill. The x-axis will be labeled “Values”, the y-axis will be labeled “Frequency”, and the title of the plot will be “Histogram of Normal Distribution”.
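The bin width of 1 used above is a choice, not a requirement, and a narrow width can make the histogram look noisy. As a rough sketch, the Freedman-Diaconis rule (2 × IQR / n^(1/3)) is one common way to derive a width from the data itself. The variable names bw and p_fd are our own.

```r
library(ggplot2)

# Recreate the sample so this snippet runs on its own
set.seed(123)
data <- data.frame(x = rnorm(1000, mean = 50, sd = 10))

# Freedman-Diaconis rule: 2 * IQR / n^(1/3)
bw <- 2 * IQR(data$x) / nrow(data)^(1/3)

p_fd <- ggplot(data, aes(x = x)) +
  geom_histogram(binwidth = bw, color = "black", fill = "white") +
  labs(x = "Values", y = "Frequency",
       title = "Histogram with a Data-Driven Bin Width")
print(p_fd)
```

For this sample the rule gives a width noticeably larger than 1, so the bars are fewer and the bell shape is easier to see.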

Step 3: Add a density curve

To make our plot more informative, we can add a density curve to show the shape of the distribution. The curve drawn by geom_density is a kernel density estimate: a smoothed estimate computed from the sample itself, which shows the overall shape more clearly than the individual histogram bars.

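ggplot(data, aes(x = x)) +
  # Draw the histogram on the density scale so it matches the curve
  geom_histogram(aes(y = after_stat(density)), binwidth = 1, color = "black", fill = "white") +
  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_density(color = "blue", linewidth = 1) +
  labs(x = "Values", y = "Density", title = "Histogram and Density Curve of Normal Distribution")

This code will add a blue density curve with a line width of 1 on top of the histogram. Mapping y to after_stat(density) rescales the histogram from counts to densities; without it, the bars would be drawn as counts and the curve would appear as a nearly flat line along the bottom of the plot. The x-axis will be labeled “Values”, the y-axis will be labeled “Density”, and the title of the plot will be “Histogram and Density Curve of Normal Distribution”.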
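The curve from geom_density is estimated from the sample, so it will not match the true bell curve exactly. To compare the data with the theoretical normal density, one option is stat_function with dnorm, as in the sketch below. It assumes ggplot2 3.4.0 or later for the linewidth argument, reuses the mean of 50 and standard deviation of 10 from Step 1, and draws the histogram on the density scale so that bars and curve share the same y-axis. The name p_theory is our own.

```r
library(ggplot2)

# Recreate the sample so this snippet runs on its own
set.seed(123)
data <- data.frame(x = rnorm(1000, mean = 50, sd = 10))

p_theory <- ggplot(data, aes(x = x)) +
  # Histogram on the density scale (total bar area equals 1)
  geom_histogram(aes(y = after_stat(density)), binwidth = 1,
                 color = "black", fill = "white") +
  # Theoretical normal density with the parameters used to generate the data
  stat_function(fun = dnorm, args = list(mean = 50, sd = 10),
                color = "red", linewidth = 1) +
  labs(x = "Values", y = "Density",
       title = "Histogram with Theoretical Normal Curve")
print(p_theory)
```

If the population parameters are unknown, mean(data$x) and sd(data$x) could be passed in args instead.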

Step 4: Customize the plot

Finally, we can customize our plot by changing the fill and line colors, switching to a minimal theme, and adjusting the size and weight of the title and axis text.

ggplot(data, aes(x = x)) +
  # Density scale for the histogram; linewidth requires ggplot2 3.4.0 or later
  geom_histogram(aes(y = after_stat(density)), binwidth = 1, color = "black", fill = "#69b3a2") +
  geom_density(color = "#e9c46a", linewidth = 1) +
  labs(x = "Values", y = "Density", title = "Normal Distribution Plot") +
  theme_minimal() +
  theme(plot.title = element_text(size = 18, face = "bold"),
        axis.title = element_text(size = 14, face = "bold"),
        axis.text = element_text(size = 12),
        legend.position = "none")

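This code will change the fill color of the histogram to “#69b3a2” and the color of the density curve to “#e9c46a”. It also applies theme_minimal() and uses theme() to set a bold title at size 18, bold axis titles at size 14, and axis text at size 12.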
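If the same plot is needed for several samples, the four steps can be wrapped in a small function. The sketch below is one possible arrangement, not a function provided by ggplot2: plot_normal_sample and its arguments are hypothetical names chosen for this example, and the linewidth argument assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Hypothetical helper that combines Steps 1 to 4; not part of ggplot2
plot_normal_sample <- function(n = 1000, mean = 50, sd = 10,
                               binwidth = 1, seed = 123) {
  set.seed(seed)  # for reproducibility
  sample_df <- data.frame(x = rnorm(n, mean = mean, sd = sd))

  ggplot(sample_df, aes(x = x)) +
    geom_histogram(aes(y = after_stat(density)), binwidth = binwidth,
                   color = "black", fill = "#69b3a2") +
    geom_density(color = "#e9c46a", linewidth = 1) +
    labs(x = "Values", y = "Density", title = "Normal Distribution Plot") +
    theme_minimal()
}

# Example: a smaller sample with wider bins
p_small <- plot_normal_sample(n = 200, binwidth = 2)
print(p_small)
```

Returning the plot object rather than printing it inside the function lets the caller add further layers or theme settings with +.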
