Using the dplyr package for data manipulation in R: dplyr is a popular R package that data scientists and statisticians use to clean, transform, and analyze data. Here are the basics of how to use dplyr:
- Load the package: To use dplyr, install it once and then load it in each R session using the following code:
install.packages("dplyr")
library(dplyr)
- Load data: Next, you need to load your data into R. You can use the read.csv function to read a CSV file or the tibble function to create a new data frame.
my_data <- read.csv("my_data.csv")
- Manipulate data: Once your data is loaded, you can use dplyr to manipulate it in various ways. Some common operations include:
- Selecting columns: Use the select function to select specific columns from your data frame.
select(my_data, col1, col2)
- Filtering rows: Use the filter function to select rows that meet certain criteria.
filter(my_data, col1 > 5)
- Sorting rows: Use the arrange function to sort your data frame by one or more columns.
arrange(my_data, desc(col1))
- Grouping and summarizing: Use the group_by and summarize functions to group your data by one or more columns and calculate summary statistics.
group_by(my_data, col1) %>% summarize(mean = mean(col2))
- Chaining operations: One of the powerful features of dplyr is the ability to chain operations together using the pipe operator %>%. This allows you to write concise and readable code for complex data manipulations.
my_data %>%
  select(col1, col2) %>%
  filter(col1 > 5) %>%
  group_by(col1) %>%
  summarize(mean = mean(col2)) %>%
  arrange(desc(col1))  # sort last: summarize() returns groups in ascending order
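To try the verbs above without a CSV file on hand, here is a minimal sketch that builds a small tibble in place of my_data.csv. The values are made up purely for illustration, and the object names (selected, filtered, sorted, by_col1, result) are assumptions, not part of dplyr.

```r
library(dplyr)

# Hypothetical data standing in for my_data.csv; column names follow the text.
my_data <- tibble(
  col1 = c(3, 6, 6, 8, 8, 10),
  col2 = c(1, 2, 4, 3, 5, 7)
)

# Each verb returns a new data frame and leaves my_data unchanged.
selected <- select(my_data, col1, col2)    # keep only the named columns
filtered <- filter(my_data, col1 > 5)      # keep rows where col1 exceeds 5
sorted   <- arrange(my_data, desc(col1))   # largest col1 first

# Grouped summary: one row per distinct value of col1.
by_col1 <- my_data %>%
  group_by(col1) %>%
  summarize(mean = mean(col2), n = n())

# The same steps as a single pipeline. arrange() comes last because
# summarize() returns the groups in ascending key order.
result <- my_data %>%
  filter(col1 > 5) %>%
  group_by(col1) %>%
  summarize(mean = mean(col2)) %>%
  arrange(desc(col1))

print(result)
```

With this sample data, result has three rows (col1 values 10, 8 and 6) with means 7, 4 and 3. If col2 could contain missing values, mean(col2, na.rm = TRUE) is one way to skip them.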
These are the basics of using dplyr for data manipulation. There are many other functions and options available, but these should get you started with data exploration and analysis.
