In this article, we will see how to calculate the sum by group in the R programming language.
Data for Demonstration
# creating data frame
df <- data.frame(Sub = c('Math', 'Math', 'Phy', 'Phy',
                         'Phy', 'Che', 'Che'),
                 Marks = c(8, 2, 4, 9, 9, 7, 1),
                 Add_on = c(3, 1, 9, 4, 7, 8, 2))
# view dataframe
df
Output:
Sub   Marks  Add_on
Math      8       3
Math      2       1
Phy       4       9
Phy       9       4
Phy       9       7
Che       7       8
Che       1       2
Method 1: Using the aggregate() function in base R
The aggregate() function computes summary statistics of the data by group, such as the mean, min, max, or sum.
Syntax: aggregate(dataframe$aggregate_column, list(dataframe$group_column), FUN)
where
- dataframe is the input dataframe.
- aggregate_column is the column to be aggregated in the dataframe.
- group_column is the column whose values define the groups.
- FUN is the function applied to each group, such as sum, mean, min, or max.
# creating data frame
df <- data.frame(Sub = c('Math', 'Math', 'Phy', 'Phy',
                         'Phy', 'Che', 'Che'),
                 Marks = c(8, 2, 4, 9, 9, 7, 1),
                 Add_on = c(3, 1, 9, 4, 7, 8, 2))
aggregate(df$Marks, list(df$Sub), FUN=sum)
aggregate(df$Add_on, list(df$Sub), FUN=sum)
Output:
Group.1   x
Che       8
Math     10
Phy      22

Group.1   x
Che      10
Math      4
Phy      20
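The two calls above can also be combined into one. As a minimal sketch, assuming the same demonstration data, the formula interface of aggregate() sums both numeric columns in a single call and keeps the original column names instead of Group.1 and x. The variable name totals is chosen here only for illustration.

```r
# Recreate the demonstration data so the block runs on its own
df <- data.frame(Sub = c('Math', 'Math', 'Phy', 'Phy',
                         'Phy', 'Che', 'Che'),
                 Marks = c(8, 2, 4, 9, 9, 7, 1),
                 Add_on = c(3, 1, 9, 4, 7, 8, 2))

# cbind() on the left-hand side lists the columns to sum;
# the right-hand side of ~ names the grouping column
totals <- aggregate(cbind(Marks, Add_on) ~ Sub, data = df, FUN = sum)
totals
```

Note that the formula interface drops rows containing NA by default (na.action = na.omit). The demonstration data has no missing values, so the sums match the two separate calls.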
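Method 2: Using the dplyr package
Group the rows with group_by(), then pass the grouped data to summarise_at(), a scoped variant of summarise(). The column to aggregate goes in vars(), and the function to apply, here sum, goes in a named list.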
library(dplyr)
df %>%
  group_by(Sub) %>%
  summarise_at(vars(Marks),
               list(name = sum))
Output:
Sub   name
Che      8
Math    10
Phy     22
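In dplyr 1.0.0 and later, summarise_at() is marked as superseded. A minimal sketch of the same grouping with plain summarise(), assuming the same demonstration data, is shown here. The output column names Total_Marks and Total_Add_on and the variable totals are chosen only for illustration.

```r
library(dplyr)

# Recreate the demonstration data so the block runs on its own
df <- data.frame(Sub = c('Math', 'Math', 'Phy', 'Phy',
                         'Phy', 'Che', 'Che'),
                 Marks = c(8, 2, 4, 9, 9, 7, 1),
                 Add_on = c(3, 1, 9, 4, 7, 8, 2))

# Each argument to summarise() defines one output column per group
totals <- df %>%
  group_by(Sub) %>%
  summarise(Total_Marks = sum(Marks),
            Total_Add_on = sum(Add_on))
totals
```

Naming each summary explicitly makes the output columns self-describing, which helps when more than one column is summed.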
Method 3: Using data.table
The data.table package can also be used to calculate the sum of marks for each subject.
library(data.table)
# convert the data frame to a data.table (setDT() modifies df by reference)
setDT(df)
# find the sum of marks for each subject
df[, list(sum = sum(Marks)), by = Sub]
Output:
Sub   sum
Math   10
Phy    22
Che     8
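To sum several columns per group in data.table, one possible approach is to apply sum() over .SD, the per-group subset of the data. The sketch below assumes the same demonstration data, built directly as a data.table. The names dt and totals are chosen here only for illustration.

```r
library(data.table)

# Build the demonstration data directly as a data.table
dt <- data.table(Sub = c('Math', 'Math', 'Phy', 'Phy',
                         'Phy', 'Che', 'Che'),
                 Marks = c(8, 2, 4, 9, 9, 7, 1),
                 Add_on = c(3, 1, 9, 4, 7, 8, 2))

# .SD holds the columns listed in .SDcols for each group;
# lapply() applies sum() to each of those columns
totals <- dt[, lapply(.SD, sum), by = Sub, .SDcols = c("Marks", "Add_on")]
totals
```

As in the output above, by = Sub keeps the groups in order of first appearance (Math, Phy, Che) rather than sorting them alphabetically.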