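library(plyr)   # for ddply
library(dplyr)  # for summarise_each (load after plyr to avoid masking problems)
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c("A", "B", "C"),
                 stringsAsFactors = TRUE)  # z is a factor (the default before R 4.0.0)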
print("Classes Before Apply:")
print(class(df[,1]))
print(class(df[,2]))
print(class(df[,3]))
print("Classes During Apply:")
df.apply <- apply(df, MARGIN=1, function(x) print(class(x)))
print("Classes After Apply:")
print(class(df.apply[1]))
print(class(df.apply[2]))
print(class(df.apply[3]))
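The coercion happens before your function is ever called. apply() first converts its data frame argument with as.matrix(), and a data frame that mixes numbers with a factor becomes a character matrix. Here is a small base-R check of that. It rebuilds the same df, with z as a factor, so it runs on its own:

```r
# Rebuild the example data frame; z is made a factor explicitly
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c("A", "B", "C"),
                 stringsAsFactors = TRUE)

# apply() works on as.matrix(df); numbers plus a factor yield a character matrix
m <- as.matrix(df)
print(class(m[1, 1]))  # "character"
print(m[1, ])          # the numeric values are now strings
```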
print("Classes Before ddply (by row):")
print(class(df[,1]))
print(class(df[,2]))
print(class(df[,3]))
print("Classes During ddply (by row):")
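# Grouping by every column makes each piece a single row; print its column classes and return it unchanged
df.ddply <- ddply(df, names(df), function(x) { print(sapply(x, class)); x })
print("Classes After ddply (by row):")
print(class(df.ddply[,1]))
print(class(df.ddply[,2]))
print(class(df.ddply[,3]))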
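If all you need is to visit each row as a one-row data frame, base R's split() can do this without any coercion, because each piece is still a data.frame. The sketch below assumes the same df as above:

```r
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c("A", "B", "C"),
                 stringsAsFactors = TRUE)

# split() by row index returns a list of one-row data frames
rows <- split(df, seq_len(nrow(df)))

# Each piece keeps the original column classes
row_classes <- lapply(rows, function(r) sapply(r, class))
print(row_classes[[1]])
```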
From this you can see that apply casts every column of df to character, because it converts the data frame to a matrix first and a matrix holds a single type. ddply, by contrast, keeps each column's class as is. However, I have not found a way to iterate through columns with ddply, so I find the best approach is dplyr's summarise_each.
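For comparison, base R can also iterate over columns without coercion. A data frame is a list of columns, so sapply() hands each column to the function with its class intact. This is a minimal sketch. The anonymous mean-or-NA function is only an illustration:

```r
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c("A", "B", "C"),
                 stringsAsFactors = TRUE)

# sapply() visits each column unchanged, so classes are preserved
col_classes <- sapply(df, class)
print(col_classes)

# Column-wise mean, returning NA for non-numeric columns instead of warning
col_means <- sapply(df, function(col) if (is.numeric(col)) mean(col) else NA)
print(col_means)
```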
The simplest way to demonstrate summarise_each is:
summarise_each(df, funs(mean))
This applies the function mean to each column. However, z is a factor, so mean returns NA for it and produces a warning. We can write a custom function with an if/else to avoid this warning and make the output cleaner:
summarise_fn <- function(x) { if (is.numeric(x)) return(mean(x)) else return(NA) }
summarise_each(df, funs(summarise_fn))
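In later versions of dplyr, summarise_each() and funs() were deprecated. From dplyr 1.0 onwards, the same idea is usually written with summarise() and across(). The sketch below assumes dplyr 1.0 or later is installed. It uses where(is.numeric) to select only the numeric columns, so the factor is skipped rather than reported as NA:

```r
library(dplyr)

df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c("A", "B", "C"),
                 stringsAsFactors = TRUE)

# Apply mean() to every numeric column; the factor column z is not selected
res <- summarise(df, across(where(is.numeric), mean))
print(res)  # one row: x = 2, y = 5
```

To keep an NA column for z as in the summarise_fn version, you could instead pass everything() and summarise_fn to across().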
R - Don't use apply on dataframes / ddply by rows and columns