R - Don't use apply on dataframes / ddply by rows and columns

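library(plyr)   # ddply; load plyr before dplyr so dplyr's verbs are not masked
library(dplyr)  # summarise_each

# stringsAsFactors = TRUE matches the default before R 4.0, so z is a factor
df <- data.frame(x = c(1,2,3), y = c(4,5,6), z = c("A","B","C"), stringsAsFactors = TRUE)

print("Classes Before Apply:")
print(class(df[,1]))
print(class(df[,2]))
print(class(df[,3]))
print("Classes During Apply:")
# apply converts df to a matrix first, so each row arrives as a character vector
df.apply <- apply(df, MARGIN=1, function(x) { print(class(x)); x })
print("Classes After Apply:")
# df.apply is a character matrix, so every element is a character
print(class(df.apply[1]))
print(class(df.apply[2]))
print(class(df.apply[3]))

print("Classes Before ddply (by row):")
print(class(df[,1]))
print(class(df[,2]))
print(class(df[,3]))
print("Classes During ddply (by row):")
# Grouping by every column gives one group per unique row; x is a one-row data frame
df.ddply <- ddply(df, names(df), function(x) { print(sapply(x, class)); x })
print("Classes After ddply (by row):")
print(class(df.ddply[,1]))
print(class(df.ddply[,2]))
print(class(df.ddply[,3]))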

From this you can see that apply casts every variable of df to character (it converts the data frame to a matrix before iterating), whereas ddply keeps them as they are. However, I have not found a way to iterate through columns with ddply, so I find the best approach is to use summarise_each from dplyr.
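The coercion can also be seen without apply or plyr at all. The following is a minimal base R sketch, assuming an illustrative data frame like the one above; the variable names matrix_type, col_classes and row_classes are made up for this example.

```r
# Illustrative data, mirroring the df used in this post
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c("A", "B", "C"))

# apply() calls as.matrix() on a data frame. A matrix holds a single type,
# so mixing numbers and text forces everything to character.
m <- as.matrix(df)
matrix_type <- typeof(m)
print(matrix_type)

# Iterating over columns with sapply() keeps each column's own class
col_classes <- sapply(df, class)
print(col_classes)

# Iterating over rows by index keeps each row as a one-row data frame,
# so the column classes survive
row_classes <- lapply(seq_len(nrow(df)), function(i) sapply(df[i, ], class))
print(row_classes[[1]])
```

Indexing rows with df[i, ] is slower than apply on large data, but it avoids the silent conversion to character.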

The simplest demonstration of summarise_each is:

summarise_each(df, funs(mean))

This applies the function 'mean' to each column. However, one of the columns is of type 'factor', so mean returns NA for it and produces a warning. We can write a custom function with an if/else to stop this warning and make the output look cleaner:

# Mean for numeric columns, NA for everything else (avoids the warning)
summarise_fn <- function(x) if (is.numeric(x)) mean(x) else NA
summarise_each(df, funs(summarise_fn))
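Recent dplyr releases mark summarise_each and funs as deprecated. A rough equivalent using across is sketched below, assuming dplyr 1.0.0 or later is installed; the helper mean_or_na and the result names are hypothetical and only serve this example.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() was introduced

# Illustrative data, mirroring the df used in this post
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c("A", "B", "C"))

# Option 1: summarise only the numeric columns, dropping the rest
numeric_means <- summarise(df, across(where(is.numeric), mean))
print(numeric_means)

# Option 2: keep every column, returning NA for non-numeric ones
# (mean_or_na is a hypothetical helper, equivalent in spirit to summarise_fn)
mean_or_na <- function(x) if (is.numeric(x)) mean(x) else NA_real_
all_columns <- summarise(df, across(everything(), mean_or_na))
print(all_columns)
```

Option 1 is usually the tidier choice when the non-numeric columns are not needed in the output; Option 2 keeps the same column layout as the input.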


© Copyright 2017 Game Engine Tutorial