Re: [R] best way to apply a list of functions to a dataset ?

Dennis Murphy Tue, 20 Jul 2010 01:58:08 -0700

Hi:

This might be a little easier (?):


library(datasets)
skewness <- function(x) mean(scale(x)^3)
mean.gt.med <- function(x) mean(x)>median(x)

# ------
# construct the function to apply to each variable in the data frame
f <- function(x) c(mean = mean(x), sd = sd(x), skewness = skewness(x),
     median = median(x), mean.gt.med = mean.gt.med(x))

# map function to each variable with lapply and combine with do.call():
do.call(rbind, lapply(attitude, f))
               mean        sd    skewness median mean.gt.med
rating     64.63333 12.172562 -0.35792491   65.5           0
complaints 66.60000 13.314757 -0.21541749   65.0           1
privileges 53.13333 12.235430  0.37912287   51.5           1
learning   56.36667 11.737013 -0.05403354   56.5           0
raises     64.63333 10.397226  0.19754317   63.5           1
critical   74.76667  9.894908 -0.86577893   77.5           0
advance    42.93333 10.288706  0.85039799   41.0           1

HTH,
Dennis


On Mon, Jul 19, 2010 at 10:51 PM, Glen Barnett <glnbr...@gmail.com> wrote:

> Assuming I have a matrix of data (or under some restrictions that will
> become obvious, possibly a data frame), I want to be able to apply a
> list of functions (initially producing a single number from a vector)
> to the data and produce a data frame (for compact output) with column
> 1 being the function results for the first function, column 2 being
> the results for the second function and so on - with each row being
> the columns of the original data.
>
> The obvious application of this is to produce summaries of data sets
> (a bit like summary() does on numeric matrices), but with user
> supplied functions. I am content for the moment to leave it to the
> user to supply functions that work with the data they supply so as to
> produce results that will actually be data-frame-able, though I'd like
> to ultimately make it a bit nicer than it currently is without
> compromising the niceness of the output in the "good" cases.
>
> The example below is a simplistic approach to this problem (it should
> run as is). I have named it "fapply" for fairly obvious reasons, but
> added the ".1" because it doesn't accept multidimensional arrays. I
> have included the output I generated, which is what I want. There are
> some obvious generalizations (e.g. being able to include functions
> like range(), say, that produce several values on a vector, rather
> than one, making the user's life simpler when a function already does
> most of what they need).
>
> The question is: this looks like a silly approach, growing a list
> inside a for loop. Also I recall reading that if you find yourself
> using "do.call" you should probably be doing something else.
>
> So my question: Is there a better way to implement a function like this?
>
> Or, even better, is there already a function that does this?
>
> ## example function and code to apply a list of functions to a matrix
> (here a numeric data frame)
>
> library(datasets)
>
> fapply.1 <- function(x, fun.l, colnames=fun.l){
> out.l <- list()   # starts with an empty list
> for (i in seq_along(fun.l)) out.l[[i]] <- apply(x,2,fun.l[[i]])   #
> loop through list of functions
>
> # set up names and make into a data frame
> names(out.l) <- colnames
> attr(out.l,"row.names") <- names(out.l[[1]])
> attr(out.l,"class") <- "data.frame"
> out.l
> }
>
> skewness <- function(x) mean(scale(x)^3)      #define a simple numeric
> function
> mean.gt.med <- function(x) mean(x)>median(x)  # define a simple non-numeric
> fn
> flist <- c("mean","sd","skewness","median","mean.gt.med") # make list
> of fns to apply
>
> fapply.1(attitude,flist)
>               mean        sd    skewness median mean.gt.med
> rating     64.63333 12.172562 -0.35792491   65.5       FALSE
> complaints 66.60000 13.314757 -0.21541749   65.0        TRUE
> privileges 53.13333 12.235430  0.37912287   51.5        TRUE
> learning   56.36667 11.737013 -0.05403354   56.5       FALSE
> raises     64.63333 10.397226  0.19754317   63.5        TRUE
> critical   74.76667  9.894908 -0.86577893   77.5       FALSE
> advance    42.93333 10.288706  0.85039799   41.0        TRUE
>
> ## end code and output
>
> So did I miss something obvious?
>
> Any suggestions as far as style or simple stability-enhancing
> improvements would be handy.
>
> regards,
> Glen
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] best way to apply a list of functions to a dataset ?

Reply via email to