It just so happens that I created a vectorized SD function the other day. See the column and row versions below. If there are any rows/columns with only one non-NA value, it will return NaN.
col_sd = function(x){ dimx = dim(x) x.mean = .Internal(colMeans(x,dimx[1],dimx[2],na.rm=TRUE)) err = t(t(x)-x.mean) err.sq = err*err sum.err.sq = .Internal(colSums(err.sq,dimx[1],dimx[2],na.rm=TRUE)) n = .Internal(colSums(!is.na(x),dimx[1],dimx[2],na.rm=TRUE)) sqrt(sum.err.sq/(n-1)) } row_sd = function(x){ dimx = dim(x) x.mean = .Internal(rowMeans(x,dimx[1],dimx[2],na.rm=TRUE)) err = x-x.mean err.sq = err*err sum.err.sq = .Internal(rowSums(err.sq,dimx[1],dimx[2],na.rm=TRUE)) n = .Internal(rowSums(!is.na(x),dimx[1],dimx[2],na.rm=TRUE)) sqrt(sum.err.sq/(n-1)) } On Wed, Feb 25, 2009 at 5:11 PM, Jorge Ivan Velez <jorgeivanve...@gmail.com> wrote: > Dear Matt, > You're absolutely right. The reason why my code gives different results is > because it calculates the standard deviation for each row/column in all the > arrays rather than for every cell. My bad. > > You can easily get the results you posted running > >>apply(p,1:2,sd,na.rm=TRUE) > > Here is another option which gives the same results you posted :-) > Unfortunately there is not timing improvement :( (Note that I ran both > sd1 and sd2 functions using pp instead of p) > > # ------------- > # System times > # ------------- >> pp <- array(c(1:5, rep(NA, times = 3)), dim = c(1000, 1000, 60)) >> system.time(apply(pp, 1:2, sd1)) > # user system elapsed > # 93.16 0.23 94.87 >> system.time(apply(pp, 1:2, sd2)) > # user system elapsed > # 66.51 0.30 67.84 > > # ---------- > # Functions > # ---------- > sd1<-function(x) sd(x,na.rm=TRUE) > > sd2<- function(i){ > if(sum(!is.na(i))==0) temp.sd <- NA > else temp.sd <- sd(i, na.rm = TRUE) > temp.sd > } > > > Regards, > > Jorge > > > On Wed, Feb 25, 2009 at 3:23 PM, Matt Oliver <moli...@udel.edu> wrote: > >> Hi Jorge, this does not seem to return the same thing as >> >> p <- array(c(1:5, rep(NA, times = 3)), dim = c(5, 5, 3)) >> sd_fun <- function(i){ >> if(sum(!is.na(i))==0){ >> temp.sd <- NA >> }else{ >> temp.sd <- sd(i, na.rm = TRUE) >> } >> return(temp.sd) >> } >> >> apply(p, 1:2, sd_fun) >> >> Am I missing something basic here? >> >> >> >> On Wed, Feb 25, 2009 at 11:47 AM, Jorge Ivan Velez < >> jorgeivanve...@gmail.com> wrote: >> >>> >>> Hi Mark, >>> There is a typo in the first way. My apologies. It should be: >>> # First >>> res<-apply(p,3,function(X) >>> c(scols=apply(X,2,sd,na.rm=TRUE),srows=apply(X,1,sd,na.rm=TRUE)) >>> ) >>> res >>> >>> HTH, >>> >>> Jorge >>> >>> >>> On Wed, Feb 25, 2009 at 11:42 AM, Jorge Ivan Velez < >>> jorgeivanve...@gmail.com> wrote: >>> >>>> >>>> Dear Matt, >>>> >>>> Here are two ways: >>>> >>>> # Data >>>> p <- array(c(1:5, rep(NA, times = 3)), dim = c(5, 5, 3)) >>>> >>>> # First >>>> res<-apply(p,3,function(X) >>>> c(scols=apply(X,2,sd,na.rm=TRUE),srows=apply(X,2,sd,na.rm=TRUE)) >>>> ) >>>> res >>>> >>>> # Second >>>> res2<-apply(p,3,function(X) >>>> >>>> list(scols=apply(X,2,sd,na.rm=TRUE),srows=apply(X,1,sd,na.rm=TRUE)) >>>> ) >>>> >>>> lapply(res2,function(x) do.call(rbind,x)) >>>> >>>> HTH, >>>> >>>> Jorge >>>> >>>> >>>> On Wed, Feb 25, 2009 at 10:53 AM, Matt Oliver <moli...@udel.edu> wrote: >>>> >>>>> Dear help, suppose I have this array and want to compute sd aross rows >>>>> and >>>>> columns. >>>>> >>>>> p <- array(c(1:5, rep(NA, times = 3)), dim = c(5, 5, 3)) >>>>> >>>>> apply(p, 1:2, sd) fails because sd requires at least 2 numbers to >>>>> compute sd >>>>> >>>>> apply(p, 1:2, sd, na.rm = TRUE) fails for the same reason >>>>> >>>>> I crafted my own function that does what I want >>>>> >>>>> sd_fun <- function(i){ >>>>> if(sum(!is.na(i))==0){ >>>>> temp.sd <- NA >>>>> }else{ >>>>> temp.sd <- sd(i, na.rm = TRUE) >>>>> } >>>>> return(temp.sd) >>>>> } >>>>> >>>>> >>>>> apply(p, 1:2, sd_fun) >>>>> >>>>> This does what I want, but when I scale up to large arrays like >>>>> >>>>> pp <- array(c(1:5, rep(NA, times = 3)), dim = c(1000, 1000, 60)) >>>>> >>>>> the apply function takes a long time to run. >>>>> >>>>> Is there a faster, more efficient way to do this? >>>>> >>>>> Thanks in advance >>>>> >>>>> Matt >>>>> >>>>> >>>>> -- >>>>> Matthew J. Oliver >>>>> Assistant Professor >>>>> College of Marine and Earth Studies >>>>> University of Delaware >>>>> 700 Pilottown Rd. >>>>> Lewes, DE, 19958 >>>>> 302-645-4079 >>>>> http://www.ocean.udel.edu/people/profile.aspx?moliver >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> ______________________________________________ >>>>> R-help@r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide >>>>> http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>>> >>>> >>> >> >> >> -- >> Matthew J. Oliver >> Assistant Professor >> College of Marine and Earth Studies >> University of Delaware >> 700 Pilottown Rd. >> Lewes, DE, 19958 >> 302-645-4079 >> http://www.ocean.udel.edu/people/profile.aspx?moliver >> >> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Mike Lawrence Graduate Student Department of Psychology Dalhousie University Looking to arrange a meeting? Check my public calendar: http://tinyurl.com/mikes-public-calendar ~ Certainty is folly... I think. ~ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.