Better(?): The inequalities can be vectorized and rle() can then by apply()ed on the rows:
(d is your data frame. "data' is a really bad name) out <- d[,3:6] < d[,1] & d[,3:6]>d[,2] a <- apply(as.matrix(out),1, rle) a will be a list each component of which will have the consecutive runs information you need. for a row. You can then easily process this via lapply to get what you want. I leave the details to you. ?rle tells you what you need t know. -- Bert On Sat, Mar 23, 2013 at 4:46 AM, Rui Barradas <[email protected]> wrote: > Hello, > > The following should be faster. It preallocates a vector of length nrows > instead of extending 'a' on every iteration. > > a2 <- character(nrows) > for(b in 1:nrows) > { > c= ColChange(data[b,1],data[b,2],**data[b,3:6],2) > a2[b]=c > } > all.equal(c(a), a2) > > > As for the use of apply, I'm getting errors but the way to call it would > be, after changing ColChange in order to correct some of the errors, > (changed: ncol to NCOL and colnames to names) > > > ColChange2 <- function(LowLim, HighLim, Vals, NumConsecOut) { > cols <- NCOL(Vals) > yr_init = 0 > k = 0 > for (i in 1:cols){ > val = Vals[i] > if (is.na(val)){ > Result="NA" > next > } else { > if (val<LowLim||val>HighLim){ > if (yr_init==0) { > yr_init = names(Vals)[i] > k = k + 1 > } else { > k = k + 1 > } > if (k==NumConsecOut){ > Result=yr_init > break > } > } else { > yr_init=0 > k=0 > } > if (yr_init==0){ > Result = names(Vals)[cols] > } > } > } > return(Result) > } > ##############################**################### > > apply(data, 1, function(x) ColChange2(x[1], x[2], x[3:6], 2) ) > Error in ColChange2(x[1], x[2], x[3:6], 2) : object 'Result' not found > > > So there's some debugging to be done. > Anyway, the revised loop should be much faster. > > > Hope this helps, > > Rui Barradas > > Em 23-03-2013 11:08, Camilo Mora escreveu: > >> Hi everyone, >> >> I wonder if I can get your help using a custom function in apply. >> >> Imagine the following dataframe called "data": >> >> LowLim HighLim A1 A2 A3 A4 >> 4 6 3 4 5 6 >> 4 6 4 5 5 6 >> 2 3 1 4 2 3 >> 2 3 NA NA NA NA >> >> We have created a custom function (see below) that takes the values in a >> given row between columns A1 to A4 to see if they are outside the limits >> in the same row set by columns LowLim and HighLim, if at least x >> consecutive values are outside, the function returns the column name of >> the first column in that series. If no value is outside, the function >> returns the name of the last column and if there are NAs, the function >> returns NA. >> >> So in the example above, the function return the following results: >> >> Considering 2 consecutive values outside the limits: >> A4 >> A4 >> A1 >> NA >> >> Considering 1 value outside the limits: >> A1 >> A4 >> A1 >> NA >> >> The problem we have is that our dataframe has over 1.2 million rows. So >> right now we are using a loop (see below), which work fine by entering >> the values of each row in our function but it will take several days to >> complete. The idea is to see if we can use our function with apply. >> Basically, >> >> data$Results<-apply(data,1, ColChange, LowLim=data[1], HighLim=data[2], >> Vals=data[3:6], NumConsecOut=2) >> >> But we get the following error: âError in FUN(newX[, i], ...) : unused >> argument(s) (newX[, i])â >> >> Any idea about this error or an alternative way to obtain the results we >> look for. >> >> Thank you very much, >> >> Camilo >> >> ################# Our function is: ####################### >> ColChange <- function(LowLim, HighLim, Vals, NumConsecOut) { >> cols <- ncol(Vals) >> yr_init = 0 >> k = 0 >> for (i in 1:cols){ >> val = Vals[i] >> if (is.na(val)){ >> Result="NA" >> next >> } else { >> if (val<LowLim||val>HighLim){ >> if (yr_init==0) { >> yr_init = colnames(Vals)[i] >> k = k + 1 >> } else { >> k = k + 1 >> } >> if (k==NumConsecOut){ >> Result=yr_init >> break >> } >> } else { >> yr_init=0 >> k=0 >> } >> if (yr_init==0){ >> Result=colnames(Vals)[cols] >> } >> } >> } >> return(Result) >> } >> ##############################**################### >> >> >> >> #######Our loop is:###########################**#### >> nrows=nrow(data) >> >> a=c() >> >> for(b in 1:nrows) >> { >> c= ColChange(data[b,1],data[b,2],**data[b,3:6],2) >> a=rbind(a,c) >> } >> ##############################**################### >> >> >> Camilo Mora, Ph.D. >> Department of Geography, University of Hawaii >> Currently available in Colombia >> Phone: Country code: 57 >> Provider code: 313 >> Phone 776 2282 >> From the USA or Canada you have to dial 011 57 313 776 2282 >> http://www.soc.hawaii.edu/**mora/ <http://www.soc.hawaii.edu/mora/> >> >> ______________________________**________________ >> [email protected] mailing list >> https://stat.ethz.ch/mailman/**listinfo/r-help<https://stat.ethz.ch/mailman/listinfo/r-help> >> PLEASE do read the posting guide >> http://www.R-project.org/**posting-guide.html<http://www.R-project.org/posting-guide.html> >> and provide commented, minimal, self-contained, reproducible code. >> > > ______________________________**________________ > [email protected] mailing list > https://stat.ethz.ch/mailman/**listinfo/r-help<https://stat.ethz.ch/mailman/listinfo/r-help> > PLEASE do read the posting guide http://www.R-project.org/** > posting-guide.html <http://www.R-project.org/posting-guide.html> > and provide commented, minimal, self-contained, reproducible code. > -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm [[alternative HTML version deleted]]
______________________________________________ [email protected] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

