> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Peter Dalgaard
> Sent: Wednesday, May 20, 2009 8:16 AM
> To: De France Henri
> Cc: r-help@r-project.org
> Subject: Re: [R] problem with "APPLY"
>
> De France Henri wrote:
> > Hello,
> >
> > The "apply" function seems to behave oddly with my code below
> >
> > NB : H1 is a data frame. (data in the attached file.)
> > # the first lines are:
> > 1 02/01/2008 0.000000 0 0 0.000000 0
> > 2 03/01/2008 0.000000 0 0 0.000000 0
> > 3 04/01/2008 0.000000 0 0 0.000000 0
> > 4 07/01/2008 0.000000 0 0 0.000000 0
> > 5 08/01/2008 0.000000 0 0 0.000000 0
> > 6 09/01/2008 0.000000 0 0 0.000000 0
> > 7 10/01/2008 0.000000 0 0 0.000000 0
> > 8 11/01/2008 1.010391 0 0 1.102169 0
> > ...
> > The aim of the code is to extract those lines for which
> > there is a strictly positive value in the second column AND
> > in one of the others:
> >
> > reper=function(x){as.numeric(x[2]>1 & any(x[3:length(x)]>1))}
> >
> > TAB1= H1[which(apply(H1,1,reper)>0),]
> >
> > Strangely, this is OK for all the lines, except for the
> > last one. In fact, in H1, the last 2 lines are:
> > 258 29/12/2008 1.476535 1.187615 0 0.000000 0
> > 259 30/12/2008 0.000000 1.147888 0 0.000000 0
> > Obviously, line 258 should be the last line of TAB1, but it
> > is not the case (it does not appear at all) and I really
> > don't understand why. This is all the more strange since
> > applying the function "reper" only to this line 258 gives a
> > "1" as expected...
> > Can someone help ?
> >
>
> Works for me...
>
>        do...1.       V3       V5 V7      V13 V31
> 213 24/10/2008 2.038218 2.820196  0 0.000000   0
> 214 27/10/2008 3.356057 2.588509  0 2.101651   0
> 219 03/11/2008 2.122751 1.648410  0 2.180908   0
> 233 21/11/2008 1.439861 1.883605  0 1.359372   0
> 234 24/11/2008 1.216548 1.480797  0 1.049390   0
> 258 29/12/2008 1.476535 1.187615  0 0.000000   0
>
> You are crossing the creek to fetch water, though:
>
> reper <- function(x) x[2]>1 & any(x[3:length(x)]>1)
> TAB1 <- H1[apply(H1,1,reper),]
>
> or even
>
> TAB1 <- H1[ H1[2] > 1 & apply(H1[3:6] > 1, 1, any),]
>
>
> --
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907

I couldn't reproduce the bad result either. However, it was more or
less by chance that the results were as good as they were. The call

   apply(myDataFrame, 1, FUN)

does essentially the equivalent of

   myMatrix <- as.matrix(myDataFrame)
   for(i in seq_len(nrow(myMatrix)))
      rowResult[i] <- FUN(myMatrix[i,,drop=TRUE])

If myDataFrame contains any factor, character, POSIXt, or any other
non-numeric columns then myMatrix will be a matrix of character
strings. Each column of myDataFrame is passed through format() to make
those strings, so the precise formatting of the strings depends on all
the other elements of the column (e.g., one big or small number might
cause the whole column to be formatted in "scientific" notation).

Your reper() function happened to work because "2.3" > 0 is
interpreted as (I think) "2.3" > "0", which is TRUE (at least in
ASCII). However, if your cutoff were 0.000002 then you might be
surprised

   > "2.3">0.000002
   [1] FALSE

because as.character(0.000002) is "2e-06".

I think that applying apply(MARGIN=1,...) to data.frames is generally
a bad idea, and it only really works if all the columns are the same
simple type. Avoiding it altogether makes for tedious coding like

   H1[ H1[2] > 1 & (H1[,3]>1 | H1[,4]>1 | H1[,5]>1 | H1[,6]>1) ,]

You can also use pmax (parallel max), as in

   H1[H1[2]>1 & do.call("pmax", unname(as.list(H1[,3:6])))>1, ]

Peter's 2nd solution calls apply(MARGIN=1,...) only on the numeric
part of the data.frame, so it works as expected.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
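
To make the coercion described above concrete, here is a minimal sketch. It assumes a small made-up stand-in for H1 (the column names and the four rows are hypothetical, loosely modelled on the lines quoted in the thread, with the date stored as character). It only illustrates the point about as.matrix() and is not the original data or a definitive fix.

```r
# Hypothetical stand-in for H1: one character (date) column plus numeric columns.
H1 <- data.frame(
  date = c("02/01/2008", "11/01/2008", "29/12/2008", "30/12/2008"),
  V3   = c(0, 1.010391, 1.476535, 0),
  V5   = c(0, 0, 1.187615, 1.147888),
  V7   = c(0, 0, 0, 0),
  V13  = c(0, 1.102169, 0, 0),
  stringsAsFactors = FALSE
)

# apply(H1, 1, FUN) starts from as.matrix(H1); the character column forces
# every cell, including the numbers, to become a string.
m <- as.matrix(H1)
is_char <- is.character(m)

# A numeric cutoff compared with a string is itself converted to a string,
# so a small cutoff turns into scientific notation before the comparison.
cutoff_string <- as.character(0.000002)

# Restricting the work to the numeric columns keeps the comparisons numeric.
num  <- as.matrix(H1[, 2:5])
keep <- num[, 1] > 1 & rowSums(num[, -1, drop = FALSE] > 1) > 0
TAB1 <- H1[keep, ]

# The pmax() variant from the message, adapted to the hypothetical columns.
keep_pmax <- H1$V3 > 1 & do.call("pmax", unname(as.list(H1[, 3:5]))) > 1
```

With this toy data, both keep and keep_pmax select the rows dated 11/01/2008 and 29/12/2008, and the date column never takes part in a comparison.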