Hi André,

Another approach, using split/lapply:

# per date: TRUE where ID < 9
lll <- lapply(split(A$ID, A$Date), function(x) x < 9)
# keep those rows only on dates that have at least 8 such IDs
# (relies on A being sorted by Date, so that unlist() lines up with the rows of A)
A$select <- unlist(lapply(lll, function(x) x * sum(x) >= 8))
A
A[A$select, ]

However, if your real data frame does not have the same properties as
the one you showed, the results could be wrong, e.g. if A$ID does not
contain 8 consecutive values (1:8) but rather
1,1,2,2,3,3,4,4,5,5,... or 1,1,1,1,1,1,1,1,...

Cheers
Petr

> -----Original Message-----
> From: R-help <r-help-boun...@r-project.org> On Behalf Of Jim Lemon
> Sent: Monday, June 21, 2021 12:11 PM
> To: Eric Berger <ericjber...@gmail.com>
> Cc: R mailing list <r-help@r-project.org>; André Luis Neves
> <andrl...@ualberta.ca>
> Subject: Re: [R] Help with selection of continuous data
>
> Hi Andre,
> I've taken a different approach to that employed by Eric:
>
> A <- data.frame(
>   c("01/01/2020","01/01/2020","01/01/2020","01/01/2020","01/01/2020",
>     "01/01/2020","01/01/2020","01/01/2020","01/01/2020","01/01/2020",
>     "01/01/2020","01/01/2020","01/02/2020","01/02/2020","01/02/2020",
>     "01/02/2020","01/03/2020","01/03/2020","01/03/2020","01/03/2020",
>     "01/03/2020","01/03/2020","01/03/2020","01/04/2020","01/04/2020",
>     "01/04/2020","01/04/2020","01/04/2020","01/04/2020","01/04/2020",
>     "01/04/2020","01/04/2020"),
>   c(23,22,12,24,26,19,34,15,17,19,23,33,23,34,25,23,25,24,34,33,31,32,
>     24,22,21,23,22,22,21,23,23,21),
>   c(13,11,12,9,8,9,7,10,11,9,6,11,9,8,9,10,11,12,9,8,10,4,6,9,8,9,10,
>     11,14,12,13,11),
>   c(1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,1,2,3,4,5,6,7,1,2,3,4,5,6,7,8,
>     9))
> colnames(A) <- c("Date", "CO2", "CH4", "ID")
> # add a variable to compile selected rows
> A$select<-FALSE
> # get all unique dates
> alldates<-unique(A$Date)
> for(date in alldates) {
>  # get indices for this date
>  date_indices<-which(A$Date == date)
>  # only mark the first 8 as TRUE
>  A$select[date_indices[1:8]]<-all(1:8 %in% A$ID[date_indices])
> }
> A
> A[A$select,]
>
> If you don't want to add a column you can set up "select" as a vector.
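
The vector variant mentioned just above could be sketched roughly as
follows. This is only an illustration under stated assumptions: A_small,
sel and B_small are names made up here, the small data frame is a
stand-in rather than the posted data, and rows are marked by their ID
value (1 to 8) instead of by taking the first 8 rows of each date, which
is a deliberate deviation from the loop quoted above.

```r
# Small stand-in data frame (made-up values, not the data from the thread).
A_small <- data.frame(
  Date = rep(c("01/01/2020", "01/02/2020", "01/04/2020"), times = c(10, 4, 9)),
  ID   = c(1:10, 1:4, 1:9)
)

# Hypothetical name: a logical vector kept outside the data frame.
sel <- logical(nrow(A_small))

for (d in unique(A_small$Date)) {
  # Row indices belonging to this date.
  idx <- which(A_small$Date == d)
  # Keep a date only if every ID from 1 to 8 occurs on it.
  if (all(1:8 %in% A_small$ID[idx])) {
    # Mark the rows whose ID is in 1:8, wherever they sit within the date.
    sel[idx[A_small$ID[idx] %in% 1:8]] <- TRUE
  }
}

# Subset without having added a column to the data frame.
B_small <- A_small[sel, ]
```

Because the rows are picked by ID value, this does not depend on the IDs
being sorted within a date, but repeated IDs on one date would all be
kept, so the caveat about duplicates still applies.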
>
> Jim
>
> On Mon, Jun 21, 2021 at 6:18 PM Eric Berger <ericjber...@gmail.com> wrote:
> >
> > Hi André,
> > It's not 100% clear to me what you are asking. I am interpreting the
> > question as selecting the data from those dates for which all of
> > 1,2,3,4,5,6,7,8 appear in the ID column.
> > My approach determines the dates satisfying this property, which I put
> > into a vector dtV. Then I take the rows of A for which the date is in
> > the vector dtV.
> >
> > library(dplyr)
> > dtV <- A %>% mutate(x=2^(ID-1)) %>% group_by(Date) %>%
> >   summarise(y=(sum(unique(x))%%256==255)) %>% filter(y==TRUE) %>%
> >   select(Date)
> > B <- A[ A$Date %in% dtV$Date, ]
> >
> > B is the subset of A that you want.
> >
> > HTH,
> > Eric
> >
> > On Mon, Jun 21, 2021 at 10:23 AM André Luis Neves <andrl...@ualberta.ca> wrote:
> >
> > > Dear R users,
> > >
> > > I want to select only the data containing a continuous number of
> > > *ID* from 1-8 in each *DATE*. Note, I do not want to select data
> > > that do not contain a continuous number in *ID* from 1-8 (eg. Data
> > > on *DATE* 1/2/2020, and 01/03/2020). The dataset is a huge matrix
> > > with 24 columns and 1.5 million rows, but I have prepared a
> > > reproducible code for your reference below.
> > >
> > > Here it is the reproducible code:
> > >
> > > A =
> > > data.frame(c("01/01/2020","01/01/2020","01/01/2020","01/01/2020",
> > >   "01/01/2020","01/01/2020","01/01/2020","01/01/2020","01/01/2020",
> > >   "01/01/2020","01/01/2020","01/01/2020","01/02/2020","01/02/2020",
> > >   "01/02/2020","01/02/2020","01/03/2020","01/03/2020","01/03/2020",
> > >   "01/03/2020","01/03/2020","01/03/2020","01/03/2020","01/04/2020",
> > >   "01/04/2020","01/04/2020","01/04/2020","01/04/2020","01/04/2020",
> > >   "01/04/2020","01/04/2020","01/04/2020"),
> > >   c(23,22,12,24,26,19,34,15,17,19,23,33,
> > >     23,34,25,23,25,24,34,33,31,32,24,22,21,23,22,22,21,23,23,21),
> > >   c(13,11,12,9,8,9,7,10,11,9,6,11,
> > >     9,8,9,10,11,12,9,8,10,4,6,9,8,9,10,11,14,12,13,11),
> > >   c(1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,1,2,
> > >     3,4,5,6,7,1,2,3,4,5,6,7,8,9))
> > > colnames(A) <- c("Date", "CO2", "CH4", "ID")
> > > A
> > >
> > > Thank you,
> > > --
> > > Andre
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.