Maybe try tapply with the count function you already defined:

    with(df, tapply(y, x, count))
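A minimal sketch of that call on toy data, in case it helps to see the shape of the result. The data frame here is hypothetical and made up purely for illustration; count is the helper from the original post. It only shows that the call returns the per-level counts; whether it is actually faster on the full 159503-row data frame is not something this sketch establishes.

```r
# Hypothetical toy data standing in for the poster's "df"
df <- data.frame(
  x = c("a", "a", "a", "b", "b", "c"),
  y = c(1, 1, 2, NA, 3, NA)
)

# Helper from the original post: number of distinct non-NA values
count <- function(v) length(unique(na.omit(v)))

# Per-level count of unique y values, named by the levels of x
res <- with(df, tapply(y, x, count))
print(res)

# Same counts computed the original way, for comparison
ref <- sapply(split(df$y, df$x), count)
```

The result is a one-dimensional array named by the levels of x, so individual counts can be pulled out with res[["a"]] and so on. A level whose y values are all NA comes back as 0 rather than NA, because na.omit() drops them before unique() is applied.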
On Fri, Jan 16, 2009 at 8:10 AM, Simon Pickett <simon.pick...@bto.org> wrote:

> Hi all,
>
> I want to calculate the number of unique observations of "y" in each level
> of "x" from my data frame "df".
>
> This does the job, but it is very slow for this big data frame (159503 rows,
> 11 columns):
>
> group.list <- split(df$y, df$x)
> count <- function(x) length(unique(na.omit(x)))
> sapply(group.list, count, USE.NAMES=TRUE)
>
> I couldn't find the answer by searching for "slow split" and "split time" on
> the help forum.
>
> I am running R version 2.2.1 on a machine with 4 GB of memory, and I'm using
> Windows 2000.
>
> Thanks in advance,
>
> Simon.
>
> ----- Original Message -----
> From: "Wacek Kusnierczyk" <waclaw.marcin.kusnierc...@idi.ntnu.no>
> To: "Gundala Viswanath" <gunda...@gmail.com>
> Cc: "R help" <r-h...@stat.math.ethz.ch>
> Sent: Friday, January 16, 2009 9:30 AM
> Subject: Re: [R] Value Lookup from File without Slurping
>
>> You might try to iteratively read a limited number of lines in a
>> batch using readLines:
>>
>> # filename, the name of your file
>> # n, the maximal count of lines to read in a batch
>> connection = file(filename, open="rt")
>> while (length(lines <- readLines(con=connection, n=n))) {
>>     # do your stuff here
>> }
>> close(connection)
>>
>> ?file
>> ?readLines
>>
>> vQ
>>
>> Gundala Viswanath wrote:
>>
>>> Dear all,
>>>
>>> I have a repository file (let's call it repo.txt)
>>> that contains two columns like this:
>>>
>>> # tag value
>>> AAA 0.2
>>> AAT 0.3
>>> AAC 0.02
>>> AAG 0.02
>>> ATA 0.3
>>> ATT 0.7
>>>
>>> Given another query vector
>>>
>>> qr <- c("AAC", "ATT")
>>>
>>> I would like to find the corresponding value for each query above,
>>> yielding:
>>>
>>> 0.02
>>> 0.7
>>>
>>> However, I want to avoid slurping the whole of repo.txt into an object
>>> (e.g. a hash). Is there any way to do that?
>>>
>>> The reason I want to do that is that repo.txt is very large
>>> (millions of lines, with tag length > 30 bp), and my PC's memory is
>>> too small to hold it.
>>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O