Hi all,

Thanks for the advice! Gabor, I've been putting off getting into SQLite. I
may need to bite the bullet and learn it.
Jim - thanks for the help - and yes, I'd read that old post. My problem is
that, with the other objects already in memory, I cannot pull the whole
matrix in (in reality, it has 3200 rows - the 100 was just my example). So
it looks like for now, I'll be looping... SQL down the road.

QUESTION: is there any way that the Gods of R would consider allowing the
"skip" argument in scan() to accept a vector? Maybe it's not so easy, but
if so...

Thanks all!

Matt

On 11/8/07, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> Don't know if SQLite can handle that many columns, but if it can and if
> the file is in an acceptable format, then sqldf simplifies the interface
> to reading it into an SQLite database that it automatically creates on
> the fly and then gets a subset out of it into R. (If it will fit into
> memory you can omit the dbname= argument.)
>
> library(sqldf)
> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
>
> myfile <- file("myfile.dat")
> sqldf("select * from myfile where rowid % 2 = 0 and rowid >= 5",
>       dbname = tempfile())
>
> See example 6 on the home page:
> http://sqldf.googlecode.com
>
>
> On Nov 8, 2007 4:19 AM, Matthew Keller <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > Is there a way to skip non-sequential lines using the "skip" argument
> > in the scan function?
> >
> > E.g., I have a matrix with 100 rows and 1e7 columns. I open a
> > connection and want to read only lines 5, 7, 9, etc. [i.e.,
> > seq(5, 99, 2)].
> >
> > It might seem that the syntax to do this would be something like this
> > (if only "skip" allowed vectors in the same way colClasses does in
> > read.table):
> >
> > con <- file("bigfile", open = "r")
> > rows.I.want <- seq(5, 99, 2)
> > new <- scan(con, what = "character", skip = rows.I.want - 1,
> >             nlines = length(rows.I.want))
> >
> > The above doesn't work - it would read the consecutive lines 5, 6, 7,
> > ... rather than 5, 7, 9, ..., 99.
> > Yes, I know I can accomplish this by looping, but with the huge
> > datasets I'll be working with, I'd like to try to save time by doing
> > it all at once. Any ideas?
> >
> > Matt
> >
> > --
> > Matthew C Keller
> > Asst. Professor of Psychology
> > University of Colorado at Boulder
> > www.matthewckeller.com
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

--
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com
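
P.S. For the archives, here is a sketch of what the looping workaround
mentioned above could look like. It relies on the fact that on an open
connection, scan() continues from the current position, so "skip" only
needs to cover the gap since the last read. The demo file and row indices
are just stand-ins based on the example in this thread:

```r
## Make a small self-contained demo file (stands in for "bigfile")
tf <- tempfile()
writeLines(sprintf("row%d", 1:100), tf)

rows.I.want <- seq(5, 99, 2)

## On an open connection, scan() resumes where the last read stopped,
## so compute the number of lines to skip before each wanted line.
gaps <- diff(c(0, rows.I.want)) - 1

con <- file(tf, open = "r")
new <- character(length(rows.I.want))
for (i in seq_along(rows.I.want)) {
  new[i] <- scan(con, what = "character", skip = gaps[i],
                 nlines = 1, quiet = TRUE)
}
close(con)
unlink(tf)

head(new)  # "row5" "row7" "row9" ...
```

This still loops, but it never rewinds or re-reads the file, so each line
is touched at most once - the skipped lines are discarded as they stream
past rather than being held in memory.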