2011/11/8 Sergio René Araujo Enciso <araujo.enc...@gmail.com>: > Dear all: > > I have two larges files with 2000 columns. For each file I am > performing a loop to extract the "i"th element of each file and create > a data frame with both "i"th elements in order to perform further > analysis. I am not extracting all the "i"th elements but only certain > which I am indicating on a vector called "d". > > See an example of my code below > > ### generate an example for the CSV files, the original files contain > more than 2000 columns, here for the sake of simplicity they have only > 10 columns > M1<-matrix(rnorm(1000), nrow=100, ncol=10, > dimnames=list(seq(1:100),letters[1:10])) > M2<-matrix(rnorm(1000), nrow=100, ncol=10, > dimnames=list(seq(1:100),letters[1:10])) > write.table(M1, file="M1.csv", sep=",") > write.table(M2, file="M2.csv", sep=",") > > ### the vector containing the "i" elements to be read > d<-c(1,4,7,8) > P1<-read.table("M1.csv", header=TRUE) > P2<-read.table("M1.csv", header=TRUE) > for (i in d) { > M<-data.frame(P1[i],P2[i]) > rm(list=setdiff(ls(),"d")) > } > > As the files are quite large, I want to include "read.table" within > the loop so as it only read the "i"th element. I know that there is > the option "colClasses" for which I have to create a vector with zeros > for all the columns I do not want to load. Nonetheless I have no idea > how to make this vector to change in the loop, so as the only element > with no zeros is the "i"th element following the vector "d". Any ideas > how to do this? Or is there anz other approach to load only an > specific element? >
Its a bit messy if there are row names so lets generate M1.csv like this: write.csv(M1, file = "M1.csv", row.names = FALSE) Then we can do this: nc <- ncol(read.csv("M1.csv", nrows = 1)) colClasses <- replace(rep("NULL", nc), d, NA) M1.subset <- read.csv("M1.csv", colClasses = colClasses) or using the same M1.csv that we just generated try this which uses sqldf with the H2 backend: library(sqldf) library(RH2) M0 <- read.csv("M1.csv", nrows = 1)[0L, ] M1.subset.h2 <- sqldf(c("insert into M0 (select * from csvread('M1.csv'))", "select a, d, g, h from M0")) This is referred to as Alternative 3 in FAQ#10 Example 6a on the sqldf home page: http://sqldf.googlecode.com Alternative 1 and Alternative 2 listed there could also be tried. (Note that although sqldf has a read.csv.sql command we did not use it here since that command only works with the sqlite back end and the RSQLite driver has a max of 999 columns.) -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.