Hello,

Your bug is obvious: on each pass through the loop you read twice (once in readLines, once in read.table) but write only once, and the file pointer keeps moving forward.
Use something like

while (length(pv <- readLines(con, n = n)) > 0) { # note that this line changed.
    i <- i + 1
    write.table(pv, file = paste(fileNames.temp.1, "_", i, ".txt", sep = ""), sep = "\t")
}

(Or, alternatively, keep read.table and move it into the while condition in place of readLines, so the connection is read only once per pass.)
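
To make the corrected loop concrete, here is a minimal, self-contained sketch. The input file name, the "chunk_" output prefix, and the 25-line toy input are illustrative, not from the original post:

```r
# Write the chunk you just read, so each pass reads from the
# connection exactly once.
infile <- "input.txt"
writeLines(as.character(1:25), infile)   # toy input: 25 lines

n <- 10                                  # lines per output file
con <- file(infile, open = "r")
i <- 0
while (length(pv <- readLines(con, n = n)) > 0) {
    i <- i + 1
    writeLines(pv, paste0("chunk_", i, ".txt"))
}
close(con)
# produces chunk_1.txt (10 lines), chunk_2.txt (10) and chunk_3.txt (5)
```

Because the assignment happens inside the while condition, the lines read for the length test are the same lines that get written; nothing is skipped.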

Anyway, I don't like it very much. If you know the number of lines in the input file, it is much better to use integer division and modulus to work out how many full passes to make and how much to read on each.
Something like

n <- 1000000

passes <- number.of.lines.in.file %/% n
remaining <- number.of.lines.in.file %% n

for (i in seq.int(passes)) {

    [ ...read n lines at a time & process them... ]

}
if (remaining) {
    n <- remaining

    [ ...read what's left... ]
}
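
Filled in, that skeleton could look like the following runnable sketch. The 1042-line toy file, the chunk size of 100, and the "split_" prefix are made-up values for illustration:

```r
# Divide-and-conquer split: `passes` full chunks of n lines,
# then one final chunk of `remaining` lines.
infile <- "big.txt"
writeLines(as.character(1:1042), infile)  # toy input: 1042 lines

number.of.lines.in.file <- 1042
n <- 100
passes <- number.of.lines.in.file %/% n   # 10 full chunks
remaining <- number.of.lines.in.file %% n # 42 lines left over

con <- file(infile, open = "r")
for (i in seq.int(passes)) {
    chunk <- readLines(con, n = n)
    writeLines(chunk, paste0("split_", i, ".txt"))
}
if (remaining) {
    chunk <- readLines(con, n = remaining)
    writeLines(chunk, paste0("split_", passes + 1, ".txt"))
}
close(con)
```

Because all reads share one open connection, each readLines call picks up exactly where the previous one stopped, so no line is read twice or skipped.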


If you do not know how many lines there are in the file, see (package::function):

parser::nlines
R.utils::countLines
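
If neither package is installed, a base-R fallback is to count lines in bounded-memory chunks; the helper name count_lines and the 100,000-line chunk size are my own choices, not part of either package:

```r
# Count the lines of a text file by reading it in chunks, so memory
# use stays bounded regardless of file size.
count_lines <- function(path, chunk = 1e5) {
    con <- file(path, open = "r")
    on.exit(close(con))
    total <- 0
    while (length(x <- readLines(con, n = chunk)) > 0)
        total <- total + length(x)
    total
}

writeLines(as.character(1:1234), "demo.txt")
count_lines("demo.txt")  # 1234
```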

Hope this helps,

Rui Barradas


On 16-05-2012 11:00, r-help-requ...@r-project.org wrote:
Date: Tue, 15 May 2012 22:16:42 +0200
From: gianni lavaredo <gianni.lavar...@gmail.com>
To: r-help@r-project.org
Subject: [R] Problem to resolve a step for reading a large TXT and
        split in several file
Message-ID:
        <caj6jbr-ywgjsfu8o0unvet6m8p8wvp7ybosxw5nrdz48wod...@mail.gmail.com>
Content-Type: text/plain

Dear Researchers,

This is the first time I am trying to solve this problem. I have a TXT file
with 1,408,452 rows. I wish to split it into several files, each with at most
1,000,000 rows, with the following procedure:

# split into two files: one with 1,000,000 rows and one with 408,452 rows

file <- "09G001_72975_7575_25_4025.txt"
fileNames <- strsplit(as.character(file), ".", fixed = TRUE)
fileNames.temp.1 <- unique(as.vector(do.call("rbind", fileNames)[, 1]))

con <- file(file, open = "r")
# n is the number of rows
n <- 1000000
i <- 0
while (length(readLines(con, n = n)) > 0) {
    i <- i + 1
    pv <- read.table(con, header = F, sep = "\t", nrow = n)
    write.table(pv, file = paste(fileNames.temp.1, "_", i, ".txt", sep = ""),
                sep = "\t")
}
close(con)


when I use 1,000,000 I get only "09G001_72975_7575_25_4025_1.txt" (with
1,000,000 rows) in the directory, and not "09G001_72975_7575_25_4025_2.txt"
(with 408,452). I don't understand where my bug is.

Furthermore, when I wish, for example, to split into 3 files (where n is
469484 = 1408452/3), I get this message:

Error in read.table(con, header = F, sep = "\t", nrow = n) :
  no lines available in input

Thanks for all the help, and sorry for the disturbance.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
