Hi Jim,

Thank you very much! I will try to sample them then.

Best,
Dongyan

jholtman wrote:
>
> The amount of data that you want to read in (136M numbers) will
> require about 1GB of memory (8 bytes per number for floating point -
> truncation does not reduce this number of bytes). So if you want to
> read it all in, then find a 64-bit version of R and probably at least
> 4GB of memory for your process. A 32-bit version might have just
> enough space if you can allocate all the 4GB of memory to that
> process.
>
> So if you want to have it all in memory, invest in a larger computer.
> If you want to run on the system you have, then you will probably have
> to sample your data so that you can get a portion that will fit in
> memory to run your test, or see if there is a way of processing
> portions of the file and then combining for a final result.
>
> On Wed, Mar 18, 2009 at 9:58 AM, Dongyan Song <yzhsk...@hotmail.com>
> wrote:
>>
>> Hi,
>>
>> Thank you for your concern!
>>
>> The file has 136,047,472 lines, with one value on each line, and is
>> 1.7G in size. I run on Linux (OpenSuse OS) with 4G of memory in total.
>> The error message is
>> "Error: cannot allocate vector of size 2.0 Gb".
>> And the worst thing is that even if I read all the data into R after
>> truncating the numbers' precision, i.e. from 1.234567e+00 to 1.2, I
>> cannot manipulate these numbers; for example, I cannot do ks.test, a
>> histogram, or a kernel density estimate, which is what I want to do
>> with these numbers. After I enter the commands above, the computer
>> also gives error messages like
>> "Error: cannot allocate vector of size 809.1 Mb".
>> I can read half of the file, but I want to know the overall
>> distribution of those numbers. The values in this file are not
>> ordered, and it is not easy to randomly pick some numbers or sort
>> them.
>>
>> Is this information enough? Thank you again!
>>
>> Best,
>> Dongyan
>>
>> jholtman wrote:
>>>
>>> readLines is doing exactly what you are asking:
>>>
>>> Value
>>> A character vector of length the number of lines read.
>>>
>>> You still have to convert the character strings to numeric. Exactly
>>> how large is "quite large"? What system are you running on? How much
>>> memory do you have? What is the error message that you are getting?
>>> Exactly what does your file look like? Have you tried reading in
>>> portions of the file? How big will it be if you could read it in?
>>> Will it take up more than 25% of real memory? There is still some
>>> information you need to provide so an assessment can be made.
>>>
>>> On Tue, Mar 17, 2009 at 8:50 AM, Dongyan Song <yzhsk...@hotmail.com>
>>> wrote:
>>>>
>>>> Dear all,
>>>>
>>>> I read a file containing only numbers with the readLines function,
>>>> as below,
>>>> > f <- file("data.txt")
>>>> > a <- readLines(f)
>>>> but all the values in a are in the format "....", and I cannot do
>>>> calculations with them since they are not numeric. I wonder how I
>>>> should skip those quotes. Thank you for your help!
>>>> I have to use the readLines function instead of scan, read.table or
>>>> matrix, because the file is quite large, and the other functions
>>>> cannot allocate enough space/memory to read the input file.
>>>>
>>>> Best,
>>>> Dongyan
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/the-quote-problem-with-readLines%28%29-tp22558454p22558454.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> --
>>> Jim Holtman
>>> Cincinnati, OH
>>> +1 513 646 9390
>>>
>>> What is the problem that you are trying to solve?

-----
Dongyan Song, Msc
Medical informatics, Uppsala University, Sweden
--
View this message in context:
http://www.nabble.com/the-quote-problem-with-readLines%28%29-tp22558454p22581029.html
Sent from the R help mailing list archive at Nabble.com.
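
For what it's worth, here is a minimal sketch of the two options Jim describes in the thread (sampling the data, or processing the file in portions and combining the results). It assumes a plain text file with one number per line. The function name sample_large_file and its arguments keep_prob and chunk_lines are made up for illustration and do not come from the thread; file(), readLines(), as.numeric() and runif() are base R.

```r
# Hypothetical helper (not from the thread): read a one-value-per-line file in
# chunks, keep a random subset of the values, and accumulate a running mean.
sample_large_file <- function(path, keep_prob = 0.01, chunk_lines = 1e6) {
  # Open the connection once, so each readLines() call continues where the
  # previous one stopped instead of starting again from the top of the file.
  con <- file(path, open = "r")
  on.exit(close(con))

  kept  <- list()  # sampled values, one vector per chunk
  n     <- 0       # number of values seen so far
  total <- 0       # running sum, combined across chunks

  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0) break      # end of file

    x <- as.numeric(lines)             # readLines() returns character strings
    x <- x[!is.na(x)]                  # assumption: skip blank/unparseable lines

    n     <- n + length(x)
    total <- total + sum(x)

    # Keep each value independently with probability keep_prob.
    kept[[length(kept) + 1]] <- x[runif(length(x)) < keep_prob]
  }

  list(n = n, mean = total / n, sample = as.numeric(unlist(kept)))
}
```

Something like res <- sample_large_file("data.txt") would then give a vector res$sample small enough for hist(), density() or ks.test(). With keep_prob = 0.01 and 136,047,472 values, the sample is roughly 1.36 million numbers, i.e. about 11 MB as doubles. The sample size is random rather than fixed; if an exact size is needed, a reservoir sampling scheme would be the usual alternative.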