Yes, I am showing the first 5 columns as an example. Thank you very much for your suggestion. Let me check it out.
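For anyone following along, here is a rough sketch of how I read the suggestion. It is untested on the full file. The toy file is just the header and four rows quoted below, and the names `demo`, `n_pairs`, `cols` and `signal` are my own placeholders. It also assumes the real file keeps the same layout throughout: one probe_set column followed by 243 signal/call pairs, 487 columns in all.

```r
# Toy copy of the header and first four rows posted in this thread, written
# to a temporary file so the sketch runs on its own.
demo <- tempfile(fileext = ".txt")
writeLines(c(
  "probe_set WM_806_Signal_A WM_806_call WM_1716_Signal_A WM_1716_call",
  "SNP_A-1909444 1.59 B 1.48 B",
  "SNP_A-2237149 2.24 B 1.87 B",
  "SNP_A-2118217 2.04 AB 1.70 AB",
  "SNP_A-1866065 1.80 NoCall 1.39 A"
), demo)

# Assumption: 2 signal/call pairs in the toy file; 243 for the real one.
n_pairs <- 2

# One template entry per column: character for probe_set, then
# numeric (signal) and character (call) repeated for every sample.
what <- c(list(""), rep(list(0, ""), times = n_pairs))

# skip = 1 drops the header line; scan() returns one vector per column.
cols <- scan(demo, what = what, skip = 1, quiet = TRUE)

# Read the header line on its own to recover the column names.
names(cols) <- scan(demo, what = "", nlines = 1, quiet = TRUE)

# Keep only the numeric signal columns, as a matrix keyed by probe id.
is_signal <- vapply(cols, is.numeric, logical(1))
signal <- do.call(cbind, cols[is_signal])
rownames(signal) <- cols[[1]]
```

For the real file, n_pairs would be 243 and the path would point to the actual txt file. I have not timed this on the full data yet.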
Allen

On Nov 10, 2007 12:39 AM, jim holtman <[EMAIL PROTECTED]> wrote:
> Your data is mixed: numeric and characters/factors. You can use
> skip=1 to skip the header line, but it looks like the rest is mixed.
> In your example there are only 5 columns; are you just showing the
> first 5 columns? If the pattern that you show continues, then you
> would have a scan like:
>
> scan('yourfile', what=list('', 0, '', 0, ''))
>
> You can extend the 'what' to the number of columns that you have; e.g.
>
> what=c(rep(c(list(''), list(0)), times=243), list(''))
>
> On Nov 10, 2007 12:29 AM, affy snp <[EMAIL PROTECTED]> wrote:
> > Hi Jim,
> >
> > I tried scan() first and got
> >
> > > x <- scan(file="243_47mel_withnormal_expression_log2.txt", what=0)
> > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
> >   scan() expected 'a real', got 'probe_set'
> >
> > So I guess it requires the file to be numeric. But I do have row names
> > and a header.
> >
> > The real file looks like this (I am listing the header and first 4 rows
> > of the file):
> >
> > probe_set WM_806_Signal_A WM_806_call WM_1716_Signal_A WM_1716_call
> > SNP_A-1909444 1.59 B 1.48 B
> > SNP_A-2237149 2.24 B 1.87 B
> > SNP_A-2118217 2.04 AB 1.70 AB
> > SNP_A-1866065 1.80 NoCall 1.39 A
> >
> > So how can I get rid of the header and row.names to use scan()?
> >
> > Thanks!
> >
> > Allen
> >
> > On Nov 10, 2007 12:18 AM, jim holtman <[EMAIL PROTECTED]> wrote:
> > > Here is an example of reading in a file of 3M numbers (11MB of text
> > > file) on my laptop:
> > >
> > > > system.time(x <- scan('/tempyy', what=0))
> > > Read 3000000 items
> > >    user  system elapsed
> > >    6.22    0.16    6.53
> > > > str(x)
> > >  num [1:3000000] 1 2 3 4 5 6 7 8 9 10 ...
> > > > gc()
> > >           used (Mb) gc trigger (Mb) max used (Mb)
> > > Ncells  169954  4.6     350000  9.4   350000  9.4
> > > Vcells 3102277 23.7    7803840 59.6  7200206 55.0
> > > > object.size(x)
> > > [1] 24000024
> > >
> > > This took about 7 seconds. You have about 40X more data, so it should
> > > be interesting to see how it scales up. The object size is 24MB, so
> > > 40X more is about 1GB.
> > >
> > > On Nov 9, 2007 11:52 PM, affy snp <[EMAIL PROTECTED]> wrote:
> > > > Hi Jim,
> > > >
> > > > Thanks a lot! I am currently running it on my laptop but without any
> > > > success. I could upload it to a server with 8GB of memory,
> > > > and it might be better to go from there.
> > > >
> > > > Actually, I could have the whole file split into two parts,
> > > > one with the 2nd to the 95th column, the other with
> > > > the rest of the columns. However, I need all rows for the
> > > > two parts.
> > > >
> > > > The file is in txt format and around 480MB, so very large.
> > > > Yes, it is of numeric values.
> > > >
> > > > I appreciate it!
> > > >
> > > > Allen
> > > >
> > > > On Nov 9, 2007 11:46 PM, jim holtman <[EMAIL PROTECTED]> wrote:
> > > > > If they are all numeric, you can use 'scan' to read them in. With
> > > > > that amount of data, you will need almost 1GB to contain the single
> > > > > object. If you want to do any processing, you will probably need a
> > > > > machine with at least 3-4GB of physical memory, preferably with a
> > > > > 64-bit version of R. What type of computer are you using? Do you
> > > > > really need all the data in at once, or can you process it in
> > > > > smaller batches (e.g., 20,000 rows at a time)? So a little more
> > > > > detail on what you actually want to do with the data would be
> > > > > useful, since it does create a very large object. BTW how large is
> > > > > the file you are reading and what is its format? Have you
> > > > > considered a database with this amount of data?
> > > > >
> > > > > On Nov 9, 2007 11:39 PM, affy snp <[EMAIL PROTECTED]> wrote:
> > > > > > Dear list,
> > > > > >
> > > > > > I need to read in a big table with 487 columns and 238,305 rows
> > > > > > (row names and column names are supplied). Is there a way to
> > > > > > read in the table quickly? I tried read.table() but it seems
> > > > > > that it takes forever :(
> > > > > >
> > > > > > Thanks a lot!
> > > > > >
> > > > > > Best,
> > > > > > Allen
> > > > > >
> > > > > > ______________________________________________
> > > > > > R-help@r-project.org mailing list
> > > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > > PLEASE do read the posting guide
> > > > > > http://www.R-project.org/posting-guide.html
> > > > > > and provide commented, minimal, self-contained, reproducible code.
> > > > >
> > > > > --
> > > > > Jim Holtman
> > > > > Cincinnati, OH
> > > > > +1 513 646 9390
> > > > >
> > > > > What is the problem you are trying to solve?
> > >
> > > --
> > > Jim Holtman
> > > Cincinnati, OH
> > > +1 513 646 9390
> > >
> > > What is the problem you are trying to solve?
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?