Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
Hello: I've been reading all the replies, and I think I have some good ideas to work on. Right now the code I programmed is running; it has been running as a batch process for 20 hours now and has imported 1750 rows out of 2000. I will read the docs for the Bioconductor package, and I will check the gawk

Re: [R] Manage huge database

2008-09-22 Thread Thomas Lumley
On Mon, 22 Sep 2008, Martin Morgan wrote: "José E. Lozano" <[EMAIL PROTECTED]> writes: Maybe you've not lurked on R-help for long enough :) Apologies! Probably. So, how much "design" is in this data? If none, and what you've basically got is a 2000x50 grid of numbers, then maybe a more

Re: [R] Manage huge database

2008-09-22 Thread Gabor Grothendieck
Try this: read.table(pipe("/Rtools/bin/gawk -f cut.awk bigdata.dat")) where cut.awk contains the single line (assuming you want fields 101 through 110 and none other): { for(i = 101; i <= 110; i++) printf("%s ", $i); printf "\n" } or just use cut. I tried the gawk command above on Windows Vista
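Gabor's pipe-through-gawk approach can be sketched in full like this; the file name `bigdata.dat` and the field range 101-110 are the example values from the post, and gawk is assumed to be on the PATH:

```r
## cut.awk contains the single awk program from the post:
##   { for (i = 101; i <= 110; i++) printf("%s ", $i); printf "\n" }

# Read only those ten columns into R; gawk does the column
# selection outside R, so the full 500,000-column file is
# never loaded into memory.
subset <- read.table(pipe("gawk -f cut.awk bigdata.dat"))

# Equivalent selection with the Unix 'cut' utility for a
# comma-separated file (hypothetical file name):
# subset <- read.table(pipe("cut -d, -f101-110 bigdata.csv"), sep = ",")
```

The key design point is that the filtering program streams the file once and emits only the wanted fields, so R's memory use is proportional to the selected columns, not to the whole file.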

Re: [R] Manage huge database

2008-09-22 Thread Ted Harding
On 22-Sep-08 11:00:30, José E. Lozano wrote: >> So is each line just ACCGTATAT etc etc? > > Exactly, A_G, A_A, G_G and such. > >> If you have fixed width fields in a file, so that every line is the >> same length, then you can use random access methods to get to a >> particular value - just multiply

Re: [R] Manage huge database

2008-09-22 Thread Barry Rowlingson
2008/9/22 jim holtman <[EMAIL PROTECTED]>: > Why don't you make one pass through your data and encode your > characters as integers (it would appear that you only have 16 > combinations). You might also want to consider using the 'raw' object > since these only take up one byte of storage -- will reduce

Re: [R] Manage huge database

2008-09-22 Thread jim holtman
Why don't you make one pass through your data and encode your characters as integers (it would appear that you only have 16 combinations). You might also want to consider using the 'raw' object since these only take up one byte of storage -- will reduce your storage requirements by a factor of 4. Then store e
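jim's one-byte encoding idea might look like this sketch; the lookup table below is an illustrative subset of the 16 genotype codes, not the poster's actual code set:

```r
# Lookup table of genotype strings -> small integers (illustrative)
codes <- c("A_A", "A_C", "A_G", "A_T", "C_C",
           "C_G", "C_T", "G_G", "G_T", "T_T")

x <- c("A_G", "A_A", "G_G")        # one row of genotype calls (example)

enc <- as.raw(match(x, codes))     # one byte per genotype instead of a
                                   # 3-character string
dec <- codes[as.integer(enc)]      # decoding recovers the original calls
```

Because a 'raw' vector uses one byte per element, a 2000 x 500,000 grid encoded this way needs roughly 1 GB rather than the 4 GB of the text representation.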

Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
> So is each line just ACCGTATAT etc etc? Exactly, A_G, A_A, G_G and such. > If you have fixed width fields in a file, so that every line is the > same length, then you can use random access methods to get to a > particular value - just multiply the line length by the row number you Nice hint
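The random-access idea can be sketched with an R file connection; the record width and row number below are assumptions for illustration (on Windows, a CRLF line ending would add two bytes per line rather than one):

```r
con <- file("bigdata.dat", open = "rb")   # binary mode for exact offsets

width <- 2500000 + 1        # fixed line length plus one newline (assumed)
row   <- 1750               # 1-based row we want to fetch (example)

seek(con, where = (row - 1) * width)      # jump straight to that row
record <- readChar(con, nchars = width - 1)  # read one line, no newline
close(con)
```

This works only because every record has the same byte length; a single variable-width line anywhere in the file breaks the offset arithmetic.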

Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
> What are you going to do with the data once you have read it in? Are > all the data items numeric? If they are numeric, you would need at > least 8GB to hold one copy and probably a machine with 32GB if you > wanted to do any manipulation on the data. Well, I will use only sets of variables to

Re: [R] Manage huge database

2008-09-22 Thread Martin Morgan
"José E. Lozano" <[EMAIL PROTECTED]> writes: >> Maybe you've not lurked on R-help for long enough :) Apologies! > > Probably. > >> So, how much "design" is in this data? If none, and what you've >> basically got is a 2000x50 grid of numbers, then maybe a more raw > > Exactly, raw data, but a l

Re: [R] Manage huge database

2008-09-22 Thread Barry Rowlingson
2008/9/22 José E. Lozano <[EMAIL PROTECTED]>: > Exactly, raw data, but a little more complex since all the 50 variables > are in text format, so the width is around 2,500,000. > Thanks, I will check. Right now I am reading the file line by line. It's > time-consuming, but since I will do it

Re: [R] Manage huge database

2008-09-22 Thread jim holtman
What are you going to do with the data once you have read it in? Are all the data items numeric? If they are numeric, you would need at least 8GB to hold one copy and probably a machine with 32GB if you wanted to do any manipulation on the data. You can use a 'connection' and 'scan' to read the
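jim's connection-plus-scan suggestion might be sketched like this, processing the file a few rows at a time so the whole 4 GB never sits in memory at once; the file name and chunk size are assumptions:

```r
con <- file("bigdata.csv", open = "r")

repeat {
  # Read the fields from the next 10 lines as character data
  chunk <- scan(con, what = character(), sep = ",",
                nlines = 10, quiet = TRUE)
  if (length(chunk) == 0) break   # end of file reached

  # ... process or re-encode 'chunk' here before reading the next block
}

close(con)
```

Keeping the connection open across calls is what makes this incremental: each `scan()` resumes where the previous one stopped instead of rereading from the top.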

Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
> Maybe you've not lurked on R-help for long enough :) Apologies! Probably. > So, how much "design" is in this data? If none, and what you've > basically got is a 2000x50 grid of numbers, then maybe a more raw Exactly, raw data, but a little more complex since all the 50 variables are in text format

Re: [R] Manage huge database

2008-09-22 Thread Barry Rowlingson
2008/9/22 José E. Lozano <[EMAIL PROTECTED]>: >> I wouldn't call a 4GB csv text file a 'database'. > It didn't help, sorry. I knew perfectly well what a relational database is (and I > humbly consider myself an advanced user when working with MSAccess+VBA, only > that I've never faced this problem with v

Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
> I wouldn't call a 4GB csv text file a 'database'. Obviously, a csv is not itself a database; I meant (though it seems I was not understood) that I had a huge database, exported to a csv file by the people who created it (and I don't have any idea of the original format of the database).

Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
Hello, Yihui > You can treat it as a database and use ODBC to fetch data from the CSV > file using SQL. See the package RODBC for details about database > connections. (I have dealt with similar problems before with RODBC) Thanks for your tip, I have used RODBC before to read data from MSAccess a

Re: [R] Manage huge database

2008-09-22 Thread Yihui Xie
Hi, You can treat it as a database and use ODBC to fetch data from the CSV file using SQL. See the package RODBC for details about database connections. (I have dealt with similar problems before with RODBC) Regards, Yihui -- Yihui Xie <[EMAIL PROTECTED]> Phone: +86-(0)10-82509086 Fax: +86-(0)10-
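Yihui's RODBC route, sketched for Windows using the Microsoft Text ODBC driver; the directory, file name, and column names below are all assumptions for illustration:

```r
library(RODBC)

# Point the Text driver at the folder that holds the csv (assumed path)
ch <- odbcDriverConnect(
  "Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=C:/data/;")

# With this driver the file itself acts as the table, so SQL can
# pull just the columns of interest out of the 500,000
res <- sqlQuery(ch, "SELECT col101, col102 FROM bigdata.csv")

odbcClose(ch)
```

The selection happens inside the ODBC driver, so only the requested columns cross into R; column names may need to come from a header row or a schema.ini file, depending on the driver configuration.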

Re: [R] Manage huge database

2008-09-22 Thread Barry Rowlingson
2008/9/22 José E. Lozano <[EMAIL PROTECTED]>: > Recently I have been trying to open a huge database with no success. > > It's a 4GB csv plain text file with around 2000 rows and over 500,000 > columns/variables. I wouldn't call a 4GB csv text file a 'database'. > Is there any way to work with "

[R] Manage huge database

2008-09-21 Thread José E. Lozano
Hello, Recently I have been trying to open a huge database with no success. It’s a 4GB csv plain text file with around 2000 rows and over 500,000 columns/variables. I have tried with the SAS System, but it reads only around 5,000 columns, no more. R hangs when opening it. Is there any