[Rd] ALARM!!!! Re: [R] regarding large csv file import

2006-10-27 Thread gyadav

Hi Jim,

If I partition the file, then further operations, like merging the
partitioned files and then doing some analysis on the whole data set,
would again require the same amount of memory. If I am not able to do
that, or do not have the memory, then I feel there should be some serious
thinking about the issue of memory handling.
Hence I am also copying this to the r-devel list, and I would also like
to contribute and write code for the memory-handling issue. I would like
to address this request to the great coders of R: the software should be
able to run in any amount of memory (above some minimum threshold).
So I invite all the great coders to please address this issue, and if I
can be helpful in any way, I am right here.

thanks
with regards
-gaurav

On 27-10-06 09:09 PM, "jim holtman" <[EMAIL PROTECTED]> wrote:

Is the file only numeric, or does it also contain characters?  You will
get better performance either by using 'scan', or by specifying the type
of each column with 'colClasses' so that read.csv does not have to guess
at the types.
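
For example (a minimal sketch; the file name and column layout are
assumptions for illustration):

# Declare the column types up front so read.csv does not have to guess;
# here we assume one character key column followed by four numeric columns.
dat <- read.csv("bigfile.csv", header = TRUE,
                colClasses = c("character", rep("numeric", 4)))

# Or, for a purely numeric file, scan() is faster still (skip the header line):
x <- scan("bigfile.csv", what = numeric(), sep = ",", skip = 1)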
 
You will probably need more memory, depending on the type of data.  If I
assume that it is numeric and that it takes about 6 characters to
represent each number, then you have approximately 45M numbers in the
file, and at 8 bytes per double this will take up about 362MB as a single
object.  You should have at least 3X the size of the largest object
available to do any processing, since copies will have to be made.
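
The arithmetic behind that estimate, as a quick R sketch (the
6-characters-per-value figure is the assumption above):

file_bytes <- 272e6          # 272 MB file
n_values   <- file_bytes / 6 # ~6 characters per value => ~45 million numbers
n_values * 8 / 1e6           # ~362 MB when stored as 8-byte doubles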
 
I would suggest partitioning the file and processing it in parts.  You can
also put it in a database and 'sample' the rows that you want to process.
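
One way to process the file in parts without ever holding it all in
memory is to read it in chunks from an open connection (a sketch; the
file name and chunk size are placeholders):

con <- file("bigfile.csv", open = "r")
invisible(readLines(con, n = 1))   # consume the header line
repeat {
  chunk <- tryCatch(read.csv(con, header = FALSE, nrows = 10000),
                    error = function(e) NULL)  # read.csv errors at end of file
  if (is.null(chunk) || nrow(chunk) == 0) break
  ## ... process 'chunk' here ...
}
close(con)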

 
On 10/27/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: 

Hi All,

I have a .csv file of 272MB and 512MB of RAM, and I am working on Windows XP.
I am not able to import the csv file: R stops responding, and even
SciViews hangs. I am using read.csv(FILENAME, sep=",", header=TRUE).
Is there any way to import it?
I have already searched the archives but could not make much sense of them.

thanks in advance

  Sayonara With Smile & With Warm Regards :-)

G a u r a v   Y a d a v
Assistant Manager,
Economic Research & Surveillance Department,
Clearing Corporation Of India Limited. 

Address: 5th, 6th, 7th Floor, Trade Wing 'C',  Kamala City, S.B. Marg,
Mumbai - 400 013
Telephone(Office): - +91 022 6663 9398 ,  Mobile(Personal) (0)9821286118
Email(Office) :- [EMAIL PROTECTED] ,  Email(Personal) :-
[EMAIL PROTECTED]



 





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve? 






Re: [Rd] ALARM!!!! Re: [R] regarding large csv file import

2006-10-28 Thread gyadav

Hi All,

OK, got it: R is designed to work only when the data fits in main memory.
This issue has been taken up many times, but it seems it would be very
difficult to change the core code. So if I want to proceed, I will have
to either partition the data or work with a subset of the columns.

thanks to you all
-   Sayonara With Smile & With Warm Regards :-)

  G a u r a v   Y a d a v
  Assistant Manager,
  Economic Research & Surveillance Department,
  Clearing Corporation Of India Limited.

  Address: 5th, 6th, 7th Floor, Trade Wing 'C',  Kamala City, S.B. Marg, 
Mumbai - 400 013
  Telephone(Office): - +91 022 6663 9398 ,  Mobile(Personal) (0)9821286118
  Email(Office) :- [EMAIL PROTECTED] ,  Email(Personal) :- 
[EMAIL PROTECTED]




"jim holtman" <[EMAIL PROTECTED]> 
28-10-06 09:17 AM

To
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
cc

Subject
Re: ALARM Re: [R] regarding large csv file import






I think that the real issue is that if you want speed and the ability to
use all the functions in R, then the data has to fit in memory because of
the random access that is done to it.  You could, however, create some
specialized functions that make passes through the data and accumulate a
basic set of stats.
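
A sketch of such a pass-through function, reusing the chunked read shown
earlier (assumptions: the file name is a placeholder and the first column
is numeric):

con <- file("bigfile.csv", open = "r")
invisible(readLines(con, n = 1))          # skip the header
n <- 0; s <- 0; s2 <- 0                   # running count, sum, sum of squares
repeat {
  chunk <- tryCatch(read.csv(con, header = FALSE, nrows = 10000),
                    error = function(e) NULL)
  if (is.null(chunk) || nrow(chunk) == 0) break
  x  <- chunk[[1]]                        # first column, assumed numeric
  n  <- n + length(x); s <- s + sum(x); s2 <- s2 + sum(x^2)
}
close(con)
mean_x <- s / n
var_x  <- (s2 - n * mean_x^2) / (n - 1)   # one-pass variance; beware round-off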
 
You don't want to have to use virtual memory, because the paging would
significantly slow down any processing you want to do.  That is the
reason many people store their data in a database and then use a SELECT
statement to pull out just the subset of the data they want to operate
on.  The majority of the data objects that get processed do fit in
memory, and R is designed around everything being in memory.  This has
been the case since it was developed, and it is one of the things you
have to live with.
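
The database route might look like this with SQLite (a sketch assuming
the RSQLite package; the file, table, and column names are placeholders):

library(RSQLite)
db <- dbConnect(SQLite(), dbname = "bigdata.db")

# One-time bulk import of the CSV into a table:
dbWriteTable(db, "big", "bigfile.csv", header = TRUE, sep = ",")

# Later, pull out only the rows and columns you need:
sub <- dbGetQuery(db, "SELECT price, volume FROM big WHERE volume > 1000")
dbDisconnect(db)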
 
I don't think there is any way a new memory-allocation algorithm would
fix this, since it would mean rewriting all the code to handle data in
this fashion.  Even simple things like selecting a subset of the data
with a conditional test would no longer be simple.
 
If you cannot partition your data and analyze it in subsets that will fit
in memory, then R is probably not the system you should be using.

 
On 10/27/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

[quoted text trimmed; the full exchange appears above]