OK, thank you. For now, the vectorization option looks feasible. I was not sure how to handle it this way, but I will try it.
Regards,
Suresh

Philipp Pagel-5 wrote:
>
>> For certain calculations, I have to handle a dataframe with say 10
>> million rows and multiple columns of different datatypes.
>> When I try to perform calculations on certain elements in each row, the
>> program just goes in "busy" mode for a really long time.
>> To avoid this "busy" mode, I split the dataframe into subsets of 10000
>> rows. Then the calculation was done very fast, within reasonable time.
>>
>> Is there any other tip to improve the performance?
>
> Depending on what exactly it is you are doing and what causes the slowdown
> there may be a number of useful strategies:
>
> - Buy RAM (lots of it) - it's cheap
> - Vectorize whatever you are doing
> - Don't use all the data you have but draw a random sample of reasonable
>   size
> - ...
>
> To be more helpful we'd have to know
>
> - what are the computations involved?
> - how are they implemented at the moment?
>   -> example code
> - what is the range of "really long time"?
>
> cu
> Philipp
>
> --
> Dr. Philipp Pagel
> Lehrstuhl für Genomorientierte Bioinformatik
> Technische Universität München
> Wissenschaftszentrum Weihenstephan
> 85350 Freising, Germany
> http://mips.gsf.de/staff/pagel
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
View this message in context: http://www.nabble.com/Tip-for-performance-improvement-while-handling-huge-data--tp21901287p21902758.html
Sent from the R help mailing list archive at Nabble.com.
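For readers following this thread, here is a minimal sketch of Philipp's "vectorize" suggestion. The original calculation was never posted, so the data frame `dat`, its columns `x` and `y`, and the per-row rule are hypothetical placeholders. The sketch compares a row-by-row loop with a single `ifelse()` over whole columns. It also shows the 10000-row chunking Suresh describes, with the vectorized expression applied inside each chunk.

```r
# Hypothetical data: the thread never shows the real data frame or calculation,
# so `dat`, its columns `x` and `y`, and the rule below are placeholders.
set.seed(1)
n   <- 1e5
dat <- data.frame(id = seq_len(n), x = runif(n), y = runif(n))

# 1. Row-by-row loop: one interpreted iteration per row.
loop_time <- system.time({
  loop_result <- numeric(n)              # preallocate instead of growing
  for (i in seq_len(n)) {
    loop_result[i] <- if (dat$x[i] > 0.5) dat$x[i] * dat$y[i] else dat$y[i]
  }
})

# 2. Vectorized: the same rule applied to whole columns in one call.
vec_time <- system.time({
  vec_result <- ifelse(dat$x > 0.5, dat$x * dat$y, dat$y)
})

# 3. Chunking as described in the thread (10000 rows per piece),
#    with the vectorized expression evaluated inside each chunk.
chunk_size <- 10000
chunks <- split(seq_len(n), ceiling(seq_len(n) / chunk_size))
chunk_result <- unlist(lapply(chunks, function(idx) {
  ifelse(dat$x[idx] > 0.5, dat$x[idx] * dat$y[idx], dat$y[idx])
}), use.names = FALSE)

# Elapsed times vary by machine; compare them on your own data.
cat("loop:", loop_time[["elapsed"]], "s  vectorized:",
    vec_time[["elapsed"]], "s\n")
```

All three approaches produce the same vector. The vectorized form should usually be much faster than the loop because the per-element work runs in compiled code. Chunking is likely to help mainly by keeping intermediate objects small, so it complements vectorization rather than replacing it.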