Hi R list, I'm new to R, so I'd like to ask about its capabilities. I want to run some statistical tests on fairly large tables of aggregated quotes from a market feed.
This is a typical set of data. Each day contains millions of records (up to 10 million before filtering).

    2011-05-24 750 Bid DELL 14130770 400 15.4800 BATS 35482391 Y 1 1 0 0
    2011-05-24 904 Bid DELL 14130772 300 15.4800 BATS 35482391 Y 1 0 0 0
    2011-05-24 904 Bid DELL 14130773 135 15.4800 BATS 35482391 Y 1 0 0 0

First I need to filter the data based on some criteria. Since I keep it in a MySQL database, this can be done with a query, but I have already checked and it is not very efficient.

Then I need to aggregate the dataset into different time frames (time is represented in ms from midnight, like 35482391). Again, this can be done with a database query, but I'm not sure which approach will be faster. The aggregated tables will be much smaller, on the order of thousands of rows per observation day.

Next I need to calculate basic statistics: mean, standard deviation, sums, etc. Once the statistics are calculated, I need to perform some statistical hypothesis tests.

So, my question is: which tool is faster for aggregating and filtering big datasets, MySQL or R?

Thanks,
--Roman N.
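For concreteness, the filter-and-aggregate step described above might look roughly like this in base R. This is only a sketch. The column names (`side`, `symbol`, `size`, `price`, `venue`, `ms`) are guesses at what the sample fields mean, not the real schema. The filter criterion and the 1-second bucket width are placeholders.

```r
# The three sample rows from the post. Column names are assumptions;
# fields whose meaning is unclear are left out.
quotes <- data.frame(
  date   = as.Date(rep("2011-05-24", 3)),
  side   = "Bid",
  symbol = "DELL",
  size   = c(400, 300, 135),               # assumed to be the quote size
  price  = c(15.48, 15.48, 15.48),
  venue  = "BATS",
  ms     = c(35482391, 35482391, 35482391) # milliseconds since midnight
)

# Step 1: filter (placeholder criterion: BATS bids only).
bids <- subset(quotes, side == "Bid" & venue == "BATS")

# Step 2: assign each row to a time frame by integer-dividing the
# millisecond timestamp by the bucket width (1000 ms here, arbitrary).
bucket_ms <- 1000
bids$bucket <- bids$ms %/% bucket_ms

# Step 3: basic statistics per time frame. tapply() returns groups in
# sorted bucket order, which matches sort(unique(...)).
stats <- data.frame(
  bucket     = sort(unique(bids$bucket)),
  mean_price = as.vector(tapply(bids$price, bids$bucket, mean)),
  sd_price   = as.vector(tapply(bids$price, bids$bucket, sd)),
  total_size = as.vector(tapply(bids$size,  bids$bucket, sum))
)

print(stats)
```

The per-day `stats` table is the small one (thousands of rows) that the hypothesis tests would then run on. Whether steps 1 and 2 are faster in MySQL or in R is exactly the open question.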