Hi,
We are currently figuring out how to deal with more than 20 million CSV
files containing weather data. Each file is about 500 bytes in size.
We want to calculate statistics on fields read from these files, for
example, the standard deviation of wind speed across all 20+ million files.
Processing speed isn't an important issue. The analysis routine can run
for days, if needed.
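If the analysis is done outside Solr, the standard deviation over 20+ million small files can be computed in one streaming pass without holding all the values in memory. Below is a minimal sketch in Python using Welford's online algorithm (a swapped-in technique, not something Solr itself does). The directory layout, the glob pattern and the `wind_speed` column name are assumptions for illustration only.

```python
import csv
import glob
import math
import os


class RunningStats:
    """Welford's online algorithm: count, mean and variance in a single pass, O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the current mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self, sample=False):
        # Population standard deviation by default; sample=True divides by n - 1.
        denom = self.n - 1 if sample else self.n
        return math.sqrt(self.m2 / denom) if denom > 0 else float("nan")


def column_stats(pattern, column="wind_speed"):
    """Stream every CSV file matching `pattern`.

    `column` is a hypothetical header name; adjust it to the real files.
    """
    stats = RunningStats()
    for path in glob.iglob(pattern, recursive=True):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                value = (row.get(column) or "").strip()
                if value:  # skip blank or missing readings
                    stats.add(float(value))
    return stats
```

For example, `column_stats(os.path.join("weather", "**", "*.csv")).stddev()` would walk a hypothetical `weather/` tree. Because each file is read and closed in turn, memory use stays constant no matter how many files there are. Runtime is dominated by opening 20 million files, which matters little if the job may run for days.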

The StatsComponent (http://wiki.apache.org/solr/StatsComponent) for Solr
appears to be able to calculate the statistics we are interested in.

Will the StatsComponent in Solr do what we need with minimal configuration?
Can the StatsComponent be restricted to a subset of the data, for
example, only data from certain months?
Are there other free programs out there that can parse and analyze 20+
million files?
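On the StatsComponent questions: a stats request is an ordinary search request with `stats=true` and `stats.field=...`, and a filter query (`fq`) limits which documents are included, so restricting to a date range should narrow the statistics to that subset. The sketch below only builds the request URL and reads `stddev` from the JSON response. The host, the core name `weather`, and the field names `wind_speed` and `observed_at` are all hypothetical and assume each CSV row was indexed as one document with a numeric wind-speed field and a date field. The range is inclusive at both ends; an indexed month field, if one existed, could be filtered the same way.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Solr host and core name; adjust to the real installation.
SOLR_URL = "http://localhost:8983/solr/weather/select"


def stats_query_url(field, start, end, date_field="observed_at"):
    """Build a StatsComponent request limited to documents with start <= date_field <= end."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),          # only the statistics are needed, not the documents
        ("stats", "true"),      # enable the StatsComponent
        ("stats.field", field),
        ("fq", f"{date_field}:[{start} TO {end}]"),  # the subset, e.g. one month
        ("wt", "json"),
    ]
    return SOLR_URL + "?" + urllib.parse.urlencode(params)


def fetch_stddev(field, start, end):
    """Send the request to a running Solr instance and return the field's stddev."""
    with urllib.request.urlopen(stats_query_url(field, start, end)) as resp:
        data = json.load(resp)
    return data["stats"]["stats_fields"][field]["stddev"]
```

For example, `fetch_stddev("wind_speed", "2009-06-01T00:00:00Z", "2009-06-30T23:59:59Z")` would ask for the June 2009 wind-speed standard deviation, assuming the schema above.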

We are still very new to Solr and really appreciate all your help.
Thanks,
Fred
