From what I can understand, there is little full-text search involved here. You should probably look at Hadoop and its contrib and sub-projects such as Pig, Hive and Chukwa.

http://wiki.apache.org/hadoop/
http://wiki.apache.org/hadoop/Hive
http://wiki.apache.org/hadoop/Chukwa
http://incubator.apache.org/pig/

On Fri, Nov 7, 2008 at 9:03 PM, souravm <[EMAIL PROTECTED]> wrote:

> Hi Guys,
>
> I'm struggling to decide whether Solr would be a fitting solution
> for me. Highly appreciate you
>
> The key requirements can be summarized as below -
>
> 1. Need to process a very high volume of data online from log files of
> various applications - around 100s of millions, with total size varying
> within a range of 30-40 GB.
>
> 2. Flexibility - Log file formats from different applications would be
> different. Also, for the same application, log file formats can vary.
> However, the log files would be in XML, and if a new type has to be
> supported then the schema for it would be known beforehand.
>
> 3. The type of queries to be supported -
> a) Mostly aggregation-type statistics (min, max, average, sd, count etc.)
> of response times, sales numbers etc.
> b) Ability to support ad hoc queries relating multiple fields in a given
> log file, and joining similar fields across multiple log files.
>
> 4. Expected performance would be around 10 to 20 sec for the majority of
> the queries. For the rest it may be a bit higher.
>
> I'm planning to use Solr with the multicore and distributed search
> features. However, I'm also considering Hadoop with HBase, as that looks
> to be a natural solution for supporting multiple file formats and
> handling ad hoc queries.
>
> I would surely like to have your viewpoints in this regard - whether,
> given the key requirements above, Solr is the right choice or
> Hadoop+HBase would be better (or any other open source product).
>
> Thanks in advance.
>
> Regards,
> Sourav

--
Regards,
Shalin Shekhar Mangar.
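
P.S. A rough sketch of why the aggregation-type statistics in requirement 3(a) suit a map/reduce style of processing: each of min, max, average, sd and count can be computed from small partial summaries that are built per log file and then merged. The names below (`Stats`, `map_partition`, `reduce_partials`) and the sample response times are made up for illustration; this is plain Python and not Hadoop, Pig or Hive API code.

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass
class Stats:
    """Mergeable partial summary of a numeric field (hypothetical helper)."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, x: float) -> "Stats":
        # Fold one value into the summary.
        return Stats(self.count + 1, self.total + x, self.total_sq + x * x,
                     min(self.minimum, x), max(self.maximum, x))

    def merge(self, other: "Stats") -> "Stats":
        # Combine two partial summaries; order does not matter.
        return Stats(self.count + other.count,
                     self.total + other.total,
                     self.total_sq + other.total_sq,
                     min(self.minimum, other.minimum),
                     max(self.maximum, other.maximum))

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def sd(self) -> float:
        # Population standard deviation from the sum of squares.
        variance = self.total_sq / self.count - self.mean ** 2
        return math.sqrt(max(variance, 0.0))


def map_partition(values):
    """'Map' side: summarize the values found in one log file."""
    return reduce(Stats.add, values, Stats())


def reduce_partials(partials):
    """'Reduce' side: merge the per-file summaries into one result."""
    return reduce(Stats.merge, partials, Stats())


# Made-up response times (ms) from three separate log files.
partitions = [[120.0, 80.0], [100.0], [60.0, 140.0]]
result = reduce_partials(map_partition(p) for p in partitions)
print(result.count, result.minimum, result.maximum, result.mean, result.sd)
```

The sum-of-squares form of sd is the simplest one to merge, but it can lose precision when values are large; a production job would likely use a more numerically stable variant.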