Hi guys,

I'm struggling to decide whether Solr would be a fitting solution for me, and would highly appreciate your help.
The key requirements can be summarized as follows:

1. Volume: we need to process a very high volume of data online from the log files of various applications, around 100s of millions, with a total size that may vary within a range of 30-40 GB.
2. Flexibility: log file formats differ between applications, and can also vary within the same application. However, the log files will be in XML, and if a new type has to be supported, its schema will be known beforehand.
3. Types of queries to be supported:
   a) Mostly aggregation-type statistics (min, max, average, standard deviation, count, etc.) of response times, sales numbers, etc.
   b) Ad hoc queries relating multiple fields within a given log file, and joining similar fields across multiple log files.
4. Performance: the expected response time is around 10 to 20 seconds for the majority of queries; the rest may take somewhat longer.

I'm planning to use Solr with its multicore and distributed search features. However, I'm also considering Hadoop with HBase, as that looks like a natural solution for supporting multiple file formats and handling ad hoc queries.

I would like to hear your viewpoints on this: given the key requirements above, is Solr the right choice, or would Hadoop+HBase (or any other open source product) be better?

Thanks in advance.

Regards,
Sourav