Hi Guys,

I'm struggling to decide whether Solr would be a fitting solution for me, and
would highly appreciate your input.

The key requirements are summarized below:

1. Need to process a very high volume of data online from the log files of
various applications: in the 100s of millions, with the total size varying
within a range of 30-40 GB.

2. Flexibility - Log file formats differ from application to application, and
can also vary within the same application. However, all log files are in XML,
and whenever a new type has to be supported, its schema will be known
beforehand.

3. The types of queries to be supported -
a) Mostly aggregate statistics (min, max, average, standard deviation, count,
etc.) over response times, sales numbers, etc.
b) Ad hoc queries relating multiple fields within a given log file, and
joining similar fields across multiple log files

4. Expected performance is around 10 to 20 seconds for the majority of
queries. The rest may take somewhat longer.
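
To make requirement 3(a) concrete, here is a minimal Python sketch of the kind
of aggregation I have in mind. The XML layout, the `entry` element, and the
`response_time` attribute are purely illustrative assumptions, not our actual
log schema.

```python
import statistics
import xml.etree.ElementTree as ET

# Hypothetical log snippet; real formats differ per application.
SAMPLE_LOG = """
<log>
  <entry response_time="120"/>
  <entry response_time="80"/>
  <entry response_time="100"/>
</log>
"""

def summarize(xml_text, field="response_time"):
    """Return count, min, max, mean and sample standard deviation
    of one numeric attribute across all <entry> elements."""
    root = ET.fromstring(xml_text.strip())
    values = [
        float(entry.get(field))
        for entry in root.iter("entry")
        if entry.get(field) is not None
    ]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # stdev needs at least two data points
        "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
    }

stats = summarize(SAMPLE_LOG)
```

At 30-40 GB this obviously cannot be a single in-memory pass; the sketch only
pins down which numbers each query has to return.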

I'm planning to use Solr with its multicore and distributed search features.
However, I'm also considering Hadoop with HBase, as that looks like a natural
fit for supporting multiple file formats and handling ad hoc queries.
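
For reference, under the Solr option I expect a typical statistics request to
look roughly like the sketch below. It only builds the URL and sends nothing.
The host names, core names, and the `response_time` field are placeholders I
made up; `stats`, `stats.field`, and `shards` are the StatsComponent and
distributed search parameters as I understand them from the Solr
documentation.

```python
from urllib.parse import urlencode

def build_stats_url(base_url, field, shards):
    """Build a Solr select URL asking for field statistics across shards."""
    params = [
        ("q", "*:*"),                  # match all documents
        ("rows", "0"),                 # only the statistics are needed
        ("stats", "true"),             # enable the StatsComponent
        ("stats.field", field),        # numeric field to aggregate
        ("shards", ",".join(shards)),  # cores to fan the query out to
    ]
    return base_url + "/select?" + urlencode(params)

# Hypothetical cores, one per application log type.
url = build_stats_url(
    "http://localhost:8983/solr/app1",
    "response_time",
    ["localhost:8983/solr/app1", "localhost:8983/solr/app2"],
)
```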

I would like to hear your views on this: given the key requirements above, is
Solr the right choice, or would Hadoop+HBase (or any other open source
product) be better?

Thanks in advance.

Regards,
Sourav
