From what I can understand, there is little full-text search involved here.
You should probably look at Hadoop and its contrib and sub-projects such as
Pig, Hive and Chukwa.

http://wiki.apache.org/hadoop/
http://wiki.apache.org/hadoop/Hive
http://wiki.apache.org/hadoop/Chukwa
http://incubator.apache.org/pig/
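
To illustrate why aggregation queries like these fit a map/reduce style of
processing better than full-text search, here is a minimal sketch in plain
Python. It is not Hadoop, Pig or Hive code, and the record fields `app` and
`response_ms` are hypothetical names chosen only for the example. Each record
is mapped to a (key, value) pair, and each group is reduced to count, min,
max, mean and standard deviation.

```python
import math
from collections import defaultdict


def map_record(record):
    # Emit a (key, value) pair. The field names "app" and "response_ms"
    # are assumptions for this sketch, not taken from any real log schema.
    return record["app"], float(record["response_ms"])


def reduce_values(values):
    # Collapse all values for one key into summary statistics.
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # population variance
    return {
        "count": n,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "sd": math.sqrt(variance),
    }


def aggregate(records):
    # Group mapped values by key (the "shuffle" step), then reduce each group.
    groups = defaultdict(list)
    for record in records:
        key, value = map_record(record)
        groups[key].append(value)
    return {key: reduce_values(vals) for key, vals in groups.items()}


if __name__ == "__main__":
    sample = [
        {"app": "checkout", "response_ms": 120},
        {"app": "checkout", "response_ms": 80},
        {"app": "search", "response_ms": 35},
    ]
    for app, stats in sorted(aggregate(sample).items()):
        print(app, stats)
```

On a real cluster the grouping step would be spread across machines, but the
shape of the computation stays the same.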

On Fri, Nov 7, 2008 at 9:03 PM, souravm <[EMAIL PROTECTED]> wrote:

> Hi Guys,
>
> I'm struggling to decide whether Solr would be a fitting solution
> for me. I would highly appreciate your input.
>
> The key requirements can be summarized as below -
>
> 1. Need to process a very high volume of data online from log files of
> various applications - around 100s of millions, with total size varying
> within a range of 30-40 GB.
>
> 2. Flexibility - Log file formats would differ between applications, and
> can also vary for the same application. However, the log files would be in
> XML, and if a new type has to be supported then its schema would be known
> beforehand.
>
> 3. The type of queries to be supported -
> a) Mostly aggregation type statistics (min, max, average, sd, count etc.)
> of response times, sales numbers etc.
> b) Ability to support ad hoc queries relating multiple fields in a given
> log file, and joining similar fields across multiple log files
>
> 4. Expected performance would be around 10 to 20 sec for the majority of
> queries. For the rest it may be somewhat higher.
>
> I'm planning to use Solr with its multicore and distributed search features.
> However, I'm also considering Hadoop with HBase, as that looks to be a natural
> solution for supporting multiple file formats and handling ad hoc queries.
>
> I would like to have your viewpoints in this regard: given the key
> requirements above, is Solr the right choice, or would Hadoop+HBase be
> better (or any other open source product)?
>
> Thanks in advance.
>
> Regards,
> Sourav
>



-- 
Regards,
Shalin Shekhar Mangar.
