It sounds like you want a data warehouse, not a text search engine.
Splunk and Pentaho are both worth a look.

On Thu, Apr 29, 2010 at 12:03 PM, Jon Baer <jonb...@gmail.com> wrote:
> To follow up ... it seems dumping log data into Solr is common ...
>
> http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
>
> - Jon
>
> On Apr 29, 2010, at 1:58 PM, Jon Baer wrote:
>
>> Good question, +1 on finding an answer; my take ...
>>
>> Depending on the size of the log files you're talking about, you might be 
>> better off doing this w/ HDFS / Hadoop (and a scripting language like Pig) or 
>> w/ Amazon EMR:
>>
>> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873
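>>
>> Roughly, a Pig script like this is what I have in mind (an untested sketch; 
>> the log path and the combined-log field order are my assumptions):
>>
>> -- count hits per page in an Apache combined-format access log
>> -- (naive space-split; quoting inside the request field is ignored)
>> logs    = LOAD '/logs/access.log' USING PigStorage(' ')
>>           AS (ip, ident, user, time, tz, method, uri, proto, status, bytes);
>> grouped = GROUP logs BY uri;
>> counts  = FOREACH grouped GENERATE group AS uri, COUNT(logs) AS hits;
>> ranked  = ORDER counts BY hits DESC;
>> top10   = LIMIT ranked 10;
>> DUMP top10;
>>
>> Run it w/ "pig -x local toppages.pig" to test (the file name is hypothetical); 
>> the same script should run unchanged on an EMR Pig job flow.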
>>
>> Theoretically you could split the logs into fields w/ a dataimporter 
>> (something like LineEntityProcessor) and then search / sort on those fields.
>>
>> http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor
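>>
>> For example, a data-config.xml along these lines (an untested sketch; the log 
>> path, regexes, and field names are made up, though rawLine is the column 
>> LineEntityProcessor actually emits for each line):
>>
>> <dataConfig>
>>   <dataSource type="FileDataSource" />
>>   <document>
>>     <entity name="logline"
>>             processor="LineEntityProcessor"
>>             url="/var/log/httpd/access.log"
>>             rootEntity="true"
>>             transformer="RegexTransformer">
>>       <!-- each line arrives in the rawLine column; regex out the pieces you
>>            want to search / sort on (these patterns are illustrative only) -->
>>       <field column="client_ip" regex="^(\S+)" sourceColName="rawLine" />
>>       <field column="request"   regex='"([A-Z]+ \S+)' sourceColName="rawLine" />
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>> A full-import plus sorting / faceting on those fields would get at questions 
>> like the most frequently requested page.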
>>
>> I've tried to use Solr as a log analytics tool (before dataimporthandler) and 
>> it wasn't practical or worth the disk space, but I'd love to hear otherwise. 
>> In general you could flush daily logs to an index, but if you ever have to 
>> work w/ the data in another context, HDFS seems like the better fit (I think).
>>
>> - Jon
>>
>> On Apr 29, 2010, at 1:46 PM, Stefan Maric wrote:
>>
>>>
>>> I thought I remembered seeing some information about this, but have been
>>> unable to find it.
>>>
>>> Does anyone know if there is a configuration / module that would allow us to
>>> set up Solr to take in the (large) log files generated by our web/app
>>> servers, so that we can query for things like peak-time requests or the most
>>> frequently requested web pages, etc.?
>>>
>>> Thanks
>>> Stefan Maric
>>>
>>
>
>



-- 
Lance Norskog
goks...@gmail.com
