It sounds like you want a data warehouse, not a text search engine. Splunk and Pentaho are good things to try.
On Thu, Apr 29, 2010 at 12:03 PM, Jon Baer <jonb...@gmail.com> wrote:
> To follow up on it ... it seems dumping to Solr is common ...
>
> http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
>
> - Jon
>
> On Apr 29, 2010, at 1:58 PM, Jon Baer wrote:
>
>> Good question, +1 on finding an answer. My take:
>>
>> Depending on how large the log files you are talking about are, you
>> might be better off doing this w/ HDFS / Hadoop (and a script language
>> like Pig) (or Amazon EMR):
>>
>> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873
>>
>> Theoretically you could split the logs into fields, use a dataimporter,
>> and search / sort w/ something like LineEntityProcessor:
>>
>> http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor
>>
>> I've tried to use Solr as a log analytics tool (before DataImportHandler)
>> and it was neither practical nor worth the disk space, but I'd love to
>> hear otherwise. In general you could flush daily logs to an index, but if
>> you have to work w/ the data in another context, that seems a better fit
>> for HDFS use (I think).
>>
>> - Jon
>>
>> On Apr 29, 2010, at 1:46 PM, Stefan Maric wrote:
>>
>>> I thought I remembered seeing some information about this, but have
>>> been unable to find it.
>>>
>>> Does anyone know if there is a configuration / module that would allow
>>> us to set up Solr to take in the (large) log files generated by our
>>> web/app servers, so that we can query for things like peak-time
>>> requests or the most frequently requested web pages, etc.
>>>
>>> Thanks
>>> Stefan Maric

--
Lance Norskog
goks...@gmail.com
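
For anyone trying the Pig route Jon mentions, here is a minimal sketch of a
"most frequently requested page" job. Treat it only as a sketch: the HDFS
paths are hypothetical, it assumes Apache common log format, and
REGEX_EXTRACT needs a reasonably recent Pig release.

-- Load raw access-log lines from HDFS (path is hypothetical).
-- The default loader splits on tabs, so each whole log line lands
-- in a single chararray field.
raw = LOAD '/logs/access_log' AS (line:chararray);

-- Pull the request URL out of each common-log-format line
-- (group 1 of the regex is the URL).
pages = FOREACH raw GENERATE
    REGEX_EXTRACT(line, '"(?:GET|POST) (\\S+)', 1) AS url;

-- Count requests per URL and rank them, most requested first.
grouped = GROUP pages BY url;
counts = FOREACH grouped GENERATE group AS url, COUNT(pages) AS hits;
top = ORDER counts BY hits DESC;

STORE top INTO '/logs/out/top_pages';

Peak-time requests work the same way: extract the hour from the timestamp
field instead of the URL and GROUP BY that.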
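
And for the LineEntityProcessor idea, a data-config.xml along these lines
should get each log line into the index as its own document. This is a
sketch too: the file path, regex, and field names are hypothetical, and
ip / timestamp / url / status would all need matching fields in schema.xml.
RegexTransformer's groupNames attribute (Solr 1.4+) does the splitting.

<dataConfig>
  <dataSource type="FileDataSource" name="fds" encoding="UTF-8" />
  <document>
    <entity name="logline"
            processor="LineEntityProcessor"
            url="file:///var/log/httpd/access_log"
            dataSource="fds"
            rootEntity="true"
            transformer="RegexTransformer">
      <!-- LineEntityProcessor emits each line of the file in a field
           named rawLine; the regex groups below split an Apache
           common-log-format line into separate fields. -->
      <field column="rawLine"
             regex="^(\S+) \S+ \S+ \[([^\]]+)\] &quot;\S+ (\S+)[^&quot;]*&quot; (\d+)"
             groupNames="ip,timestamp,url,status" />
    </entity>
  </document>
</dataConfig>

Once that is indexed, Stefan's questions turn into facet queries: facet on
url for the most requested pages, or on an hour-granularity copy of the
timestamp for peak times. Whether that is worth the disk space is the
question Jon raises above.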