I know it's not Solr, but perhaps you should have a look at it: http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
On Tue, Nov 30, 2010 at 12:58 PM, Peter Karich <peat...@yahoo.de> wrote:

> take a look into this:
> http://vimeo.com/16102543
>
> for that amount of data it isn't that easy :-)
>
>> We are looking into building a reporting feature and investigating
>> solutions which will allow us to search through our logs for downloads,
>> searches and view history.
>>
>> Each log item is relatively small
>>
>> download history
>>
>> <add>
>>   <doc>
>>     <field name="uuid">item123-v1</field>
>>     <field name="market">photography</field>
>>     <field name="name">item 1</field>
>>     <field name="userid">1</field>
>>     <field name="version">1</field>
>>     <field name="downloadType">hires</field>
>>     <field name="itemId">123</field>
>>     <field name="timestamp">2009-11-07T14:50:54Z</field>
>>   </doc>
>> </add>
>>
>> search history
>>
>> <add>
>>   <doc>
>>     <field name="uuid">1</field>
>>     <field name="query">brand assets</field>
>>     <field name="userid">1</field>
>>     <field name="timestamp">2009-11-07T14:50:54Z</field>
>>   </doc>
>> </add>
>>
>> view history
>>
>> <add>
>>   <doc>
>>     <field name="uuid">1</field>
>>     <field name="itemId">123</field>
>>     <field name="userid">1</field>
>>     <field name="timestamp">2009-11-07T14:50:54Z</field>
>>   </doc>
>> </add>
>>
>> and we reckon that we could have around 10 - 30 million log records for
>> each type (downloads, searches, views) so 70 million records in total,
>> but obviously it must scale higher.
>>
>> concurrent users will be around 10 - 20 (relatively low)
>>
>> new logs will be imported as a batch overnight.
>>
>> Because we have some previous experience with SOLR and because the
>> interface needs to have full-text searching and filtering, we built a
>> prototype using SOLR 4.0. We used the new field collapsing feature within
>> SOLR 4.0 to collapse on groups of data. For example, view history needs
>> to collapse on itemId. Each row will then show how many views the item
>> has had. This is taken from the number of items which have been grouped.
>>
>> The requirements for the solution are to be schemaless, to make adding
>> new fields to new documents easier, and to have a powerful search
>> interface, both of which SOLR can do.
>>
>> QUESTIONS
>>
>> Our prototype is working as expected but I'm unsure about the following:
>>
>> 1. Has anyone got experience with using SOLR for log analysis?
>> 2. SOLR can scale, but at what point should I start considering
>>    sharding the index? It should be fine with 100+ million records.
>> 3. We are using a nightly build of SOLR for the "field collapsing"
>>    feature. Would it be possible to patch SOLR 1.4.1 with the SOLR-236
>>    patch? Has anyone used this in production?
>>
>> thanks
>
> --
> http://jetwick.com twitter search prototype
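
For reference, the "collapse on itemId and show the view count" idea from the quoted message could be sketched roughly as below. This is only an illustration, not the poster's actual setup: the sample records are made up, `views_per_item` and `grouping_params` are hypothetical helper names, and the `group`, `group.field` and `group.ngroups` parameters are the ones documented for Solr result grouping in later releases, so a 2010 nightly build may behave differently.

```python
from collections import Counter
from urllib.parse import urlencode

# Made-up view-history records, shaped like the quoted <doc> entries.
VIEWS = [
    {"uuid": "1", "itemId": "123", "userid": "1", "timestamp": "2009-11-07T14:50:54Z"},
    {"uuid": "2", "itemId": "123", "userid": "2", "timestamp": "2009-11-08T09:12:00Z"},
    {"uuid": "3", "itemId": "456", "userid": "1", "timestamp": "2009-11-08T10:30:00Z"},
]


def views_per_item(records):
    """Count records per itemId, i.e. the size of each collapsed group."""
    return Counter(record["itemId"] for record in records)


def grouping_params(field="itemId", query="*:*"):
    """Build a query string for a grouped Solr request.

    Assumption: parameter names follow Solr's documented result grouping
    (group, group.field, group.ngroups); check them against your build.
    """
    return urlencode({
        "q": query,
        "group": "true",
        "group.field": field,
        "group.ngroups": "true",
        "wt": "json",
    })


if __name__ == "__main__":
    # Local equivalent of "number of items which have been grouped".
    for item_id, count in views_per_item(VIEWS).most_common():
        print(item_id, count)
    # Query string that would be appended to a /select request.
    print(grouping_params())
```

The local counter only shows what the per-group count means. Against a real index the counting would be done by Solr, and the query string would be appended to the core's select URL.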