We do a lot of precisely this sort of thing. Ours is a commercial
product (Honeycomb Lexicon) that extracts behavioural information from
logs, events and network data (don't worry, I'm not pushing it on
you!) - I mention it only to say that there are a lot of considerations
beyond base Solr when it comes to handling log, event and other
'transient' data streams.
Aside from the obvious issues of horizontal scaling and reliable
delivery/retry/replication, there are other important issues,
particularly with regard to data classification and reporting engines,
among numerous other things.
It's one of those things that sounds perfectly reasonable at the
outset, but all sorts of complications crop up the deeper you get into it.

Peter


On Tue, Nov 30, 2010 at 11:44 AM, phoey <pho...@gmail.com> wrote:
>
> We are looking into building a reporting feature and investigating solutions
> which will allow us to search through our logs for downloads, searches and
> view history.
>
> Each log item is relatively small
>
> download history
>
> <add>
>        <doc>
>                <field name="uuid">item123-v1</field>
>                <field name="market">photography</field>
>                <field name="name">item 1</field>
>                <field name="userid">1</field>
>                <field name="version">1</field>
>                <field name="downloadType">hires</field>
>                <field name="itemId">123</field>
>                <field name="timestamp">2009-11-07T14:50:54Z</field>
>        </doc>
> </add>
>
> search history
>
> <add>
>        <doc>
>                <field name="uuid">1</field>
>                <field name="query">brand assets</field>
>                <field name="userid">1</field>
>                <field name="timestamp">2009-11-07T14:50:54Z</field>
>        </doc>
> </add>
>
> view history
>
> <add>
>        <doc>
>                <field name="uuid">1</field>
>                <field name="itemId">123</field>
>                <field name="userid">1</field>
>                <field name="timestamp">2009-11-07T14:50:54Z</field>
>        </doc>
> </add>
>
>
> and we reckon that we could have around 10-30 million log records for each
> type (downloads, searches, views), so around 70 million records in total,
> but the solution obviously must scale higher.
>
> concurrent users will be around 10 - 20 (relatively low)
>
> new logs will be imported as a batch overnight.
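>
> A minimal sketch of what that import could look like (assuming a single
> local Solr instance at localhost:8983, Python with the requests library,
> and hypothetical file paths - not our actual job):
>
> import glob
> import requests  # third-party HTTP library; pip install requests
>
> SOLR_UPDATE = "http://localhost:8983/solr/update"  # hypothetical instance
> HEADERS = {"Content-Type": "text/xml"}
>
> # Post each nightly export file (XML <add> blocks like the ones above),
> # then issue one commit so the whole batch becomes searchable at once.
> for path in glob.glob("/data/exports/*.xml"):  # hypothetical export dir
>     with open(path, "rb") as f:
>         requests.post(SOLR_UPDATE, data=f.read(),
>                       headers=HEADERS).raise_for_status()
>
> requests.post(SOLR_UPDATE, data="<commit/>",
>               headers=HEADERS).raise_for_status()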
>
> Because we have some previous experience with SOLR, and because the interface
> needs full-text searching and filtering, we built a prototype using SOLR 4.0.
> We used the new field collapsing feature in SOLR 4.0 to collapse on groups of
> data. For example, view history needs to collapse on itemId; each row then
> shows how many times the item has been viewed, taken from the number of
> documents in its group.
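>
> The grouped query looks roughly like the following sketch (trunk grouping
> parameters against the same hypothetical local instance; response shapes
> may differ across nightly builds):
>
> import requests
>
> # Collapse view-history docs on itemId; each group's numFound is the
> # number of views for that item.
> params = {
>     "q": "*:*",
>     "fq": "userid:1",          # example filter
>     "group": "true",
>     "group.field": "itemId",
>     "group.limit": 1,          # one representative doc per item
>     "wt": "json",
> }
> resp = requests.get("http://localhost:8983/solr/select", params=params)
> for g in resp.json()["grouped"]["itemId"]["groups"]:
>     print(g["groupValue"], g["doclist"]["numFound"])  # itemId, view count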
>
> The requirements for the solution are that it be schemaless, to make adding
> new fields to new documents easier, and that it have a powerful search
> interface - both of which SOLR can do.
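>
> (In practice the "schemaless" part would lean on Solr's dynamicField rules
> rather than a truly schemaless mode; a sketch, assuming the stock *_s
> string pattern from the example schema and a hypothetical new field:)
>
> import requests
>
> # A new field needs no schema edit as long as its name matches a
> # dynamicField pattern (e.g. *_s for strings in the example schema).
> doc = ("<add><doc>"
>        "<field name='uuid'>item124-v1</field>"
>        "<field name='campaign_s'>spring-launch</field>"  # brand-new field
>        "<field name='timestamp'>2010-11-30T09:00:00Z</field>"
>        "</doc></add>")
> requests.post("http://localhost:8983/solr/update?commit=true",
>               data=doc,
>               headers={"Content-Type": "text/xml"}).raise_for_status()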
>
> QUESTIONS
>
> Our prototype is working as expected, but I'm unsure about a few things:
>
> 1. Has anyone got experience using SOLR for log analysis?
> 2. SOLR can scale, but at what point should I start considering sharding the
> index? Should it be fine with 100+ million records? (A sketch of a sharded
> query follows below.)
> 3. We are using a nightly build of SOLR for the "field collapsing" feature.
> Would it be possible to patch SOLR 1.4.1 with the SOLR-236 patch instead? Has
> anyone used this in production?
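>
> For reference, the sharded query mentioned in question 2 would look roughly
> like this (hypothetical hostnames; whether grouping works over distributed
> search in the nightly builds is part of what I'm asking):
>
> import requests
>
> # Fan one query out across shard cores; each shard holds a slice of the
> # log index.
> params = {
>     "q": "downloadType:hires",
>     "shards": "solr1:8983/solr,solr2:8983/solr",
>     "rows": 10,
>     "wt": "json",
> }
> resp = requests.get("http://solr1:8983/solr/select", params=params)
> print(resp.json()["response"]["numFound"])  # total hits across shards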
>
> thanks
>
