I'm about to do a prototype deployment of Solr for a pretty
high-volume site, and I've been following this thread with some
interest.

One thing I want to confirm: is it really possible for Solr to
handle a constant stream of 10K updates/min (roughly 167
updates/sec) to a 25M-document index? I knew Solr and Lucene were
good, but that seems like a pretty tall order. From the responses
to David Whalen's inquiries, it sounds like people think it is.

Thanks,
Charlie

On 10/9/07, Matthew Runo <[EMAIL PROTECTED]> wrote:
> The way I'd do it would be to buy more servers, set up Tomcat on
> each, and get SOLR replicating from your current machine to the
> others. Then, throw them all behind a load balancer, and there you go.
>
> You could also post your updates to every machine. Then you don't
> need to worry about getting replication running.
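
Just to check my understanding of the "post your updates to every
machine" option: I'm picturing something like the rough, untested
sketch below -- the same XML add POSTed to each box's /update
handler, with a <commit/> sent the same way every so often. Host
names and field names here are made up on my end.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class PostToAll {
        public static void main(String[] args) throws Exception {
            // Made-up hosts: one entry per Solr box behind the balancer.
            String[] servers = {
                "http://solr1:8080/solr/update",
                "http://solr2:8080/solr/update"
            };
            // One document in Solr's XML update format (made-up fields).
            String doc = "<add><doc>"
                       + "<field name=\"id\">article-1</field>"
                       + "<field name=\"title\">example</field>"
                       + "</doc></add>";
            // POST the same add to every server so the indexes stay in step.
            for (String server : servers) {
                HttpURLConnection conn =
                    (HttpURLConnection) new URL(server).openConnection();
                conn.setDoOutput(true);
                conn.setRequestMethod("POST");
                conn.setRequestProperty("Content-Type",
                    "text/xml; charset=UTF-8");
                OutputStream out = conn.getOutputStream();
                out.write(doc.getBytes("UTF-8"));
                out.close();
                System.out.println(server + " -> HTTP "
                    + conn.getResponseCode());
            }
        }
    }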
>
> +--------------------------------------------------------+
>   | Matthew Runo
>   | Zappos Development
>   | [EMAIL PROTECTED]
>   | 702-943-7833
> +--------------------------------------------------------+
>
>
> On Oct 9, 2007, at 7:12 AM, David Whalen wrote:
>
> > All:
> >
> > How can I break up my install onto more than one box?  We've
> > hit a learning curve here and we don't understand how best to
> > proceed.  Right now we have everything crammed onto one box
> > because we don't know any better.
> >
> > So, how would you build it if you could?  Here are the specs:
> >
> > a) the index needs to hold at least 25 million articles
> > b) the index is constantly updated at a rate of 10,000 articles
> > per minute
> > c) we need to have faceted queries
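
(On the faceting point, if it helps: as far as I can tell from the
Solr wiki, a faceted request is just a normal query with a few
extra parameters, e.g. something like

    http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=source&facet.limit=10

where "source" stands in for whatever field you want counts on --
a made-up field name on my part.)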
> >
> > Again, real-world experience is preferred here over book knowledge.
> > We've tried to read the docs and it's only made us more confused.
> >
> > TIA
> >
> > Dave W
> >
> >
> >> -----Original Message-----
> >> From: Yonik Seeley [mailto:[EMAIL PROTECTED]
> >> Sent: Monday, October 08, 2007 3:42 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Availability Issues
> >>
> >> On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
> >>>> Do you see any requests that took a really long time to finish?
> >>>
> >>> The requests that take a long time to finish are just
> >> simple queries.
> >>> And the same queries run at a later time come back much faster.
> >>>
> >>> Our logs contain 99% inserts and 1% queries.  We are
> >> constantly adding
> >>> documents to the index at a rate of 10,000 per minute, so the logs
> >>> show mostly that.
> >>
> >> Oh, so you are using the same boxes for updating and querying?
> >> When you insert, are you using multiple threads?  If so, how many?
> >>
> >> What is the full URL of those slow query requests?
> >> Do the slow requests start after a commit?
> >>
> >>>> Start with the thread dump.
> >>>> I bet it's multiple queries piling up around some synchronization
> >>>> points in lucene (sometimes caused by multiple threads generating
> >>>> the same big filter that isn't yet cached).
> >>>
> >>> What would be my next steps after that?  I'm not sure I'd
> >> understand
> >>> enough from the dump to make heads-or-tails of it.  Can I
> >> share that
> >>> here?
> >>
> >> Yes, post it here.  Most likely a majority of the threads
> >> will be blocked somewhere deep in lucene code, and you will
> >> probably need help from people here to figure it out.
> >>
> >> -Yonik
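
(Side note in case it helps with the thread dump: on a reasonably
recent Sun JDK the usual options are "jstack <pid>" or
"kill -3 <pid>"; with kill -3 the dump goes to the JVM's stdout,
which for Tomcat usually ends up in catalina.out.)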
> >>
> >>
> >
>
>
