I'm about to do a prototype deployment of Solr for a pretty high-volume site, and I've been following this thread with some interest.
One thing I want to confirm: is it really possible for Solr to handle a constant stream of 10K updates/min (>150 updates/sec) to a 25M-document index? I knew Solr and Lucene were good, but that seems like a pretty tall order. From the responses I'm seeing to David Whalen's inquiries, it seems like people think that's possible.

Thanks,
Charlie

On 10/9/07, Matthew Runo <[EMAIL PROTECTED]> wrote:
> The way I'd do it would be to buy more servers, set up Tomcat on
> each, and get SOLR replicating from your current machine to the
> others. Then, throw them all behind a load balancer, and there you go.
>
> You could also post your updates to every machine. Then you don't
> need to worry about getting replication running.
>
> +--------------------------------------------------------+
> | Matthew Runo
> | Zappos Development
> | [EMAIL PROTECTED]
> | 702-943-7833
> +--------------------------------------------------------+
>
> On Oct 9, 2007, at 7:12 AM, David Whalen wrote:
>
> > All:
> >
> > How can I break up my install onto more than one box? We've
> > hit a learning curve here and we don't understand how best to
> > proceed. Right now we have everything crammed onto one box
> > because we don't know any better.
> >
> > So, how would you build it if you could? Here are the specs:
> >
> > a) the index needs to hold at least 25 million articles
> > b) the index is constantly updated at a rate of 10,000 articles
> >    per minute
> > c) we need to have faceted queries
> >
> > Again, real-world experience is preferred here over book knowledge.
> > We've tried to read the docs and it's only made us more confused.
> >
> > TIA
> >
> > Dave W
> >
> >> -----Original Message-----
> >> From: Yonik Seeley [mailto:[EMAIL PROTECTED]]
> >> Sent: Monday, October 08, 2007 3:42 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Availability Issues
> >>
> >> On 10/8/07, David Whalen <[EMAIL PROTECTED]> wrote:
> >>>> Do you see any requests that took a really long time to finish?
> >>>
> >>> The requests that take a long time to finish are just
> >>> simple queries. And the same queries run at a later time
> >>> come back much faster.
> >>>
> >>> Our logs contain 99% inserts and 1% queries. We are
> >>> constantly adding documents to the index at a rate of
> >>> 10,000 per minute, so the logs show mostly that.
> >>
> >> Oh, so you are using the same boxes for updating and querying?
> >> When you insert, are you using multiple threads? If so, how many?
> >>
> >> What is the full URL of those slow query requests?
> >> Do the slow requests start after a commit?
> >>
> >>>> Start with the thread dump.
> >>>> I bet it's multiple queries piling up around some synchronization
> >>>> points in lucene (sometimes caused by multiple threads generating
> >>>> the same big filter that isn't yet cached).
> >>>
> >>> What would be my next steps after that? I'm not sure I'd
> >>> understand enough from the dump to make heads-or-tails of it.
> >>> Can I share that here?
> >>
> >> Yes, post it here. Most likely a majority of the threads
> >> will be blocked somewhere deep in lucene code, and you will
> >> probably need help from people here to figure it out.
> >>
> >> -Yonik
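For what it's worth, Matthew's "post your updates to every machine" idea from earlier in the thread can be sketched in a few lines. This is just a minimal illustration, not production code: the hostnames are hypothetical placeholders, there's no retry or queueing if one host fails, and a real indexer at 10K docs/min would batch documents and use multiple threads.

```python
# Sketch of posting the same Solr XML update to every instance,
# so no master/slave replication is needed. Hostnames are made up.
import urllib.request

SOLR_HOSTS = ["solr1.example.com:8983", "solr2.example.com:8983"]

def update_urls(hosts):
    """Build the /solr/update endpoint URL for each host."""
    return ["http://%s/solr/update" % h for h in hosts]

def post_everywhere(xml_doc, hosts=SOLR_HOSTS):
    """POST one <add> document to every instance. A failure on any
    single host raises here; real code would need per-host retry or
    a queue so one slow box doesn't stall the whole pipeline."""
    for url in update_urls(hosts):
        req = urllib.request.Request(
            url,
            data=xml_doc.encode("utf-8"),
            headers={"Content-Type": "text/xml"},
        )
        urllib.request.urlopen(req).read()
```

The trade-off versus replication is that every box does the full indexing work (so you pay the analysis/merge cost N times), but you avoid snapshot distribution entirely and each box can serve queries against a locally fresh index.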