Re: Best Indexing Approaches - To max the throughput

2015-10-09 Thread Alessandro Benedetti
> Good way: using SolrJ with a thread pool executor framework; increase the number of threads as per your requirement

Re: Best Indexing Approaches - To max the throughput

2015-10-08 Thread Susheel Kumar
> Good way: using SolrJ with a thread pool executor framework; increase the number of threads as per your requirement

Re: Best Indexing Approaches - To max the throughput

2015-10-08 Thread Alessandro Benedetti
> Good way: using SolrJ with a thread pool executor framework; increase the number of threads as per your requirement

Re: Best Indexing Approaches - To max the throughput

2015-10-08 Thread Mugeesh Husain
Good way: use SolrJ with a thread pool executor framework and increase the number of threads as per your requirement.
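A minimal sketch of that suggestion, assuming SolrJ's HttpSolrClient.Builder API (6.x and later); the collection URL, field names, batch size of 1000, and pool of 8 threads are illustrative placeholders, not values taken from this thread:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ThreadedIndexer {
      public static void main(String[] args) throws Exception {
        // One shared client; SolrJ clients are thread-safe.
        SolrClient solr =
            new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build();
        ExecutorService pool = Executors.newFixedThreadPool(8); // tune to your hardware

        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", Integer.toString(i));
          doc.addField("title_s", "doc " + i);
          batch.add(doc);
          if (batch.size() == 1000) {   // batching matters more than the client choice
            final List<SolrInputDocument> toSend = batch;
            batch = new ArrayList<>();
            pool.submit(() -> {
              try {
                solr.add(toSend);
              } catch (Exception e) {
                e.printStackTrace();    // real code would log and collect failures
              }
            });
          }
        }
        if (!batch.isEmpty()) solr.add(batch);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        solr.commit();
        solr.close();
      }
    }

ConcurrentUpdateSolrClient wraps a similar queue-plus-threads pattern internally; see the error-handling caveat about it later in this thread.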

Re: Best Indexing Approaches - To max the throughput

2015-10-06 Thread Gili Nachum
CloudSolrServer: beyond sending documents to the right leader shard, it also does this in *parallel* (for a batch), employing its own thread pool, with a connection per shard.
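For reference, a hedged sketch of that client in use. CloudSolrServer was renamed CloudSolrClient in SolrJ 5.x; the one-argument ZooKeeper constructor below is the 5.x-era API (newer releases use CloudSolrClient.Builder), and the ZooKeeper addresses and collection name are placeholders:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudIndexer {
      public static void main(String[] args) throws Exception {
        // SolrJ 5.x-era constructor; later versions use CloudSolrClient.Builder instead.
        CloudSolrClient solr = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("collection1");

        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", Integer.toString(i));
          batch.add(doc);
        }
        // The client routes each document to its target shard and sends the
        // per-shard sub-batches to the leaders, in parallel, as described above.
        solr.add(batch);
        solr.commit();
        solr.close();
      }
    }

The parallel fan-out to shard leaders is handled inside the client; the calling code only has to batch.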

Re: Best Indexing Approaches - To max the throughput

2015-10-06 Thread Walter Underwood
This is at Chegg. One of our indexes is textbooks. These are expensive and don’t change very often. It is better to keep yesterday’s index than to drop a few important books. We have occasionally had an error that happens with every book, like a new field that is not in the Solr schema.

Re: Best Indexing Approaches - To max the throughput

2015-10-06 Thread Alessandro Benedetti
Hi Walter, can you explain your use case better? You index a batch of e-commerce products (Solr documents); if one fails, you want to stop and invalidate the entire batch (using the almost-never-used Solr rollback, or manual deletion?), and then log the exception and the indexing size, to then re-index them?

Re: Best Indexing Approaches - To max the throughput

2015-10-06 Thread Walter Underwood
It depends on the document. In an e-commerce search, you might want to fail immediately and be notified. That is what we do: fail, rollback, and notify. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
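Roughly what that fail/rollback/notify policy could look like with SolrJ (a sketch, not Walter's actual code; notifyOps is a hypothetical alerting hook, and rollback() is generally only meaningful on a standalone core, not in SolrCloud):

    import java.util.List;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class FailFastIndexer {
      private final SolrClient solr;

      public FailFastIndexer(SolrClient solr) { this.solr = solr; }

      public void index(List<SolrInputDocument> batch) throws Exception {
        try {
          solr.add(batch);
          solr.commit();
        } catch (Exception e) {
          // Fail, rollback, notify: keep yesterday's index rather than a partial one.
          // Note: rollback() is generally not supported in SolrCloud mode.
          solr.rollback();
          notifyOps(e);          // hypothetical alerting hook
          throw e;
        }
      }

      private void notifyOps(Exception e) {
        System.err.println("Indexing aborted: " + e.getMessage());
      }
    }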

Re: Best Indexing Approaches - To max the throughput

2015-10-06 Thread Alessandro Benedetti
Mmm, one broken document in a batch should not break the entire batch, right (whatever approach is used)? Are you referring to the fact that you want to programmatically re-index the broken docs? It would be interesting to return the ids of the broken docs along with the Solr update response! Cheers

Re: Best Indexing Approaches - To max the throughput

2015-10-06 Thread Bill Dueber
Just to add... my informal tests show that batching has way more effect than SolrJ vs. JSON. I haven't looked at CUSC (ConcurrentUpdateSolrClient) in a while; last time I looked it was impossible to do anything smart about error handling, so check that out before you get too deeply into it. We use a strategy of sending a batch...
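A common pattern along those lines, sketched with SolrJ (not necessarily Bill's exact strategy): try the whole batch, and on failure retry document by document to identify the rejected ones.

    import java.util.List;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchWithFallback {
      // Try the whole batch first; on failure, retry one by one to find the broken docs.
      public static void addBatch(SolrClient solr, List<SolrInputDocument> batch) throws Exception {
        try {
          solr.add(batch);
        } catch (Exception batchFailure) {
          for (SolrInputDocument doc : batch) {
            try {
              solr.add(doc);
            } catch (Exception docFailure) {
              System.err.println("Rejected doc id=" + doc.getFieldValue("id")
                  + ": " + docFailure.getMessage());
            }
          }
        }
      }
    }

The per-document retry is slow, but it only runs when a batch actually fails, so the normal path keeps the throughput benefit of batching.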

Re: Best Indexing Approaches - To max the throughput

2015-10-05 Thread Alessandro Benedetti
Thanks Erick, you confirmed my impressions! Thank you very much for the insights; another opinion is welcome :) Cheers

Re: Best Indexing Approaches - To max the throughput

2015-10-05 Thread Erick Erickson
SolrJ tends to be faster for several reasons, not the least of which is that it sends packets to Solr in a more efficient binary format. Batching is critical. I did some rough tests using SolrJ: sending docs one at a time gave a throughput of < 400 docs/second; sending 10 at a time gave 2,300 or so.

Best Indexing Approaches - To max the throughput

2015-10-05 Thread Alessandro Benedetti
I was doing some studies and analysis, and was wondering which, in your opinion, is the best approach for indexing into Solr to reach the best throughput possible. I know that a lot of factors affect indexing time, so let's focus only on the feeding approach. Let's isolate different scenarios...