CloudSolrServer <https://issues.apache.org/jira/browse/SOLR-4816>: beyond sending documents to the right leader shard, it also does this in *parallel* (for a batch), employing its own thread pool, with a connection per shard.
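To make the idea concrete, here is a rough sketch of "split a batch by shard, then send the sub-batches in parallel". It is not SolrJ code and not how CloudSolrServer is implemented internally: `shard_for`, `group_by_shard`, `send_batch_in_parallel` and the `send_to_leader` callback are all names I made up for illustration, and the crc32-modulo hash is only a stand-in for Solr's real hash-range routing.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib


def shard_for(doc_id, num_shards):
    # ASSUMPTION: crc32 modulo the shard count is a simplified stand-in.
    # Real SolrCloud routing maps a hash of the id onto per-shard hash ranges.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def group_by_shard(docs, num_shards):
    # Split one batch into one sub-batch per target shard.
    groups = defaultdict(list)
    for doc in docs:
        groups[shard_for(doc["id"], num_shards)].append(doc)
    return dict(groups)


def send_batch_in_parallel(docs, num_shards, send_to_leader):
    # send_to_leader(shard, sub_batch) is a hypothetical callback that would
    # post the sub-batch to that shard's leader over its own connection.
    groups = group_by_shard(docs, num_shards)
    if not groups:
        return {}
    # One worker per shard that actually received documents.
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        futures = {
            shard: pool.submit(send_to_leader, shard, sub_batch)
            for shard, sub_batch in groups.items()
        }
        # result() re-raises any exception thrown while sending to that shard.
        return {shard: future.result() for shard, future in futures.items()}
```

The point of the sketch is only that the client does the routing and the fan-out, so no Solr node has to forward documents to another leader.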
On Tue, Oct 6, 2015 at 8:15 PM, Walter Underwood <wun...@wunderwood.org> wrote:

> This is at Chegg. One of our indexes is textbooks. These are expensive and
> don’t change very often. It is better to keep yesterday’s index than to
> drop a few important books.
>
> We have occasionally had an error that happens with every book, like a new
> field that is not in the Solr schema. If we ignored errors with that, we’d
> have an empty index: delete all, add all (failing), commit.
>
> With the fail fast and rollback, we can catch problems before they mess up
> the index.
>
> Also, to pinpoint isolated problems, if there is an error in the batch, it
> re-submits that batch one document at a time, so we get an accurate report
> of which document was rejected. I wrote that same thing back at Netflix,
> before SolrJ.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Oct 6, 2015, at 9:49 AM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
> >
> > Hi Walter,
> > can you explain your use case better?
> > You index a batch of e-commerce products (Solr documents); if one fails,
> > you want to stop and invalidate the entire batch (using the almost never
> > used Solr rollback, or manual deletion?),
> > and then log the exception indexing size,
> > to then re-index the whole batch of docs?
> >
> > In this scenario, the ConcurrentUpdateSolrClient will not be ideal?
> > Only curiosity.
> >
> > Cheers
> >
> > On 6 October 2015 at 17:29, Walter Underwood <wun...@wunderwood.org> wrote:
> >
> >> It depends on the document. In an e-commerce search, you might want to fail
> >> immediately and be notified. That is what we do: fail, rollback, and notify.
> >>
> >> wunder
> >>
> >>> On Oct 6, 2015, at 7:58 AM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
> >>>
> >>> mmmmmm one broken document in a batch should not break the entire batch,
> >>> right (whatever approach is used)?
> >>> Are you referring to the fact that you want to programmatically re-index
> >>> the broken docs?
> >>>
> >>> It would be interesting to return the IDs of the broken docs along with the
> >>> Solr update response!
> >>>
> >>> Cheers
> >>>
> >>> On 6 October 2015 at 15:30, Bill Dueber <b...@dueber.com> wrote:
> >>>
> >>>> Just to add... my informal tests show that batching has waaaaay more effect
> >>>> than SolrJ vs. JSON.
> >>>>
> >>>> I haven't looked at CUSC in a while; last time I looked it was impossible to
> >>>> do anything smart about error handling, so check that out before you get
> >>>> too deeply into it. We use a strategy of sending a batch of JSON documents,
> >>>> and if it returns an error, sending each record one at a time until we find
> >>>> the bad one and can log something useful.
> >>>>
> >>>> On Mon, Oct 5, 2015 at 12:07 PM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
> >>>>
> >>>>> Thanks Erick,
> >>>>> you confirmed my impressions!
> >>>>> Thank you very much for the insights; another opinion is welcome :)
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>> 2015-10-05 14:55 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:
> >>>>>
> >>>>>> SolrJ tends to be faster for several reasons, not the least of which
> >>>>>> is that it sends packets to Solr in a more efficient binary format.
> >>>>>>
> >>>>>> Batching is critical. I did some rough tests using SolrJ and sending
> >>>>>> docs one at a time gave a throughput of < 400 docs/second.
> >>>>>> Sending 10 gave 2,300 or so. Sending 100 at a time gave
> >>>>>> over 5,300 docs/second.
> >>>>>> Curiously, 1,000 at a time gave only
> >>>>>> marginal improvement over 100. This was with a single thread.
> >>>>>> YMMV of course.
> >>>>>>
> >>>>>> CloudSolrClient is definitely the better way to go with SolrCloud;
> >>>>>> it routes the docs to the correct leader instead of having the
> >>>>>> node you send the docs to do the routing.
> >>>>>>
> >>>>>> Best,
> >>>>>> Erick
> >>>>>>
> >>>>>> On Mon, Oct 5, 2015 at 4:57 AM, Alessandro Benedetti
> >>>>>> <abenede...@apache.org> wrote:
> >>>>>>> I was doing some studies and analysis, and was wondering which approach,
> >>>>>>> in your opinion, is the best one for indexing in Solr to reach the best
> >>>>>>> throughput possible.
> >>>>>>> I know that a lot of factors affect indexing time, so let's only
> >>>>>>> focus on the feeding approach.
> >>>>>>> Let's isolate different scenarios:
> >>>>>>>
> >>>>>>> *Single Solr Infrastructure*
> >>>>>>>
> >>>>>>> 1) XML/JSON batch request to the /update IndexHandler (xml/json)
> >>>>>>>
> >>>>>>> 2) SolrJ ConcurrentUpdateSolrClient (javabin)
> >>>>>>> I was thinking this would be the fastest approach for a multi-threaded
> >>>>>>> indexing application,
> >>>>>>> posting a batch of docs per request if possible.
> >>>>>>>
> >>>>>>> *Solr Cloud*
> >>>>>>>
> >>>>>>> 1) XML/JSON batch request to the /update IndexHandler (xml/json)
> >>>>>>>
> >>>>>>> 2) SolrJ ConcurrentUpdateSolrClient (javabin)
> >>>>>>>
> >>>>>>> 3) CloudSolrClient (javabin)
> >>>>>>> It seems the best approach according to these improvements [1]
> >>>>>>>
> >>>>>>> What are your opinions?
> >>>>>>>
> >>>>>>> A bonus observation would be using some Map/Reduce big data indexer,
> >>>>>>> but let's assume we don't have a big cluster of CPUs, just the average
> >>>>>>> indexer server.
> >>>>>>>
> >>>>>>> [1] https://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
> >>>>>>>
> >>>>>>> Cheers
> >>>>>>>
> >>>>>>> --
> >>>>>>> --------------------------
> >>>>>>>
> >>>>>>> Benedetti Alessandro
> >>>>>>> Visiting card : http://about.me/alessandro_benedetti
> >>>>>>>
> >>>>>>> "Tyger, tyger burning bright
> >>>>>>> In the forests of the night,
> >>>>>>> What immortal hand or eye
> >>>>>>> Could frame thy fearful symmetry?"
> >>>>>>>
> >>>>>>> William Blake - Songs of Experience -1794 England
> >>>>>
> >>>>> --
> >>>>> Benedetti Alessandro
> >>>>> Visiting card - http://about.me/alessandro_benedetti
> >>>>> Blog - http://alexbenedetti.blogspot.co.uk
> >>>>
> >>>> --
> >>>> Bill Dueber
> >>>> Library Systems Programmer
> >>>> University of Michigan Library
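
For reference, the "send the batch, and on error re-send one document at a time" strategy that Walter and Bill describe in the quoted thread could be sketched roughly as below. This is only an illustration under assumptions: `index_with_fallback` and the `send_batch` callback are hypothetical names, the callback is assumed to raise an exception when Solr rejects a request, and commit/rollback handling is deliberately left out.

```python
def index_with_fallback(docs, send_batch):
    """Send docs as one batch; on failure, retry them one at a time.

    send_batch is a hypothetical callable that posts a list of documents
    and raises an exception if the request is rejected. Returns a list of
    (doc id, error message) pairs for the documents that were rejected.
    """
    try:
        # Fast path: the whole batch goes through in a single request.
        send_batch(docs)
        return []
    except Exception:
        # Something in the batch is bad; fall through to find out what.
        pass

    rejected = []
    for doc in docs:
        try:
            send_batch([doc])
        except Exception as exc:
            # Record exactly which document failed, and why.
            rejected.append((doc.get("id"), str(exc)))
    return rejected
```

The caller can then decide what to do with a non-empty result: log it and carry on (Bill's approach), or fail fast, roll back and notify (Walter's approach).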