It depends on the document. For e-commerce search, you might want to fail immediately and be notified. That is what we do: fail, roll back, and notify.
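A minimal sketch of that fail-fast policy in Java, the language of the SolrJ clients discussed in the thread. UpdateTarget and Notifier are hypothetical interfaces introduced for illustration (SolrJ's SolrClient does have add, commit and rollback methods that could back an UpdateTarget), and documents are plain strings for brevity. All batches are added, a single commit runs only if every batch succeeds, and any failure triggers one rollback and one alert.

```java
import java.util.List;

// Hypothetical abstraction over an update endpoint. A real implementation
// might delegate to SolrJ's SolrClient add/commit/rollback methods.
interface UpdateTarget {
    void add(List<String> docs) throws Exception;
    void commit() throws Exception;
    void rollback() throws Exception;
}

// Hypothetical notification hook (email, pager, chat, ...).
interface Notifier {
    void alert(String message);
}

final class FailFastIndexer {
    private FailFastIndexer() {}

    // Adds every batch, then commits once. On the first failure (including a
    // failed commit), rolls back uncommitted changes, sends a single alert
    // and returns false.
    static boolean indexAll(UpdateTarget target, List<List<String>> batches, Notifier notifier) {
        try {
            for (List<String> batch : batches) {
                target.add(batch);
            }
            target.commit();
            return true;
        } catch (Exception failure) {
            try {
                target.rollback();
            } catch (Exception rollbackFailure) {
                // Keep the original cause; record the rollback problem alongside it.
                failure.addSuppressed(rollbackFailure);
            }
            notifier.alert("Indexing failed and was rolled back: " + failure.getMessage());
            return false;
        }
    }
}
```

This assumes nothing commits while the run is in progress (for example, an autoCommit), since a rollback can only discard changes made after the last commit.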
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Oct 6, 2015, at 7:58 AM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
>
> mmmmmm, one broken document in a batch should not break the entire batch, right (whatever approach is used)?
> Are you referring to the fact that you want to programmatically re-index the broken docs?
>
> It would be interesting to return the ids of the broken docs along with the Solr update response!
>
> Cheers
>
> On 6 October 2015 at 15:30, Bill Dueber <b...@dueber.com> wrote:
>
>> Just to add... my informal tests show that batching has waaaaay more effect than SolrJ vs. JSON.
>>
>> I haven't looked at CUSC (ConcurrentUpdateSolrClient) in a while; the last time I looked, it was impossible to do anything smart about error handling, so check that out before you get too deeply into it. We use a strategy of sending a batch of JSON documents and, if it returns an error, sending each record one at a time until we find the bad one and can log something useful.
>>
>> On Mon, Oct 5, 2015 at 12:07 PM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
>>
>>> Thanks Erick,
>>> you confirmed my impressions!
>>> Thank you very much for the insights; another opinion is welcome :)
>>>
>>> Cheers
>>>
>>> 2015-10-05 14:55 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:
>>>
>>>> SolrJ tends to be faster for several reasons, not the least of which is that it sends packets to Solr in a more efficient binary format.
>>>>
>>>> Batching is critical. I did some rough tests using SolrJ: sending docs one at a time gave a throughput of < 400 docs/second. Sending 10 at a time gave 2,300 or so. Sending 100 at a time gave over 5,300 docs/second. Curiously, 1,000 at a time gave only a marginal improvement over 100. This was with a single thread. YMMV, of course.
>>>>
>>>> CloudSolrClient is definitely the better way to go with SolrCloud; it routes the docs to the correct leader instead of having the node you send the docs to do the routing.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Mon, Oct 5, 2015 at 4:57 AM, Alessandro Benedetti <abenede...@apache.org> wrote:
>>>>> I was doing some studies and analysis, and I was wondering which, in your opinion, is the best approach for indexing into Solr to reach the highest possible throughput.
>>>>> I know that a lot of factors affect indexing time, so let's focus only on the feeding approach.
>>>>> Let's isolate different scenarios:
>>>>>
>>>>> *Single Solr Infrastructure*
>>>>>
>>>>> 1) XML/JSON batch request to the /update request handler (XML/JSON)
>>>>>
>>>>> 2) SolrJ ConcurrentUpdateSolrClient (javabin)
>>>>> I was expecting this to be the fastest approach for a multi-threaded indexing application, posting a batch of docs per request where possible.
>>>>>
>>>>> *SolrCloud*
>>>>>
>>>>> 1) XML/JSON batch request to the /update request handler (XML/JSON)
>>>>>
>>>>> 2) SolrJ ConcurrentUpdateSolrClient (javabin)
>>>>>
>>>>> 3) CloudSolrClient (javabin)
>>>>> This seems to be the best approach, according to these improvements [1].
>>>>>
>>>>> What are your opinions?
>>>>>
>>>>> A bonus observation could cover using some Map/Reduce big-data indexer, but let's assume we don't have a big cluster of CPUs, just an average indexing server.
>>>>>
>>>>> [1] https://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
>>>>>
>>>>> Cheers
>>>>>
>>>>> --
>>>>> --------------------------
>>>>>
>>>>> Benedetti Alessandro
>>>>> Visiting card : http://about.me/alessandro_benedetti
>>>>>
>>>>> "Tyger, tyger burning bright
>>>>> In the forests of the night,
>>>>> What immortal hand or eye
>>>>> Could frame thy fearful symmetry?"
>>>>>
>>>>> William Blake - Songs of Experience -1794 England
>>>>
>>>
>>
>> --
>> Bill Dueber
>> Library Systems Programmer
>> University of Michigan Library
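Bill's batch-then-isolate strategy, combined with Erick's advice to batch (around 100 documents per request in his tests), can be sketched in Java as follows. BatchSender is a hypothetical stand-in for the real client (SolrJ or JSON over HTTP), documents are plain strings, and the sketch assumes that resending a document is harmless; in Solr, re-adding a document with the same unique key replaces the earlier copy by default. The returned list holds the documents that failed on their own, which is what you would log or report back (for example, their ids, as Alessandro suggests).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client abstraction: sends one update request containing the
// given documents and throws if the server rejects it.
interface BatchSender {
    void send(List<String> docs) throws Exception;
}

final class BatchingIndexer {
    private BatchingIndexer() {}

    // Splits docs into consecutive batches of at most batchSize documents.
    static List<List<String>> partition(List<String> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(start, end)));
        }
        return batches;
    }

    // Sends docs batch by batch; when a batch is rejected, resends its
    // documents one per request and returns those that still fail.
    static List<String> indexWithFallback(BatchSender sender, List<String> docs, int batchSize) {
        List<String> rejected = new ArrayList<>();
        for (List<String> batch : partition(docs, batchSize)) {
            try {
                sender.send(batch);
            } catch (Exception batchFailure) {
                for (String doc : batch) {
                    List<String> single = new ArrayList<>();
                    single.add(doc);
                    try {
                        sender.send(single);
                    } catch (Exception docFailure) {
                        // A real indexer would log docFailure with the document id here.
                        rejected.add(doc);
                    }
                }
            }
        }
        return rejected;
    }
}
```

With Erick's figures, going from one document per request (< 400 docs/second) to 100 per request (over 5,300 docs/second) is more than a 13x throughput gain, which is why the slow per-document path is used only for batches that actually fail.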