It depends on the document. In an e-commerce search, you might want to fail 
immediately and be notified. That is what we do: fail, roll back, and notify.
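
A minimal sketch of that fail, roll back, and notify flow, for illustration only. `InMemoryIndex` and `index_all_or_nothing` are hypothetical names standing in for a real index client, not a Solr or SolrJ API, and the "document must have an id" check is an assumed validation rule.

```python
class InMemoryIndex:
    """Hypothetical stand-in for a search index client (not a real Solr API)."""

    def __init__(self):
        self.committed = []  # documents visible to searches
        self.pending = []    # documents added but not yet committed

    def add(self, docs):
        # Assumed validation rule: every document needs an "id" field.
        for doc in docs:
            if "id" not in doc:
                raise ValueError(f"document has no id: {doc!r}")
            self.pending.append(doc)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending = []

    def rollback(self):
        # Discard everything added since the last commit.
        self.pending = []


def index_all_or_nothing(client, batches, notify):
    """Send every batch; on the first error roll back, notify, and re-raise."""
    try:
        for batch in batches:
            client.add(batch)
        client.commit()
    except Exception as exc:
        client.rollback()
        notify(f"indexing aborted: {exc}")
        raise
```

The single commit at the end is what makes the rollback meaningful here: nothing becomes searchable unless every batch was accepted.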
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 6, 2015, at 7:58 AM, Alessandro Benedetti <benedetti.ale...@gmail.com> 
> wrote:
> 
> Hmm, one broken document in a batch should not break the entire batch,
> right (whatever approach is used)?
> Are you referring to the fact that you want to programmatically re-index
> the broken docs?
> 
> It would be interesting to return the ids of the broken docs along with the
> Solr update response!
> 
> Cheers
> 
> 
> On 6 October 2015 at 15:30, Bill Dueber <b...@dueber.com> wrote:
> 
>> Just to add... my informal tests show that batching has waaaaay more effect
>> than SolrJ vs. JSON.
>> 
>> I haven't looked at CUSC (ConcurrentUpdateSolrClient) in a while. Last time I
>> looked, it was impossible to do anything smart about error handling, so check
>> that out before you get too deeply into it. We use a strategy of sending a
>> batch of JSON documents and, if it returns an error, resending each record one
>> at a time until we find the bad one and can log something useful.
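
The batch-then-retry strategy described above can be sketched roughly as follows. This is an illustration under assumptions, not the poster's actual code: `send` is a hypothetical callable that posts a list of documents and raises on any error, and resending a document that was already accepted is assumed to be harmless (for example because it overwrites by id).

```python
def index_with_fallback(send, batch):
    """Send a batch; if it fails, resend one doc at a time to find bad ones.

    `send` is a hypothetical callable taking a list of docs and raising on
    failure. Returns a list of (doc, error message) pairs for rejected docs.
    """
    try:
        send(batch)
        return []  # fast path: the whole batch was accepted
    except Exception:
        pass  # fall through to the slow path below

    failures = []
    for doc in batch:
        try:
            send([doc])
        except Exception as exc:
            # Record enough detail to log something useful about the bad doc.
            failures.append((doc, str(exc)))
    return failures


def fake_send(docs):
    """Illustrative sender: rejects any doc whose price is not a number."""
    for doc in docs:
        if not isinstance(doc.get("price"), (int, float)):
            raise ValueError(f"bad price in doc {doc['id']}")


bad_docs = index_with_fallback(
    fake_send,
    [{"id": "1", "price": 10}, {"id": "2", "price": "cheap"}, {"id": "3", "price": 5}],
)
for doc, error in bad_docs:
    print(doc["id"], error)
```

The slow path only runs for batches that fail, so the common case keeps the throughput benefit of batching.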
>> 
>> 
>> 
>> On Mon, Oct 5, 2015 at 12:07 PM, Alessandro Benedetti <
>> benedetti.ale...@gmail.com> wrote:
>> 
>>> Thanks Erick,
>>> you confirmed my impressions!
>>> Thank you very much for the insights; another opinion is welcome :)
>>> 
>>> Cheers
>>> 
>>> 2015-10-05 14:55 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:
>>> 
>>>> SolrJ tends to be faster for several reasons, not the least of which
>>>> is that it sends packets to Solr in a more efficient binary format.
>>>> 
>>>> Batching is critical. I did some rough tests using SolrJ and sending
>>>> docs one at a time gave a throughput of < 400 docs/second.
>>>> Sending 10 gave 2,300 or so. Sending 100 at a time gave
>>>> over 5,300 docs/second. Curiously, 1,000 at a time gave only
>>>> marginal improvement over 100. This was with a single thread.
>>>> YMMV of course.
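
To make the batching idea concrete, here is a small sketch of splitting a document stream into fixed-size batches before sending. `in_batches` is a hypothetical helper name, and the batch size of 100 simply mirrors the figure mentioned above; the right value for a given setup would need to be measured.

```python
from itertools import islice


def in_batches(docs, size):
    """Yield lists of at most `size` docs from any iterable, in order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(docs)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# 250 documents in batches of 100 means 3 update requests instead of 250.
docs = ({"id": str(n)} for n in range(250))
batch_sizes = [len(batch) for batch in in_batches(docs, 100)]
```

Because the helper works on any iterable, documents can be streamed from a file or database cursor without loading them all into memory first.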
>>>> 
>>>> CloudSolrClient is definitely the better way to go with SolrCloud:
>>>> it routes the docs to the correct leader instead of having the
>>>> node you send the docs to do the routing.
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>> On Mon, Oct 5, 2015 at 4:57 AM, Alessandro Benedetti
>>>> <abenede...@apache.org> wrote:
>>>>> I was doing some studies and analysis, and was wondering which approach,
>>>>> in your opinion, is the best one for indexing in Solr with the best
>>>>> possible throughput.
>>>>> I know that a lot of factors affect indexing time, so let's focus only
>>>>> on the feeding approach.
>>>>> Let's isolate different scenarios:
>>>>> 
>>>>> *Single Solr Infrastructure*
>>>>> 
>>>>> 1) Xml/Json batch request to /update IndexHandler (xml/json)
>>>>> 
>>>>> 2) SolrJ ConcurrentUpdateSolrClient (javabin)
>>>>> I was thinking this would be the fastest approach for a multi-threaded
>>>>> indexing application,
>>>>> posting a batch of docs per request if possible.
>>>>> 
>>>>> *Solr Cloud*
>>>>> 
>>>>> 1) Xml/Json batch request to /update IndexHandler(xml/json)
>>>>> 
>>>>> 2) SolrJ ConcurrentUpdateSolrClient ( javabin)
>>>>> 
>>>>> 3) CloudSolrClient ( javabin)
>>>>> It seems the best approach according to these improvements [1]
>>>>> 
>>>>> What are your opinions ?
>>>>> 
>>>>> A bonus observation would be using some Map/Reduce big data indexer,
>>>>> but let's assume we don't have a big cluster of CPUs, just an average
>>>>> indexing server.
>>>>> 
>>>>> 
>>>>> [1]
>>>>> https://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
>>>>> 
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> 
>>>>> --
>>>>> --------------------------
>>>>> 
>>>>> Benedetti Alessandro
>>>>> Visiting card : http://about.me/alessandro_benedetti
>>>>> 
>>>>> "Tyger, tyger burning bright
>>>>> In the forests of the night,
>>>>> What immortal hand or eye
>>>>> Could frame thy fearful symmetry?"
>>>>> 
>>>>> William Blake - Songs of Experience -1794 England
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> --------------------------
>>> 
>>> Benedetti Alessandro
>>> Visiting card - http://about.me/alessandro_benedetti
>>> Blog - http://alexbenedetti.blogspot.co.uk
>>> 
>>> "Tyger, tyger burning bright
>>> In the forests of the night,
>>> What immortal hand or eye
>>> Could frame thy fearful symmetry?"
>>> 
>>> William Blake - Songs of Experience -1794 England
>>> 
>> 
>> 
>> 
>> --
>> Bill Dueber
>> Library Systems Programmer
>> University of Michigan Library
>> 
> 
> 
> 
