Re: Solrj : ConcurrentUpdateSolrClient based on QueueSize and Time

Jason Gerlowski Wed, 21 Feb 2018 18:49:34 -0800

My apologies, Santosh.  I added that Javadoc comment a few releases back
based on a misunderstanding I've only recently been disabused of.  I will
correct it.


Anyway, Shawn's explanation above is correct.  The queueSize parameter
doesn't control batching, as he clarified.  Sorry for the trouble.

Best,

Jason
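To make the distinction concrete, here is a toy model in plain Java of the behavior Shawn describes in the quoted message: add() enqueues a document and returns at once, worker threads start sending right away, and the queue capacity only limits how much work can pile up before add() blocks. This is NOT SolrJ code. ConcurrentQueueModel is a hypothetical class, and queueSize/threadCount only mirror the builder settings by name. The model also simplifies the real client, which may write several queued documents into one open HTTP connection.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical toy model (NOT SolrJ code) of a concurrent update client:
 * add() returns as soon as the document is queued, and worker threads start
 * "sending" immediately. queueSize caps the backlog; it is not a batch size.
 */
public class ConcurrentQueueModel {
    private final BlockingQueue<String> queue;
    private final ExecutorService workers;
    final List<String> sent = new CopyOnWriteArrayList<>();

    public ConcurrentQueueModel(int queueSize, int threadCount) {
        this.queue = new ArrayBlockingQueue<>(queueSize);
        this.workers = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < threadCount; i++) {
            workers.submit(this::drain);
        }
    }

    // Blocks only when queueSize documents are already waiting; otherwise returns at once.
    public void add(String doc) throws InterruptedException {
        queue.put(doc);
    }

    // Each worker takes documents as soon as they arrive, without waiting for
    // the queue to fill. "Sending" here just records the document.
    private void drain() {
        try {
            while (true) {
                sent.add(queue.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shutdown() interrupts blocked workers
        }
    }

    public void shutdown() {
        workers.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentQueueModel client = new ConcurrentQueueModel(10, 2);
        client.add("doc-1"); // a single document, far below queueSize = 10
        long deadline = System.currentTimeMillis() + 2000;
        while (client.sent.isEmpty() && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
        // The document is processed even though the queue never filled up.
        System.out.println(client.sent);
        client.shutdown();
    }
}
```

In this model, raising queueSize only lets more documents wait when producers outrun the workers; it never delays processing of the first document.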

On Wed, Feb 21, 2018 at 8:50 PM, Santosh Narayan
<santosh.narayan....@gmail.com> wrote:
> Thanks for the explanation, Shawn. Very helpful. I think I got misled by the
> JavaDoc text for *ConcurrentUpdateSolrClient.Builder.withQueueSize*:
>     /**
>      * The number of documents to batch together before sending to Solr. If
>      * not set, this defaults to 10.
>      */
>     public Builder withQueueSize(int queueSize) {
>       if (queueSize <= 0) {
>         throw new IllegalArgumentException("queueSize must be a positive integer.");
>       }
>       this.queueSize = queueSize;
>       return this;
>     }
>
> On Thu, Feb 22, 2018 at 9:41 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 2/21/2018 7:41 AM, Santosh Narayan wrote:
>> > Maybe it is my understanding of the documentation. As per the JavaDoc,
>> > ConcurrentUpdateSolrClient buffers all added documents and writes them
>> > into open HTTP connections.
>> >
>> > So I thought that this class would buffer documents on the client side
>> > itself until the QueueSize is reached and then send all the cached
>> > documents together in one HTTP request. Is this not the case?
>>
>> That's not how it's designed.
>>
>> What ConcurrentUpdateSolrClient does differently from HttpSolrClient or
>> CloudSolrClient is return control immediately to your program when you
>> send an update, and begin processing that update in the background.  If
>> you send a LOT of updates very quickly, then the queue will get larger,
>> and will typically be processed in parallel by multiple threads.  The
>> client won't wait for the queue to fill.  Processing of the first update
>> you send should begin right after you add it.
>>
>> Something to consider:  Because control is returned to your program
>> immediately, and the response is always a success, your program will
>> never be informed about any problems with your adds when you use the
>> concurrent client.  The concurrent client is a great choice for initial
>> bulk indexing, because it offers multi-threaded indexing without any
>> need to handle the threads yourself.  But you don't get any kind of
>> error handling.
>>
>> Thanks,
>> Shawn
>>
>>
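Because ConcurrentUpdateSolrClient does not batch by queue size or by time, one way to get the behavior Santosh originally expected is to buffer documents yourself and send each batch with a synchronous call, for example SolrClient.add(Collection<SolrInputDocument>) on an HttpSolrClient. Because the call runs on your own thread, failures also reach your code. The sketch below is a hypothetical helper, not part of SolrJ. SizeOrAgeBatcher, maxBatchSize, maxAgeMillis and the injected clock are invented names. The sender is a plain Consumer, so the sketch runs without Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (NOT part of SolrJ): buffers documents client-side and
 * hands them to a synchronous sender once maxBatchSize documents are buffered
 * or the oldest buffered document is at least maxAgeMillis old.
 * Age is checked only inside add(); there is no background timer, so an idle
 * partial batch waits for the next add(), flush() or close().
 */
public class SizeOrAgeBatcher<T> implements AutoCloseable {
    private final int maxBatchSize;
    private final long maxAgeMillis;
    private final Consumer<List<T>> sender; // assumption: wraps a synchronous update call
    private final LongSupplier clock;       // injected so the age rule is testable
    private final List<T> buffer = new ArrayList<>();
    private long firstAddedAt;

    public SizeOrAgeBatcher(int maxBatchSize, long maxAgeMillis,
                            Consumer<List<T>> sender, LongSupplier clock) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be a positive integer.");
        }
        this.maxBatchSize = maxBatchSize;
        this.maxAgeMillis = maxAgeMillis;
        this.sender = sender;
        this.clock = clock;
    }

    public void add(T doc) {
        if (buffer.isEmpty()) {
            firstAddedAt = clock.getAsLong(); // the age of a batch starts with its first document
        }
        buffer.add(doc);
        boolean full = buffer.size() >= maxBatchSize;
        boolean stale = clock.getAsLong() - firstAddedAt >= maxAgeMillis;
        if (full || stale) {
            flush();
        }
    }

    // Runs on the caller's thread, so an exception thrown by the sender
    // propagates straight back to whoever called add() or flush().
    // A failed batch is not re-buffered; the caller decides whether to retry.
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(buffer);
        buffer.clear();
        sender.accept(batch);
    }

    @Override
    public void close() {
        flush();
    }

    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        long[] now = {0};
        try (SizeOrAgeBatcher<String> b =
                 new SizeOrAgeBatcher<>(3, 1000, batches::add, () -> now[0])) {
            b.add("a"); b.add("b"); b.add("c");    // size limit reached
            b.add("d"); now[0] = 1500; b.add("e"); // age limit reached
            b.add("f");                            // flushed by close()
        }
        System.out.println(batches); // [[a, b, c], [d, e], [f]]
    }
}
```

Keeping the send synchronous is the design choice that restores error reporting. The trade-off is that you lose the background threading the concurrent client gives you for free.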
