Re: Performance potential for updating (reindexing) documents

Shawn Heisey Sat, 02 Apr 2016 14:06:06 -0700

On 4/1/2016 8:56 PM, Erick Erickson wrote:
> bq: The bottleneck is definitely Solr.
>
> Since you commented out the server.add(doclist), you're right to focus
> there. I've seen a few things that help.
>
> 1> batch the documents, i.e. in the doclist above the list should be
> on the order of 1,000 docs. Here are some numbers I worked up one time:
> https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/


For that test, I was just seeing how fast MySQL could push data.  Based
on the results from a small-scale test where I *did* send the documents
to Solr, letting the code run the add on the entire database with a
single thread would have taken far too long.  I'm aware of the need to
batch -- the code did create batches, it just didn't send them.
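
To illustrate the batching discussed above, here is a minimal sketch, not
the code from this thread.  The class BatchingSketch and the "sender"
callback are hypothetical names; in a real SolrJ program the callback
would presumably wrap a call such as server.add(doclist), which is left
out so the example runs without Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of SolrJ: groups documents into
// fixed-size batches and hands each batch to a sender callback.
final class BatchingSketch {

    // Returns the number of batches handed to the sender.
    static <T> int sendInBatches(Iterable<T> docs, int batchSize,
                                 Consumer<List<T>> sender) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        for (T doc : docs) {
            batch.add(doc);
            if (batch.size() == batchSize) {
                sender.accept(batch);              // e.g. wrap server.add(...) here
                batches++;
                batch = new ArrayList<>(batchSize); // never reuse a list already sent
            }
        }
        if (!batch.isEmpty()) {                    // send the final partial batch
            sender.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add(i);                           // stand-in for rows read from MySQL
        }
        int n = sendInBatches(docs, 1000,
                b -> System.out.println("sending " + b.size() + " docs"));
        System.out.println(n + " batches");
    }
}
```

Commenting out the body of the sender callback gives the "create batches
but don't send them" measurement described above.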

I have a couple of ideas for the design of a multi-threaded indexing
program, but haven't worked out how to implement them.
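
One possible shape for such a program, offered only as a sketch and not as
the design referred to above: a single reader thread builds batches and
hands them to a fixed pool of sender threads through a bounded queue.  The
class name ParallelIndexerSketch, the "sender" callback, and the queue
size are all assumptions made for illustration; in a SolrJ program the
callback would presumably wrap server.add(doclist).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch: one reader thread batches documents, a fixed pool
// of worker threads sends the batches concurrently.
final class ParallelIndexerSketch {

    static <T> void index(Iterable<T> docs, int batchSize, int threads,
                          Consumer<List<T>> sender) {
        // Bounded queue plus CallerRunsPolicy: when the workers fall behind,
        // the reader thread sends a batch itself instead of queueing more,
        // so a fast source cannot fill memory with pending batches.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(threads * 2),
                new ThreadPoolExecutor.CallerRunsPolicy());

        List<T> batch = new ArrayList<>(batchSize);
        for (T doc : docs) {
            batch.add(doc);
            if (batch.size() == batchSize) {
                final List<T> full = batch;
                pool.execute(() -> sender.accept(full));
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {                     // final partial batch
            final List<T> last = batch;
            pool.execute(() -> sender.accept(last));
        }

        pool.shutdown();                            // no new tasks; drain the queue
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add(i);                            // stand-in for rows read from MySQL
        }
        AtomicLong sent = new AtomicLong();
        index(docs, 1000, 4, b -> sent.addAndGet(b.size()));
        System.out.println("indexed " + sent.get() + " docs");
    }
}
```

Error handling is deliberately omitted: an exception thrown by the sender
is not reported back to the caller here, so a real program would need to
collect failures, for example by submitting tasks and checking the
returned futures.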

> 3> Make sure you're using CloudSolrClient.

It's not SolrCloud, so that wouldn't really be helpful. :)

Thanks,
Shawn
