On 4/1/2016 8:56 PM, Erick Erickson wrote:
> bq: The bottleneck is definitely Solr.
>
> Since you commented out the server.add(doclist), you're right to focus
> there. I've seen a few things that help.
>
> 1> batch the documents, i.e. in the doclist above the list should be
> on the order of 1,000 docs. Here are some numbers I worked up one time:
> https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/

For that test, I was just seeing how fast MySQL could push data. Based
on the results I saw from a small-scale test where I *did* add them,
letting the code run the add on the entire database with a single
thread would have taken forever. I'm aware of the need to batch -- the
code did create batches, it just didn't send them.

I have a couple of ideas for the design on a multi-threaded indexing
program, but haven't worked out how to implement it.

> 3> Make sure you're using CloudSolrClient.

It's not SolrCloud, so that wouldn't really be helpful. :)

Thanks,
Shawn
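
For reference, one possible shape for the multi-threaded indexer mentioned above. This is only a minimal sketch, not a worked-out design: the class name BatchIndexer, the method indexInBatches and the sender callback are all hypothetical, and the sender stands in for whatever actually ships a batch to Solr (for example a call to add() on a SolrJ client with the batch as its argument). A bounded queue with a caller-runs policy is assumed here so that the database reader slows down instead of piling up batches in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class BatchIndexer {
    /**
     * Groups docs into batches of batchSize and hands each batch to
     * sender on a fixed pool of worker threads.
     * Returns the number of batches handed off.
     */
    public static <T> int indexInBatches(Iterable<T> docs, int batchSize,
            int threads, Consumer<List<T>> sender) throws InterruptedException {
        // Bounded queue: when the workers fall behind, CallerRunsPolicy makes
        // the reading thread send a batch itself, which throttles the reader.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(threads * 2),
                new ThreadPoolExecutor.CallerRunsPolicy());
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        for (T doc : docs) {
            batch.add(doc);
            if (batch.size() == batchSize) {
                final List<T> full = batch;
                pool.execute(() -> sender.accept(full));
                batches++;
                batch = new ArrayList<>(batchSize); // never reuse a handed-off list
            }
        }
        if (!batch.isEmpty()) {
            // Send the final, partially filled batch.
            final List<T> rest = batch;
            pool.execute(() -> sender.accept(rest));
            batches++;
        }
        pool.shutdown();
        // The one-hour limit is an arbitrary placeholder for this sketch.
        pool.awaitTermination(1, TimeUnit.HOURS);
        return batches;
    }
}
```

With a batch size of 1,000 and a handful of threads, the sender would be the only Solr-specific piece; error handling and retries for failed batches are left out of the sketch.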