With a properly tuned SolrCloud infrastructure and fewer than 1B total docs spread over 50 collections, where the largest collection is 100M docs, what is a reasonable target for fully reindexing a single collection?
I understand there are a lot of variables, so I'm hypothetically wiping them away by assuming "a properly tuned infrastructure": hardware, RAM, etc. are all configured correctly (which is not the case for me today).

The scenario is to add 3 fields to every existing doc in one collection. The field names are the same across docs, but the values vary per doc. A search is performed that finds 100 matches, and all 100 docs get the same update. Then another search matches 15,000 docs, and those are updated. This continues 10-20,000 times until essentially all the docs have been updated. The docs each have 100-200 fields, mostly text and mostly small in size.

What's the best possible throughput I can expect? 1,000 docs/sec? 5,000 docs/sec? Using SolrJ for querying and indexing against a v5.2.1 cloud.
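For concreteness, below is a minimal SolrJ sketch of the query-then-update loop I have in mind, using atomic "set" updates so only the id plus the 3 new values are sent per doc rather than all 100-200 existing fields. The zkHost string, collection name, query, field names, and batch size are all placeholders for my environment, and a real run would use cursorMark deep paging for the larger result sets.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class BulkFieldUpdate {

    public static void main(String[] args) throws Exception {
        // zkHost and collection are placeholders for the real environment.
        try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
            client.setDefaultCollection("my_collection");

            // One of the 10-20,000 searches: fetch only the uniqueKey of the matches.
            SolrQuery q = new SolrQuery("category:widgets");
            q.setFields("id");
            q.setRows(15000);

            QueryResponse rsp = client.query(q);

            // Atomic updates: each doc carries just its id plus the 3 new fields.
            List<SolrInputDocument> batch = new ArrayList<>();
            for (SolrDocument hit : rsp.getResults()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", hit.getFieldValue("id"));
                doc.addField("new_field_a", set("value-for-this-query"));
                doc.addField("new_field_b", set(42));
                doc.addField("new_field_c", set(true));
                batch.add(doc);

                if (batch.size() == 1000) { // batch the adds for throughput
                    client.add(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                client.add(batch);
            }
            client.commit(); // in a real run, rely on autoCommit instead
        }
    }

    // Wraps a value in the {"set": value} map SolrJ uses to signal an atomic update.
    private static Map<String, Object> set(Object value) {
        Map<String, Object> m = new HashMap<>();
        m.put("set", value);
        return m;
    }
}

One caveat: atomic updates only work when all fields in the schema are stored (copyField targets aside), since Solr reconstructs the rest of the document internally; otherwise the full doc has to be fetched or rebuilt client-side and resent, which changes the throughput picture considerably.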