With a properly tuned SolrCloud infrastructure holding under 1B total docs
spread across 50 collections, where the largest collection is 100M docs,
what is a reasonable target for completely reindexing a single collection?

I understand there are a lot of variables, so I'm hypothetically wiping them
away by assuming "a properly tuned infrastructure": hardware, RAM, etc. are
all configured correctly (which is not the case in my actual setup).

The scenario is to add 3 fields to all the existing docs in one collection.
The field names are the same for every doc, but the values vary by doc. A
search is performed and finds, say, 100 matches; all 100 docs get the same
update. Then another search matches 15,000 docs, and those are updated. This
continues for 10,000-20,000 queries until essentially all the docs have been
updated.
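
To make that concrete, here's a minimal SolrJ sketch of what one
query-then-update pass might look like, using atomic updates so only the
three new fields are sent over the wire. The ZooKeeper hosts, collection
name, query, field names, and values are all placeholders:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class OnePass {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
        client.setDefaultCollection("mycollection"); // placeholder collection name

        // Find the docs matched by this pass; only the id is needed for the update.
        SolrQuery query = new SolrQuery("category:widgets"); // placeholder query
        query.setFields("id");
        query.setRows(1000); // larger result sets would page with cursorMark (sketch further down)
        QueryResponse rsp = client.query(query);

        // Build an atomic ("set") update per matched doc: only the 3 new fields change.
        List<SolrInputDocument> updates = new ArrayList<>();
        for (SolrDocument doc : rsp.getResults()) {
            SolrInputDocument update = new SolrInputDocument();
            update.addField("id", doc.getFieldValue("id"));
            update.addField("new_field_a", Collections.singletonMap("set", "valueA"));
            update.addField("new_field_b", Collections.singletonMap("set", "valueB"));
            update.addField("new_field_c", Collections.singletonMap("set", "valueC"));
            updates.add(update);
        }
        client.add(updates);
        // Commit once at the end of all passes (or rely on autoCommit) rather
        // than committing per pass.
        client.close();
    }
}

One caveat, as I understand it: atomic updates still reconstruct and reindex
the full document from stored fields, so all 100-200 fields would need to be
stored, and the per-doc indexing cost is close to resending full documents.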

The docs each have 100-200 fields, mostly text and mostly small. What's the
best possible throughput I can expect? 1,000 docs/sec? 5,000 docs/sec? (For
scale: at 1,000 docs/sec, 100M docs takes roughly 28 hours; at 5,000
docs/sec, about 5.5 hours.)

I'm using SolrJ for both querying and indexing against a v5.2.1 SolrCloud cluster.
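
Presumably the searches that match thousands of docs would be paged with
cursorMark (available since Solr 4.7) rather than one huge rows fetch;
something like this sketch, assuming the uniqueKey field is "id" and handing
each page to the atomic-update step above:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorPaging {
    // Stream all ids matching q, one page at a time.
    static void updateAllMatches(SolrClient client, String q) throws Exception {
        SolrQuery query = new SolrQuery(q);
        query.setFields("id");
        query.setRows(1000); // page size; worth tuning
        query.setSort(SolrQuery.SortClause.asc("id")); // cursorMark requires a uniqueKey sort

        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        while (true) {
            query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
            QueryResponse rsp = client.query(query);
            for (SolrDocument doc : rsp.getResults()) {
                // build the atomic update for doc.getFieldValue("id") as above
            }
            String next = rsp.getNextCursorMark();
            if (cursorMark.equals(next)) {
                break; // cursor didn't advance: all pages consumed
            }
            cursorMark = next;
        }
    }
}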


