Hi Erick,

My post was scant on details. The numbers I gave for collection sizes are projections for the future. I am in the midst of an upgrade that will be completed within a few weeks. My concern is that I may not be able to produce the throughput necessary to index an entire collection quickly enough (within 3 to 4 hours) for a large customer (100M docs).
Currently:
- single Solr instance on one host, sharing memory and CPU with other applications
- 4GB dedicated to Solr
- ~20M docs, ~10GB index size
- using HttpSolrClient for all queries and updates

Very soon:
- two VMs dedicated to Solr (2 nodes)
- up to 16GB available memory
- running in cloud mode, so we can now scale horizontally
- all collections are single-sharded with 2 replicas

All fields are stored. The scenario I gave uses atomic updates, sent in large batches of 5000-10000 docs (roughly as in the sketch at the end of this message).

The use case I have is perhaps different from most Solr setups: indexing throughput is more important than qps. We have very few concurrent users, but they do massive amounts of doc updates.

I am currently seeing lousy performance in production (not a surprise: long GC pauses) and have just begun tuning in a test environment. After a few more weeks of testing and tweaking I hope to get to 5000 updates/sec, but even that may not be enough. So my main concern is that this business model (updating entire collections about once a day) cannot be supported by Solr.
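For reference, the batched atomic updates look roughly like the following SolrJ sketch. The URL, field names, and document counts are placeholders rather than our real schema; the point is the pattern: one "set" per changed field, batches of ~5000 docs per add request, and no per-batch commit from the client.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class BatchedAtomicUpdates {
        public static void main(String[] args) throws Exception {
            // Placeholder URL and field name -- not the real collection.
            String solrUrl = "http://localhost:8983/solr/mycollection";
            int batchSize = 5000;

            try (SolrClient client = new HttpSolrClient.Builder(solrUrl).build()) {
                List<SolrInputDocument> batch = new ArrayList<>(batchSize);

                for (int i = 0; i < 100_000; i++) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", "doc-" + i);

                    // Atomic update: "set" replaces just this field. Solr rebuilds
                    // the rest of the document from stored fields, which is why
                    // all fields are stored in our schema.
                    Map<String, Object> setValue = new HashMap<>();
                    setValue.put("set", "new value " + i);
                    doc.addField("status_s", setValue);

                    batch.add(doc);
                    if (batch.size() >= batchSize) {
                        client.add(batch);   // one request per batch
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    client.add(batch);
                }
                // Commit once at the end; intermediate visibility is left to
                // the autoCommit/autoSoftCommit settings in solrconfig.xml.
                client.commit();
            }
        }
    }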