This is beyond my direct area of expertise, but one way to look at this
would be:

1) Create the new collections offline, possibly down to each of the 6000
   clients having its own private collection (embedded SolrJ/server), or
   some sort of mini-hubs, e.g. a server per N clients (a rough sketch
   follows this list).
2) Bring those collections into the central server.
3) Update the alias that used to point to the previous collection set so
   that it points to the new one:
   https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateormodifyanAliasforaCollection
4) Delete the old collection set, as nothing points at it anymore.
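In case it helps make step 1 concrete, a very rough SolrJ sketch of the
"embedded" route might look like the following. The solr home path, core
name and field names are all made up, and error handling and commit tuning
are left out, so treat it as a shape rather than a working config:

  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.core.CoreContainer;

  public class OfflineClientIndexer {
    public static void main(String[] args) throws Exception {
      // solr home contains solr.xml plus a core dir with conf/ (schema, solrconfig)
      CoreContainer container = new CoreContainer("/path/to/offline-solr-home");
      container.load();

      EmbeddedSolrServer solr = new EmbeddedSolrServer(container, "client_core");
      try {
        // ~500k docs per client, per the numbers quoted below
        for (int i = 0; i < 500_000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "client42-" + i);               // made-up id scheme
          doc.addField("payload_s", "roughly 400 bytes...");  // made-up field
          solr.add(doc);
          if (i % 10_000 == 0) {
            solr.commit();                                   // periodic commits; tune to taste
          }
        }
        solr.commit();
      } finally {
        solr.close();
      }
    }
  }

And for step 3, the alias flip itself is a single Collections API call,
something along the lines of (alias and collection names made up):

  http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=clients&collections=clients_20150816

Queries keep hitting the same alias name, so once the alias points at the
freshly loaded set, the old collections can be deleted.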
Now, I don't know how that would play with shards/replicas.

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 15 August 2015 at 16:03, Troy Edwards <tedwards415...@gmail.com> wrote:
> I am using SolrCloud.
>
> My initial requirements are:
>
> 1) There are about 6000 clients.
> 2) The number of documents from each client is about 500000 (average
>    document size is about 400 bytes).
> 3) I have to wipe the index/collection every night and create it anew.
>
> Any thoughts/ideas/suggestions on:
>
> 1) How to index such a large number of documents, i.e. do I use an HTTP
>    client to send the documents, is the Data Import Handler the right
>    choice, or should I try uploading CSV files?
>
> 2) How many collections should I use?
>
> 3) How many shards / replicas per collection should I use?
>
> 4) Do I need multiple Solr servers?
>
> Thanks