Troy Edwards <tedwards415...@gmail.com> wrote:
> 1) There are about 6000 clients
> 2) The number of documents from each client are about 500000 (average
> document size is about 400 bytes)

So roughly 3 billion documents / 1TB index size. That means at least 2
shards, due to the 2 billion document limit per shard in Lucene. If you
want more advice than that, you will have to describe how the setup is
to be used:
- How many requests per second?
- What does a typical query look like?
- How low does the response time need to be?

> 3) I have to wipe off the index/collection every night and create new

Let's say you have 4 hours to do that. That is about 200K
documents/second you need to index. That is a high number, and with
such tiny documents I suspect that logistics might take up the largest
part of that. This might call for multiple independent setups.

> 1) How to index such large number of documents i.e. do I use an http
> client to send documents or is data import handler right or should I
> try uploading CSV files?

As the overhead of constructing and parsing XML documents is not
trivial, CSV seems reasonable. Probably also DIH.

> 2) How many collections should I use?

Not 6000 in a single SolrCloud.

> 3) How many shards / replicas per collection should I use?
> 4) Do I need multiple Solr servers?

Not enough data about index usage to say. Between 1 and 50, not
kidding.

- Toke Eskildsen
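For the CSV route, the upload itself is just a POST to Solr's update
handler with a text/csv content type. A minimal command fragment
(assuming a local Solr on the default port and a collection named
clientdocs, both placeholder names):

```shell
# Post a CSV file to the update handler of a hypothetical "clientdocs"
# collection; commit=true makes the documents visible immediately.
curl 'http://localhost:8983/solr/clientdocs/update?commit=true' \
     -H 'Content-Type: text/csv' \
     --data-binary @docs.csv
```

At ~200K documents/second you would not run one big upload like this,
but many such requests in parallel against multiple shards/nodes, and
commit only at the end.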
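P.S. The back-of-the-envelope numbers above can be checked with plain
arithmetic (a rough sketch; the input figures are the ones quoted in
the thread):

```python
# Capacity math for the setup described in the thread.
clients = 6_000
docs_per_client = 500_000
avg_doc_bytes = 400

total_docs = clients * docs_per_client          # 3,000,000,000 documents
raw_bytes = total_docs * avg_doc_bytes          # ~1.2 TB of raw document data

# Lucene caps a single index (= one Solr shard) at ~2^31 - 1 documents,
# so the collection needs at least this many shards:
LUCENE_MAX_DOCS = 2**31 - 1
min_shards = -(-total_docs // LUCENE_MAX_DOCS)  # ceiling division -> 2

# Nightly full reindex in a 4-hour window:
window_seconds = 4 * 60 * 60
docs_per_second = total_docs / window_seconds   # ~208K docs/second

print(f"{total_docs:,} docs, {raw_bytes / 1e12:.1f} TB raw")
print(f"at least {min_shards} shards")
print(f"{docs_per_second:,.0f} docs/second to reindex in 4 hours")
```

Note that 2 shards is the hard floor from the document limit; query
load and response-time targets will usually push the shard count
higher than that.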