Troy Edwards <tedwards415...@gmail.com> wrote:
> 1) There are about 6000 clients
> 2) The number of documents from each client are about 500000 (average
> document size is about 400 bytes)

So roughly 3 billion documents (6000 clients × 500K documents each) and, at ~400 
bytes per document, on the order of 1TB of index. That means at least 2 shards, 
due to the ~2 billion document limit per Lucene index. If you want more advice 
than that, you will have to describe how the setup is to be used:

- How many requests per second?
- What is a typical query?
- How low does the response time need to be?

> 3) I have to wipe the index/collection every night and create a new one

Let's say you have 4 hours to do that. 3 billion documents / (4 × 3600 seconds) 
is about 200K documents/second that you need to index. That is a high number, and 
with such tiny documents I suspect that the logistics of delivering them to Solr 
might take up the largest part of the time. This might call for multiple 
independent setups.
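
To give a sense of what the client side of that looks like, here is a minimal 
SolrJ sketch using ConcurrentUpdateSolrClient, which buffers documents and sends 
them in batches from background threads. It assumes a SolrJ version that has the 
Builder API; the URL, collection name ("clients"), queue size, thread count and 
field names are placeholder assumptions, not recommendations:

  import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class BulkIndexer {
      public static void main(String[] args) throws Exception {
          // Placeholder endpoint; queue size and thread count must be
          // tuned against the actual hardware.
          ConcurrentUpdateSolrClient client = new ConcurrentUpdateSolrClient.Builder(
                  "http://localhost:8983/solr/clients")
                  .withQueueSize(100000)
                  .withThreadCount(8)
                  .build();

          for (int i = 0; i < 1_000_000; i++) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", "client42-" + i);
              doc.addField("payload_s", "tiny 400-byte document body");
              client.add(doc);  // buffered; sent in background batches
          }

          client.blockUntilFinished();  // wait for the queue to drain
          client.commit();              // one commit at the end, not per batch
          client.close();
      }
  }

Whether a single such client can sustain 200K documents/second is exactly the 
kind of thing that has to be measured; running several of these in parallel 
against independent setups is the fallback.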

> 1) How to index such a large number of documents, i.e. do I use an http client
> to send documents, or is the data import handler right, or should I try
> uploading CSV files?

As the overhead of constructing and parsing XML documents is not trivial, CSV 
seems reasonable. The DataImportHandler (DIH) would probably also work.
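
For the CSV route, the upload itself can be a simple stream to the /update 
handler. A sketch with SolrJ, where the collection name and file path are made 
up:

  import java.io.File;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

  public class CsvUpload {
      public static void main(String[] args) throws Exception {
          HttpSolrClient client = new HttpSolrClient.Builder(
                  "http://localhost:8983/solr/clients").build();

          // Stream a CSV file to the standard update handler.
          ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update");
          req.addFile(new File("/data/client42.csv"), "text/csv");
          req.setParam("commit", "true");  // for bulk loads, commit once at the end instead

          client.request(req);
          client.close();
      }
  }

The same upload can of course be done with plain curl against /update with 
Content-type: text/csv; the point is that no per-document XML construction is 
involved.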

> 2) How many collections should I use?

Not 6000 in a single SolrCloud; that is far beyond the number of collections 
SolrCloud handles comfortably.

> 3) How many shards / replicas per collection should I use?
> 4) Do I need multiple Solr servers?

Not enough data about index usage to say. Between 1 and 50 servers, not kidding.


- Toke Eskildsen
