In one of our production environments we use 32GB, 4-core, 3TB RAID0
spinning-disk Dell servers (I do not remember the exact model). We have
about 25 collections with 2 replicas (shard-instances) per collection on
each machine, on 25 machines. That is a total of 25 collections *
2 replicas/collection/machine * 25 machines = 1250 replicas. Each replica
contains about 800 million pretty small documents, so that is about 1000
billion (one trillion) documents all in all. We index about 1.5 billion
new documents every day (mainly into one of the collections = 50 replicas
across 25 machines) and keep a history of 2 years on the data, shifting
the "index into" collection every month. We can fairly easily keep up
with the indexing load. We have almost none of the data on the heap, but
of course a small fraction of the data in the files will at any time be
in the OS file-cache.
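
As a rough sanity check of the figures above, the arithmetic can be
written out as a small Python sketch. The variable names are only
illustrative. The per-second rate assumes the 1.5 billion daily documents
arrive evenly over 24 hours, and the retention estimate assumes 365-day
years; neither assumption is stated in the numbers above.

```python
# Back-of-the-envelope check of the cluster figures quoted above.
collections = 25
replicas_per_collection_per_machine = 2
machines = 25
docs_per_replica = 800_000_000      # "about 800 million"
daily_new_docs = 1_500_000_000      # "about 1.5 billion" per day

replicas_per_machine = collections * replicas_per_collection_per_machine
total_replicas = replicas_per_machine * machines
total_docs = total_replicas * docs_per_replica

# Assumption: indexing load is spread evenly over the day.
docs_per_second = daily_new_docs / (24 * 60 * 60)

# Assumption: 2 years of history = 730 days of daily indexing.
docs_after_two_years = daily_new_docs * 730

print(replicas_per_machine)      # 50
print(total_replicas)            # 1250
print(total_docs)                # 1000000000000
print(round(docs_per_second))    # 17361
print(docs_after_two_years)      # 1095000000000
```

The two-year estimate (about 1.1 trillion) lands close to the one trillion
documents computed from the replica count, so the numbers are consistent
with each other.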
Compared to our indexing frequency we do not do a lot of searches. We
have about 10 users searching the system from time to time - anything
from major extracts to small quick searches. Depending on the nature of
the search we have response times between 1 sec and 5 min. But of course
that depends very much on a "clever" choice for each field wrt index,
store, doc-value etc.
BUT we are not using out-of-the-box Apache Solr. We have made quite a lot
of performance tweaks ourselves.
Please note that, even though you disable all Solr caches, each replica
will use heap-memory linearly dependent on the number of documents (and
their size) in that replica. But not much, so you can get pretty far
with relatively little RAM.
Our version of Solr is based on Apache Solr 4.4.0, but I expect/hope it
did not get worse in newer releases.
Just to give you some idea of what can at least be achieved - at the
high end of #replicas and #docs, I guess.
Regards, Per Steffensen
On 24/03/15 14:02, Ian Rose wrote:
Hi all -
I'm sure this topic has been covered before but I was unable to find any
clear references online or in the mailing list.
Are there any rules of thumb for how many cores (aka shards, since I am
using SolrCloud) is "too many" for one machine? I realize there is no one
answer (depends on size of the machine, etc.) so I'm just looking for a
rough idea. Something like the following would be very useful:
* People commonly run up to X cores/shards on a mid-sized (4 or 8 core)
server without any problems.
* I have never heard of anyone successfully running X cores/shards on a
single machine, even if you throw a lot of hardware at it.
Thanks!
- Ian