Thank you all for the responses; very helpful! In this configuration,
by far, most of the time is spent indexing new data rather than
searching. When I look at a Solr instance, it is almost always using
100% CPU when no searches are being executed. Given that the machines
have many cores, goi
Joe Obernberger wrote:
[3 billion docs / 16TB / 27 shards on HDFS times 3 for replication]
> Each shard is then hosting about 610GBytes of index. The HDFS cache
> size is very low at about 8GBytes. Suffice it to say, performance isn't
> very good, but again, this is for experimentation.
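For scale, here is a rough back-of-the-envelope check of the numbers quoted above. It is only a sketch: it assumes binary units (1 TB = 1024 GB) and that the 8 GB cache serves a single shard, neither of which the post states.

```python
# Figures taken from the quoted post; unit conversion is an assumption.
TOTAL_INDEX_GB = 16 * 1024   # 16 TB of index, before the 3x HDFS replication
NUM_SHARDS = 27
QUOTED_SHARD_GB = 610        # "about 610GBytes" per shard, as quoted
HDFS_CACHE_GB = 8            # cache size as quoted (assumed to be per shard)


def per_shard_gb(total_gb, shards):
    """Average index size per shard."""
    return total_gb / shards


def cache_fraction(cache_gb, shard_gb):
    """Fraction of a shard's index that fits in the cache."""
    return cache_gb / shard_gb


shard_gb = per_shard_gb(TOTAL_INDEX_GB, NUM_SHARDS)
fraction = cache_fraction(HDFS_CACHE_GB, QUOTED_SHARD_GB)

print(f"average index per shard: {shard_gb:.0f} GB")
print(f"cache covers {fraction:.1%} of a {QUOTED_SHARD_GB} GB shard")
```

Under those assumptions the cache holds only a percent or so of each shard, which is consistent with the poor performance described.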
We ar
In any case, this is really "the sizing question" and generic answers
are not reliable. Here's a long blog post about why, but the net-net is
"prototype and measure". Fortunately you can prototype with just a few
nodes (I usually want at least 2 shards) and extrapolate reasonably
well.
https://lucidwor
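To make "prototype and measure" concrete, here is a minimal sketch of the extrapolation step. The prototype figures (10 million docs producing 55 GB of index) and the 100 GB per-shard target are made up for illustration, and linear scaling is itself an assumption that should be checked against measurements at two or more sample sizes. Only the 3 billion document target comes from this thread.

```python
import math


def extrapolate_index_gb(sample_docs, sample_index_gb, target_docs):
    """Scale a measured prototype index size linearly to the target doc count."""
    return sample_index_gb * (target_docs / sample_docs)


def shards_needed(total_index_gb, max_shard_gb):
    """Smallest shard count that keeps every shard at or under max_shard_gb."""
    return math.ceil(total_index_gb / max_shard_gb)


# Hypothetical prototype measurement: 10M docs indexed into 55 GB.
SAMPLE_DOCS = 10_000_000
SAMPLE_INDEX_GB = 55.0
TARGET_DOCS = 3_000_000_000   # "3 billion docs", from the thread
MAX_SHARD_GB = 100            # hypothetical per-shard size target

total_gb = extrapolate_index_gb(SAMPLE_DOCS, SAMPLE_INDEX_GB, TARGET_DOCS)
shards = shards_needed(total_gb, MAX_SHARD_GB)

print(f"projected index size: {total_gb:.0f} GB")
print(f"shards at <= {MAX_SHARD_GB} GB each: {shards}")
```

The same two functions can be rerun with query latency or indexing throughput measured on the prototype in place of index size, as long as the metric is known to scale roughly linearly.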
As per Scott@FullStory, you should see benefits with many smaller shards rather
than a few bigger ones. Also, upgrading to Solr 6.2 would be better, as there are
many improvements in the handling of multiple shards. See the presentation below:
http://www.slideshare.net/lucidworks/large-scale-solr-at-fullstory-presented-by-sc