Re: Large index recommendation

2017-01-16 Thread Joe Obernberger
Thank you all for the responses; very helpful! In this configuration, by far most of the time is spent indexing new data rather than searching. When I look at a Solr instance, it is almost always using 100% CPU even when no searches are being executed. Given that the machines have many cores, goi

Re: Large index recommendation

2017-01-13 Thread Toke Eskildsen
Joe Obernberger wrote:
[3 billion docs / 16TB / 27 shards on HDFS times 3 for replication]
> Each shard is then hosting about 610GBytes of index. The HDFS cache
> size is very low at about 8GBytes. Suffice it to say, performance isn't
> very good, but again, this is for experimentation. We ar
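The numbers quoted above can be checked with a quick back-of-the-envelope calculation. This is a minimal sketch, assuming binary units (1 TB = 1024 GB, which reproduces the ~610 GB per-shard figure) and assuming the ~8 GB HDFS cache applies to each shard's Solr instance; the thread does not say which. The function name `shard_sizing` is hypothetical.

```python
# Back-of-the-envelope sizing for the quoted setup:
# 3 billion docs, 16 TB of index, 27 shards, HDFS replication factor 3,
# ~8 GB HDFS cache. ASSUMPTIONS: binary units (1 TB = 1024 GB) and the
# cache figure being per shard/instance rather than per cluster.

def shard_sizing(total_docs, index_tb, shards, replication, cache_gb):
    """Return per-shard size, raw HDFS footprint, and cache coverage."""
    index_gb = index_tb * 1024              # total logical index size in GB
    per_shard_gb = index_gb / shards        # index each shard has to serve
    return {
        "docs_per_shard": total_docs / shards,
        "index_gb_per_shard": per_shard_gb,
        "raw_hdfs_tb": index_tb * replication,   # bytes actually stored on HDFS
        "cache_fraction": cache_gb / per_shard_gb,  # share of a shard that fits in cache
    }

s = shard_sizing(3_000_000_000, 16, 27, 3, 8)
for key, value in s.items():
    print(f"{key}: {value:,.3f}")
```

Under these assumptions each shard holds roughly 111 million docs and about 607 GB of index, HDFS stores about 48 TB in total, and the cache covers only around 1.3% of a shard, which is consistent with the poor performance described.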

Re: Large index recommendation

2017-01-13 Thread Erick Erickson
In any case, this is really "the sizing question", and generic answers are not reliable. Here's a long blog post about why, but the net-net is "prototype and measure". Fortunately you can prototype with just a few nodes (I usually want at least 2 shards) and extrapolate reasonably well. https://lucidwor
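The "prototype and measure, then extrapolate" approach can be sketched as a simple linear estimate. This is a hypothetical helper, not a Solr tool: it assumes you have measured, on a small prototype, the largest per-shard document count that still met your indexing and query targets, and it assumes a 25% growth headroom that you should tune to your own situation.

```python
import math

def shards_needed(target_docs, max_docs_per_shard, headroom=0.25):
    """Estimate shard count by linear extrapolation from a prototype.

    max_docs_per_shard: largest per-shard doc count at which the prototype
        still met your indexing/query targets (measured, not guessed).
    headroom: fraction of each shard's capacity left free for growth
        (the 25% default is an assumption, not a recommendation).
    """
    if max_docs_per_shard <= 0:
        raise ValueError("max_docs_per_shard must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = max_docs_per_shard * (1 - headroom)  # capacity we plan to use
    return math.ceil(target_docs / usable)        # round up: partial shards don't exist

# Hypothetical prototype result: 150 million docs per shard was the limit.
print(shards_needed(3_000_000_000, 150_000_000))
```

With those made-up inputs the estimate comes out to 27 shards. Linear extrapolation is only a starting point; as the thread notes, real behaviour should be confirmed by measuring.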

Re: Large index recommendation

2017-01-13 Thread Susheel Kumar
As per Scott@FullStory, you should see benefits with many smaller shards rather than a few bigger ones. Also, upgrading to Solr 6.2 would be better, as it includes many improvements in handling multiple shards. See the presentation below: http://www.slideshare.net/lucidworks/large-scale-solr-at-fullstory-presented-by-sc