Bram:

That works. I try to monitor the number of 0-hit queries when I generate a test set on the theory that those are _usually_ groups of random terms I've selected that aren't a good model. So it's often a sequence like "generate my list, see which ones give 0 results and remove them". Rinse, repeat.
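For what it's worth, the generate/prune loop can be sketched roughly as below. This is a minimal sketch, not the script I actually use: `build_test_set`, `solr_hit_count`, the localhost URL and the `mycollection` name are all made up for illustration. The only Solr behavior it relies on is that a `/select` request with `rows=0` and `wt=json` reports the hit count in `response.numFound`.

```python
import json
import random
import urllib.parse
import urllib.request


def solr_hit_count(query, base_url="http://localhost:8983/solr/mycollection"):
    """Return numFound for `query`. base_url is a placeholder (assumption)."""
    # rows=0 asks Solr for the hit count only, without returning documents.
    params = urllib.parse.urlencode({"q": query, "rows": 0, "wt": "json"})
    with urllib.request.urlopen(f"{base_url}/select?{params}") as resp:
        return json.load(resp)["response"]["numFound"]


def build_test_set(terms, size, count_hits, terms_per_query=2,
                   max_rounds=10, seed=0):
    """Generate random multi-term queries, drop the 0-hit ones, and repeat.

    Returns (kept_queries, zero_hit_count). Stops once `size` queries are
    kept or after `max_rounds` rounds, whichever comes first.
    """
    rng = random.Random(seed)
    kept = []
    zero_hits = 0
    for _ in range(max_rounds):
        if len(kept) >= size:
            break
        # Generate only as many candidates as are still missing.
        candidates = [" ".join(rng.sample(terms, terms_per_query))
                      for _ in range(size - len(kept))]
        for query in candidates:
            if count_hits(query) > 0:
                kept.append(query)
            else:
                zero_hits += 1  # the number worth monitoring
    return kept, zero_hits
```

Passing the hit counter in as a function (`count_hits=solr_hit_count` against a real collection) keeps the loop easy to try out against a stub first. The sketch does not de-duplicate queries, and a consistently high `zero_hits` value suggests the term list itself is a poor model.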
Like you said, imperfect but _loads_ better than trying to create them without real user queries as guidance...

Best,
Erick

On Sat, Apr 30, 2016 at 4:19 AM, Bram Van Dam <bram.van...@intix.eu> wrote:
>> If I'm reading this right, you have 420M docs on a single shard?
>> Yep, you were reading it right.
>
> As Erick mentioned, it's hard to give concrete sizing advice, but we've
> found 120M to be the magic number. When a shard contains more than 120M
> documents, performance goes down rapidly & GC pauses grow a lot longer.
> Up until 250M things remain acceptable. But then performance starts to
> drop very quickly after that.
>
> - Bram
>