Bram:

That works. I try to monitor the number of 0-hit
queries when I generate a test set on the theory that
those are _usually_ groups of random terms I've
selected that aren't a good model. So it's often
a sequence like "generate my list, see which
ones give 0 results and remove them". Rinse,
repeat.
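
For what it's worth, here is a rough sketch of that loop in Python. It is not my actual script: `build_query_set`, `count_hits` and all the parameters are made-up names for illustration. `count_hits` is assumed to be whatever you use to get a hit count for a query (against Solr that could be a request with rows=0 that reads numFound from the response).

```python
import random
from typing import Callable, List, Sequence, Tuple


def build_query_set(
    terms: Sequence[str],
    count_hits: Callable[[str], int],  # hypothetical: returns the hit count for a query
    target: int,
    terms_per_query: int = 2,
    max_rounds: int = 20,
    seed: int = 0,
) -> Tuple[List[str], int]:
    """Generate random multi-term queries, dropping the ones with 0 hits.

    Returns the kept queries plus the number of 0-hit queries discarded,
    so the 0-hit count can be monitored across runs.
    """
    rng = random.Random(seed)
    pool = list(terms)
    kept: List[str] = []
    zero_hit_total = 0

    for _ in range(max_rounds):
        if len(kept) >= target:
            break
        # "Generate my list": only as many candidates as are still missing.
        needed = target - len(kept)
        candidates = [
            " ".join(rng.sample(pool, terms_per_query)) for _ in range(needed)
        ]
        # "See which ones give 0 results and remove them."
        for query in candidates:
            if count_hits(query) > 0:
                kept.append(query)
            else:
                zero_hit_total += 1
        # Rinse, repeat until the target is met or max_rounds runs out.

    return kept, zero_hit_total
```

If the 0-hit total stays high relative to what you keep, that's a hint the term selection isn't modelling real queries very well. The sketch does not de-duplicate queries, so the same one can show up twice.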

Like you said, imperfect but _loads_ better than
trying to create them without real user queries
as guidance...

Best,
Erick

On Sat, Apr 30, 2016 at 4:19 AM, Bram Van Dam <bram.van...@intix.eu> wrote:
>> If I'm reading this right, you have 420M docs on a single shard?
>> Yep, you were reading it right.
>
> As Erick mentioned, it's hard to give concrete sizing advice, but we've
> found 120M to be the magic number. When a shard contains more than 120M
> documents, performance goes down rapidly & GC pauses grow a lot longer.
> Up until 250M things remain acceptable. But then performance starts to
> drop very quickly after that.
>
>  - Bram
>
