Yes, there are nuances to any general rule. It's just a starting point, and
your own testing will confirm specific details for your specific app and
data. For example, maybe you don't query all fields equally often, so some
field-specific index structures may not need to be in memory at all, or
only rarely.
And, yes, each app has its own latency requirements. The purpose of a
general rule is to generally avoid unhappiness, but if you have an appetite
and tolerance for unhappiness, then go for it.

Replica vs. shard? They're closely related - a shard is a logical slice of
the index, and a replica is a physical copy of a shard.
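
For sizing, that means each replica costs about as much in memory and disk
as the shard it copies. As a back-of-the-envelope sketch (every number
below is an illustrative assumption, not a measurement from any real
cluster), the per-node math looks something like this:

    // Rough Solr capacity math -- replace every input with numbers
    // from your own proof-of-concept test.
    public class CapacitySketch {
        public static void main(String[] args) {
            long totalDocs         = 500_000_000L;  // assumed corpus size
            long docsPerShard      = 100_000_000L;  // the rule-of-thumb limit
            int  replicationFactor = 2;             // assumed
            long bytesPerDoc       = 1_000L;        // assumed on-disk size; measure it

            long shards     = (totalDocs + docsPerShard - 1) / docsPerShard;
            long totalCores = shards * replicationFactor;
            long gbPerCore  = (docsPerShard * bytesPerDoc) / (1L << 30);

            // Each node must leave enough free RAM (OS page cache) to hold
            // the index data of every core it hosts.
            System.out.printf("shards=%d, total cores=%d, ~%d GB index per core%n",
                    shards, totalCores, gbPerCore);
        }
    }

With those made-up inputs you'd be placing 10 cores of roughly 93GB each,
which is exactly the kind of result a proof of concept should confirm or
refute.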

-- Jack Krupansky

On Wed, Dec 9, 2015 at 10:36 AM, Susheel Kumar <susheel2...@gmail.com>
wrote:

> Hi Jack,
>
> Just to add, the OS disk cache will still make queries performant even
> though the entire index can't be loaded into memory. How much more
> latency there is, compared to the index being completely loaded into
> memory, may vary depending on index size etc. I am trying to clarify
> this here because a lot of folks take this as a hard guideline (fit the
> index into memory) and try to come up with hardware (hundreds of
> machines) just for the sake of fitting the index into memory, even
> though there may not be much load/qps on the cluster. E.g., this may
> vary and needs to be tested on a case-by-case basis, but a machine with
> 64GB should still provide good performance (not the best) for a 100GB
> index on that machine. Do you agree / any thoughts?
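>
> As rough, illustrative arithmetic (the overhead figure is an assumption,
> not a measurement): with 64GB of RAM, if the JVM heap and the OS take,
> say, ~12GB, about 52GB remains for the page cache, so roughly half of a
> 100GB index can stay cached. Queries touching the cached half stay fast,
> while queries touching the cold half pay for disk reads, so average
> latency degrades gradually rather than falling off a cliff.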
>
> I believe the same is the case with replicas: a single machine may host
> replicas which, together with the shard index, may not fit into memory
> either.
>
> Thanks,
> Susheel
>
> On Tue, Dec 8, 2015 at 11:31 AM, Jack Krupansky <jack.krupan...@gmail.com>
> wrote:
>
> > Generally, you will be resource limited (memory, cpu) rather than by some
> > arbitrary numeric limit (like 2 billion.)
> >
> > My personal recommendation is a practical limit of 100 million
> > documents on a machine/node. Depending on your data model and actual
> > data, that number could be higher or lower. A proof-of-concept test
> > will let you determine the actual number for your particular use case,
> > but a presumed limit of 100 million is not a bad start.
> >
> > You should have enough memory to hold the entire index in system
> > memory. If not, your query latency will suffer due to the I/O required
> > to constantly re-read portions of the index into memory.
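> >
> > A quick way to check is to compare the on-disk size of the index
> > directory against the RAM you can leave to the OS page cache. A
> > minimal sketch (the /var/solr/data default is an assumption -- point
> > it at your own data directory):
> >
> >     import java.io.IOException;
> >     import java.nio.file.*;
> >
> >     public class IndexSize {
> >         public static void main(String[] args) throws IOException {
> >             // Assumed default location; pass your real dataDir as args[0].
> >             Path dataDir = Paths.get(args.length > 0 ? args[0]
> >                                                      : "/var/solr/data");
> >             // Sum the size of every regular file under the data directory.
> >             long bytes = Files.walk(dataDir)
> >                               .filter(Files::isRegularFile)
> >                               .mapToLong(p -> p.toFile().length())
> >                               .sum();
> >             System.out.printf("index data on disk: %.1f GB%n",
> >                     bytes / (double) (1L << 30));
> >         }
> >     }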
> >
> > The practical limit for documents is not per core, and not a matter of
> > the number of cores, but across all cores on the node, since it is
> > mostly a limit on memory and on the CPU resources available for
> > accessing that memory.
> >
> > -- Jack Krupansky
> >
> > On Tue, Dec 8, 2015 at 8:57 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
> > wrote:
> >
> > > On Tue, 2015-12-08 at 05:18 -0700, Mugeesh Husain wrote:
> > > > Capacity regarding 2 simple question:
> > > >
> > > > 1.) How many documents can we store in a single core (capacity of
> > > > core storage)?
> > >
> > > There is a hard limit of about 2 billion documents per core, because
> > > Lucene addresses documents internally with a signed 32-bit int.
> > >
> > > > 2.) How many cores can we create on a single server (single-node
> > > > cluster)?
> > >
> > > There is no hard limit. Except for 2 billion cores, I guess. But at
> > > this point in time that is a ridiculously high number of cores.
> > >
> > > It is hard to give a suggestion for real-world limits as indexes vary a
> > > lot and the rules of thumb tend to be quite poor when scaling up.
> > >
> > > http://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> > >
> > > People generally seem to run into problems with more than 1000
> > > not-too-large cores. If the cores are large, there will probably be
> > > performance problems long before that.
> > >
> > > You will have to build a prototype and test.
> > >
> > > - Toke Eskildsen, State and University Library, Denmark
> > >
> > >
> > >
> >
>
