Thanks, Jack, for the quick reply. By replica/shard I mean that on a given
machine there may be two or more replicas, and all of them may not fit
into memory.
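
As a rough illustration of the point above, here is a minimal sketch of the
arithmetic. Everything in it is an assumption made for illustration: the
function name, the 8 GB JVM heap, the 2 GB reserved for the OS, and the split
of a 100G index into a 50 GB shard plus a 50 GB replica on a 64GB machine. It
only estimates how much of the on-disk index the OS disk cache could hold; it
says nothing about actual query latency, which still has to be tested.

```python
def page_cache_coverage(index_sizes_gb, ram_gb, jvm_heap_gb, os_overhead_gb=2.0):
    """Estimate the fraction of on-disk index data the OS disk cache could hold.

    index_sizes_gb: sizes of every shard/replica index hosted on the node.
    All parameters are hypothetical inputs, not values read from Solr.
    """
    # Memory left for the OS disk cache after the JVM heap and OS overhead.
    available_gb = max(ram_gb - jvm_heap_gb - os_overhead_gb, 0.0)
    total_index_gb = sum(index_sizes_gb)
    if total_index_gb == 0:
        return 1.0
    # Cap at 1.0: more free memory than index means everything can be cached.
    return min(available_gb / total_index_gb, 1.0)


# Hypothetical node: 64 GB RAM, 8 GB heap, one 50 GB shard plus one 50 GB replica.
coverage = page_cache_coverage([50, 50], ram_gb=64, jvm_heap_gb=8)
print(f"{coverage:.0%} of the index could be cached")
```

A result well below 100% does not by itself mean poor performance. It just
shows how much of the index depends on disk I/O when it is not already cached.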

On Wed, Dec 9, 2015 at 11:00 AM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> Yes, there are nuances to any general rule. It's just a starting point, and
> your own testing will confirm specific details for your specific app and
> data. For example, maybe you don't query all fields commonly, so each
> field-specific index may not require memory or not require it so commonly.
> And, yes, each app has its own latency requirements. The purpose of a
> general rule is to generally avoid unhappiness, but if you have an appetite
> and tolerance for unhappiness, then go for it.
>
> Replica vs. shard? They're basically the same - a replica is a copy of a
> shard.
>
> -- Jack Krupansky
>
> On Wed, Dec 9, 2015 at 10:36 AM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
>
> > Hi Jack,
> >
> > Just to add, the OS disk cache will still keep queries performant even
> > when the entire index can't be loaded into memory. How much more latency
> > there is compared to a fully in-memory index may vary depending on index
> > size etc. I am trying to clarify this here because a lot of folks take
> > this as a hard guideline (to fit the index into memory) and try to come up
> > with hardware/machines (100's of machines) just for the sake of fitting
> > the index into memory even though there may not be much load/qps on the
> > cluster. For example, this may vary and needs to be tested on a
> > case-by-case basis, but a machine with 64GB should still provide good
> > performance (not the best) for a 100G index on that machine. Do you
> > agree / any thoughts?
> >
> > The same, I believe, is the case with replicas, as on a single machine
> > you may have replicas which themselves may not fit into memory along with
> > the shard index.
> >
> > Thanks,
> > Susheel
> >
> > On Tue, Dec 8, 2015 at 11:31 AM, Jack Krupansky <jack.krupan...@gmail.com>
> > wrote:
> >
> > > Generally, you will be resource limited (memory, cpu) rather than by
> > > some arbitrary numeric limit (like 2 billion.)
> > >
> > > My personal general recommendation for a practical limit is 100 million
> > > documents on a machine/node. Depending on your data model and actual
> > > data that number could be higher or lower. A proof of concept test will
> > > allow you to determine the actual number for your particular use case,
> > > but a presumed limit of 100 million is not a bad start.
> > >
> > > You should have enough memory to hold the entire index in system
> > > memory. If not, your query latency will suffer due to I/O required to
> > > constantly re-read portions of the index into memory.
> > >
> > > The practical limit for documents is not per core or number of cores
> > > but across all cores on the node since it is mostly a memory limit and
> > > the available CPU resources for accessing that memory.
> > >
> > > -- Jack Krupansky
> > >
> > > On Tue, Dec 8, 2015 at 8:57 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
> > > wrote:
> > >
> > > > On Tue, 2015-12-08 at 05:18 -0700, Mugeesh Husain wrote:
> > > > > Two simple questions regarding capacity:
> > > > >
> > > > > 1.) How many documents can we store in a single core (capacity of
> > > > > core storage)?
> > > >
> > > > There is a hard limit of 2 billion documents.
> > > >
> > > > > 2.) How many cores can we create on a single server (single-node
> > > > > cluster)?
> > > >
> > > > There is no hard limit. Except for 2 billion cores, I guess. But at
> > > > this point in time that is a ridiculously high number of cores.
> > > >
> > > > It is hard to give a suggestion for real-world limits as indexes
> > > > vary a lot and the rules of thumb tend to be quite poor when scaling
> > > > up.
> > > >
> > > > http://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> > > >
> > > > People generally seem to run into problems with more than 1000
> > > > not-too-large cores. If the cores are large, there will probably be
> > > > performance problems long before that.
> > > >
> > > > You will have to build a prototype and test.
> > > >
> > > > - Toke Eskildsen, State and University Library, Denmark
> > > >
> > > >
> > > >
> > >
> >
>
