Thanks, Jack, for the quick reply. By replica/shard I mean that a given machine may host two or more replicas, and together they may not fit into memory.
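The point above is that all replicas on one machine share the same OS disk cache. As a rough way to see how much of their combined index could be cached at once, here is a minimal Python sketch. It rests on three simplifying assumptions, none of which are Solr settings: the page cache gets whatever RAM the JVM heap does not use, a fixed 2 GB is reserved for the OS, and the function name `cache_coverage` is hypothetical. It says nothing about the resulting latency, which still has to be tested.

```python
# Hypothetical sizing sketch: what fraction of the replicas hosted on one
# node could sit in the OS page cache at the same time?
# Assumption: RAM not taken by the JVM heap (minus a reserve for the OS and
# other processes) is what the page cache can use for index files.

def cache_coverage(replica_sizes_gb, ram_gb, heap_gb, os_reserve_gb=2.0):
    """Return (total_index_gb, cache_gb, fraction_of_index_cacheable)."""
    # Every replica on the node competes for the same cache, so sum them.
    total_index_gb = sum(replica_sizes_gb)
    cache_gb = max(ram_gb - heap_gb - os_reserve_gb, 0.0)
    if total_index_gb == 0:
        return 0.0, cache_gb, 1.0
    fraction = min(cache_gb / total_index_gb, 1.0)
    return total_index_gb, cache_gb, fraction


if __name__ == "__main__":
    # Illustrative numbers only: two 50 GB replicas on a 64 GB machine
    # with an 8 GB heap.
    total, cache, frac = cache_coverage([50.0, 50.0], ram_gb=64.0, heap_gb=8.0)
    print(f"index={total:.0f} GB, cache={cache:.0f} GB, cacheable={frac:.0%}")
```

With these made-up numbers it prints `index=100 GB, cache=54 GB, cacheable=54%`. That is roughly the 64GB-machine/100GB-index case discussed in the thread, once the heap is accounted for.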
On Wed, Dec 9, 2015 at 11:00 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> Yes, there are nuances to any general rule. It's just a starting point, and your own testing will confirm the specific details for your specific app and data. For example, maybe you don't query all fields commonly, so each field-specific index may not require memory, or may not require it so often. And, yes, each app has its own latency requirements. The purpose of a general rule is to generally avoid unhappiness, but if you have an appetite and tolerance for unhappiness, then go for it.
>
> Replica vs. shard? They're basically the same: a replica is a copy of a shard.
>
> -- Jack Krupansky
>
> On Wed, Dec 9, 2015 at 10:36 AM, Susheel Kumar <susheel2...@gmail.com> wrote:
>
> > Hi Jack,
> >
> > Just to add: the OS disk cache will still keep queries performant even when the entire index can't be loaded into memory. How much extra latency this adds, compared to an index that is fully loaded into memory, may vary depending on index size, etc. I am trying to clarify this because a lot of folks take it as a hard guideline (fit the index into memory) and try to come up with hardware (hundreds of machines) just for the sake of fitting the index into memory, even though there may not be much load/QPS on the cluster. For example, this may vary and needs to be tested case by case, but a machine with 64GB should still provide good (not the best) performance for a 100GB index on that machine. Do you agree? Any thoughts?
> >
> > The same, I believe, is the case with replicas: on a single machine you have replicas which themselves may not fit into memory, along with the shard index.
> >
> > Thanks,
> > Susheel
> >
> > On Tue, Dec 8, 2015 at 11:31 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> >
> > > Generally, you will be limited by resources (memory, CPU) rather than by some arbitrary numeric limit (like 2 billion).
> > >
> > > My personal general recommendation for a practical limit is 100 million documents on a machine/node. Depending on your data model and actual data, that number could be higher or lower. A proof-of-concept test will allow you to determine the actual number for your particular use case, but a presumed limit of 100 million is not a bad start.
> > >
> > > You should have enough memory to hold the entire index in system memory. If not, your query latency will suffer due to the I/O required to constantly re-read portions of the index into memory.
> > >
> > > The practical limit for documents is not per core or per number of cores but across all cores on the node, since it is mostly a matter of memory and of the CPU resources available for accessing that memory.
> > >
> > > -- Jack Krupansky
> > >
> > > On Tue, Dec 8, 2015 at 8:57 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> > >
> > > > On Tue, 2015-12-08 at 05:18 -0700, Mugeesh Husain wrote:
> > > > > Capacity, regarding 2 simple questions:
> > > > >
> > > > > 1.) How many documents can we store in a single core (capacity of core storage)?
> > > >
> > > > There is a hard limit of 2 billion documents.
> > > >
> > > > > 2.) How many cores can we create in a single server (single-node cluster)?
> > > >
> > > > There is no hard limit. Except for 2 billion cores, I guess. But at this point in time that is a ridiculously high number of cores.
> > > >
> > > > It is hard to give a suggestion for real-world limits, as indexes vary a lot and the rules of thumb tend to be quite poor when scaling up.
> > > >
> > > > http://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> > > >
> > > > People generally seem to run into problems with more than 1000 not-too-large cores. If the cores are large, there will probably be performance problems long before that.
> > > >
> > > > You will have to build a prototype and test.
> > > >
> > > > - Toke Eskildsen, State and University Library, Denmark
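The two limits in this thread can be combined into a quick planning check: Toke's per-core hard limit and Jack's per-node starting point, which he counts across all cores on the node. The minimal Python sketch that follows is hypothetical and not a Solr API. It approximates the "2 billion" limit by the largest Java `int`, since Lucene document IDs are ints, and the ceiling Lucene actually enforces may be slightly lower. The 100 million figure is only Jack's suggested starting point, to be confirmed with a prototype.

```python
# Hypothetical planning check for one node; not part of any Solr API.
# Assumption: JAVA_INT_MAX approximates the per-core "2 billion" hard limit
# (Lucene doc IDs are Java ints; the enforced ceiling may be slightly lower).
# NODE_DOC_GUIDELINE is a rule-of-thumb starting point, not a real limit.

JAVA_INT_MAX = 2**31 - 1
NODE_DOC_GUIDELINE = 100_000_000


def check_node(core_doc_counts):
    """Return warnings for one node, given a {core_name: doc_count} dict."""
    warnings = []
    for name, docs in core_doc_counts.items():
        if docs > JAVA_INT_MAX:
            warnings.append(f"{name}: {docs:,} docs exceeds the per-core hard limit")
    # The practical limit applies to all cores on the node together.
    total = sum(core_doc_counts.values())
    if total > NODE_DOC_GUIDELINE:
        warnings.append(
            f"node total {total:,} docs is above the 100 million starting point; test it"
        )
    return warnings


if __name__ == "__main__":
    # Illustrative: three cores of 40 million documents each on one node.
    for w in check_node({"core1": 40_000_000, "core2": 40_000_000, "core3": 40_000_000}):
        print(w)
```

Here no single core is near the hard limit, but the node total of 120,000,000 triggers the per-node warning. This reflects the point that the practical limit is shared by all cores on the node rather than applied per core.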