Hi Jack,

Just to add: the OS disk cache will still keep queries reasonably performant even when the entire index cannot be loaded into memory. How much extra latency you pay compared to a fully memory-resident index will vary with index size and other factors. I am trying to clarify this here because a lot of folks take "fit the index into memory" as a hard guideline and try to come up with hardware (hundreds of machines) just for the sake of fitting the index into memory, even though there may not be much load/QPS on the cluster. For example (this varies and needs to be tested on a case-by-case basis), a machine with 64 GB of RAM should still provide good performance (not the best) for a 100 GB index on that machine. Do you agree / any thoughts?
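To make that concrete, here is a minimal back-of-envelope sketch in Python. Only the 64 GB RAM / 100 GB index figures come from the example above; the JVM heap size and OS overhead are hypothetical assumptions for illustration, not recommendations.

# Rough sizing sketch: estimate how much of the on-disk index the OS page
# cache could keep hot once the JVM heap and other processes take their share.
# Heap and overhead numbers below are hypothetical.

def page_cache_headroom_gb(total_ram_gb, jvm_heap_gb, os_and_other_gb=4):
    """RAM left over for the OS page cache after the heap and OS overhead."""
    return max(total_ram_gb - jvm_heap_gb - os_and_other_gb, 0)

def cached_fraction(index_size_gb, total_ram_gb, jvm_heap_gb, os_and_other_gb=4):
    """Approximate fraction of the index the page cache could hold at once."""
    headroom = page_cache_headroom_gb(total_ram_gb, jvm_heap_gb, os_and_other_gb)
    return min(headroom / index_size_gb, 1.0)

# 64 GB machine, assumed 8 GB JVM heap, 100 GB index (from the example above):
print(cached_fraction(index_size_gb=100, total_ram_gb=64, jvm_heap_gb=8))
# -> ~0.52, i.e. roughly half the index stays cached; the rest is re-read
#    from disk on demand, which adds latency but can be acceptable at low QPS.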
I believe the same applies to replicas: on a single machine the replicas, along with the shard index itself, may not fit into memory either.

Thanks,
Susheel

On Tue, Dec 8, 2015 at 11:31 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:

> Generally, you will be resource limited (memory, CPU) rather than by some
> arbitrary numeric limit (like 2 billion).
>
> My personal recommendation for a practical limit is 100 million documents
> per machine/node. Depending on your data model and actual data, that
> number could be higher or lower. A proof-of-concept test will let you
> determine the actual number for your particular use case, but a presumed
> limit of 100 million is not a bad start.
>
> You should have enough memory to hold the entire index in system memory.
> If not, your query latency will suffer due to the I/O required to
> constantly re-read portions of the index into memory.
>
> The practical limit for documents is not per core or per number of cores
> but across all cores on the node, since it is mostly a memory limit plus
> the available CPU resources for accessing that memory.
>
> -- Jack Krupansky
>
> On Tue, Dec 8, 2015 at 8:57 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>
> > On Tue, 2015-12-08 at 05:18 -0700, Mugeesh Husain wrote:
> > > Two simple questions regarding capacity:
> > >
> > > 1) How many documents can we store in a single core (storage capacity
> > > of a core)?
> >
> > There is a hard limit of 2 billion documents.
> >
> > > 2) How many cores can we create on a single server (single-node
> > > cluster)?
> >
> > There is no hard limit. Except for 2 billion cores, I guess. But at this
> > point in time that is a ridiculously high number of cores.
> >
> > It is hard to give a suggestion for real-world limits, as indexes vary a
> > lot and rules of thumb tend to be quite poor when scaling up.
> >
> > http://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >
> > People generally seem to run into problems with more than 1000
> > not-too-large cores. If the cores are large, there will probably be
> > performance problems long before that.
> >
> > You will have to build a prototype and test.
> >
> > - Toke Eskildsen, State and University Library, Denmark
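To put Jack's rule of thumb and Toke's hard limit side by side, here is a minimal planning sketch. The 100 million documents per node figure is Jack's starting point (to be validated with a proof of concept), the 2,147,483,647 cap is the per-core hard limit mentioned above, and the corpus size is a hypothetical example.

# Minimal sketch, assuming a hypothetical corpus of 1.5 billion documents.
import math

LUCENE_MAX_DOCS_PER_CORE = 2_147_483_647   # hard per-core document limit
DOCS_PER_NODE_RULE_OF_THUMB = 100_000_000  # Jack's starting point, tune via PoC

def nodes_needed(total_docs, docs_per_node=DOCS_PER_NODE_RULE_OF_THUMB):
    """Rough node count before replication, using the rule of thumb."""
    return math.ceil(total_docs / docs_per_node)

def min_shards_needed(total_docs):
    """Absolute minimum shard count imposed by the 2 billion document limit."""
    return math.ceil(total_docs / LUCENE_MAX_DOCS_PER_CORE)

total_docs = 1_500_000_000  # hypothetical corpus size
print(nodes_needed(total_docs))       # -> 15 nodes as a starting point
print(min_shards_needed(total_docs))  # -> 1 shard satisfies the hard limit;
                                      #    memory and CPU, not the limit,
                                      #    decide the real layout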