I've tried (very simplistically) hitting a collection with a good variety of searches, watching the collection's heap memory, and working out the bytes/doc. I've seen results around 100 bytes/doc, and as low as 3 bytes/doc for collections with small docs. It's still a work in progress - I'm not sure whether it will scale with document count, or whether it's simply too simplistic.
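The estimate described above boils down to a simple division; here is a minimal sketch, assuming you can sample the collection's heap usage (e.g. via JMX or a heap dump) before and after warming it with a representative query mix - a rough figure, not an exact accounting:

```python
def bytes_per_doc(heap_after_bytes: int, heap_before_bytes: int, num_docs: int) -> float:
    """Naive per-document heap cost: heap growth divided by doc count.

    Ignores shared JVM overhead and GC timing, so treat the result as a
    ballpark figure only.
    """
    if num_docs <= 0:
        raise ValueError("need a non-empty collection")
    return (heap_after_bytes - heap_before_bytes) / num_docs

# e.g. ~1 GB of heap growth across 10M docs works out to ~107 bytes/doc,
# in the same ballpark as the ~100 bytes/doc observed above
print(bytes_per_doc(1_073_741_824, 0, 10_000_000))
```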
On 25 March 2015 at 17:49, Shai Erera <ser...@gmail.com> wrote:

> While it's hard to answer this question because, as others have said, "it
> depends", I think it would be good if we could quantify or assess the cost
> of running a SolrCore.
>
> For instance, let's say that a server can handle a load of 10M indexed
> documents (I omit search load on purpose for now) in a single SolrCore.
> Would the same server be able to handle the same number of documents if we
> indexed 1,000 docs per SolrCore, in a total of 10,000 SolrCores? If the
> answer is no, then it means there is some cost that comes with each
> SolrCore, and we may at least be able to give an upper bound -- on a
> server with X amount of storage, Y GB RAM and Z cores you can run up to
> maxSolrCores(X, Y, Z).
>
> Another way to look at it: if I were to create empty SolrCores, would I be
> able to create an infinite number of cores if storage was infinite? Or do
> even empty cores have their toll on CPU and RAM?
>
> I know from the Lucene side of things that each SolrCore (which carries a
> Lucene index) pays a per-index toll -- the lexicon, IW's RAM buffer,
> Codecs that store things in memory etc. For instance, one downside of
> splitting a 10M-doc core into 10,000 cores is that the cost of holding the
> total lexicon (dictionary of indexed words) goes up drastically, since now
> every word (just the byte[] of the word) is potentially represented in
> memory 10,000 times.
>
> What other RAM/CPU/storage costs does a SolrCore carry with it? There are
> the caches of course, which really depend on how many documents are
> indexed. Any other non-trivial or constant cost?
>
> So yes, there isn't a single answer to this question. It's just like
> asking how many documents a single Lucene index can handle efficiently.
> But if we can come up with basic numbers as I outlined above, it might
> help people doing rough estimates.
> That doesn't mean people shouldn't benchmark, as that upper bound may be
> waaaay too high for their data set, query workload and search needs.
>
> Shai
>
> On Wed, Mar 25, 2015 at 5:25 AM, Damien Kamerman <dami...@gmail.com>
> wrote:
>
> > From my experience on a high-end server (256GB memory, 40-core CPU)
> > testing collection numbers with one shard and two replicas, the maximum
> > that would work is 3,000 cores (1,500 collections). I'd recommend much
> > less (perhaps half of that), depending on your startup-time
> > requirements. (Though I have settled on a 6,000-collection maximum with
> > some patching. See SOLR-7191.) You could create multiple clouds after
> > that, and choose the cloud least used to create your collection.
> >
> > Regarding memory usage, I'd pencil in 6MB of Java heap overhead (no
> > docs) per collection.
> >
> > On 25 March 2015 at 13:46, Ian Rose <ianr...@fullstory.com> wrote:
> >
> > > First off, thanks everyone for the very useful replies thus far.
> > >
> > > Shawn - thanks for the list of items to check. #1 and #2 should be
> > > fine for us and I'll check our ulimit for #3.
> > >
> > > To add a bit of clarification, we are indeed using SolrCloud. Our
> > > current setup is to create a new collection for each customer. For
> > > now we allow SolrCloud to decide for itself where to locate the
> > > initial shard(s), but in time we expect to refine this such that our
> > > system will automatically choose the least loaded nodes according to
> > > some metric(s).
> > >
> > > > Having more than one business entity controlling the configuration
> > > > of a single (Solr) server is a recipe for disaster. Solr works well
> > > > if there is an architect for the system.
> > >
> > > Jack, can you explain a bit what you mean here? It looks like Toke
> > > caught your meaning but I'm afraid it missed me. What do you mean by
> > > "business entity"?
> > > Is your concern that with automatic creation of collections they will
> > > be distributed willy-nilly across the cluster, leading to uneven load
> > > across nodes? If it is relevant, the schema and solrconfig are
> > > controlled entirely by me and are the same for all collections. Thus
> > > we could theoretically use one single collection for all of our
> > > customers (adding a 'customer:<whatever>' type fq to all queries),
> > > but since we never need to query across customers it seemed more
> > > performant (as well as safer - less chance of accidentally leaking
> > > data across customers) to use separate collections.
> > >
> > > > Better to give each tenant a separate Solr instance that you spin
> > > > up and spin down based on demand.
> > >
> > > Regarding this, if by tenant you mean "customer", this is not viable
> > > for us from a cost perspective. As I mentioned initially, many of our
> > > customers are very small, so dedicating an entire machine to each of
> > > them would not be economical (or efficient). Or perhaps I am not
> > > understanding what your definition of "tenant" is?
> > >
> > > Cheers,
> > > Ian
> > >
> > > On Tue, Mar 24, 2015 at 4:51 PM, Toke Eskildsen
> > > <t...@statsbiblioteket.dk> wrote:
> > >
> > > > Jack Krupansky [jack.krupan...@gmail.com] wrote:
> > > > > I'm sure that I am quite unqualified to describe his hypothetical
> > > > > setup. I mean, he's the one using the term multi-tenancy, so it's
> > > > > for him to be clear.
> > > >
> > > > It was my understanding that Ian used them interchangeably, but of
> > > > course Ian is the only one who knows.
> > > >
> > > > > For me, it's a question of who has control over the config and
> > > > > schema and collection creation. Having more than one business
> > > > > entity controlling the configuration of a single (Solr) server is
> > > > > a recipe for disaster.
> > > > Thank you. Now your post makes a lot more sense. I will not argue
> > > > against that.
> > > >
> > > > - Toke Eskildsen
> >
> > --
> > Damien Kamerman

--
Damien Kamerman
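Shai's maxSolrCores(X, Y, Z) framing, combined with Damien's pencilled-in ~6MB of baseline heap per empty collection, suggests a quick back-of-the-envelope upper bound. The figures here are rough assumptions taken from this thread, not measured constants:

```python
PER_COLLECTION_HEAP_MB = 6  # Damien's pencilled-in overhead, no docs (assumed)

def max_empty_collections(heap_budget_mb: int) -> int:
    """Upper bound on collection count from baseline heap overhead alone.

    Real limits will be far lower once documents, caches and query load
    are added -- benchmark before trusting this number.
    """
    return heap_budget_mb // PER_COLLECTION_HEAP_MB

# e.g. a 32 GB heap budget gives an upper bound of ~5,400 empty collections
print(max_empty_collections(32 * 1024))
```

As Shai notes, this kind of bound is only a sanity check; the per-collection cost grows with the lexicon, caches and RAM buffers once real documents arrive.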
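The single-collection alternative Ian weighs above (one shared collection with a per-customer filter query) could look roughly like this; the 'customer' field name and URL layout are assumptions for illustration, not a description of his actual setup:

```python
import urllib.parse

def tenant_select_url(base_url: str, collection: str,
                      customer_id: str, user_query: str) -> str:
    """Build a Solr select URL that scopes every query to one tenant.

    The fq clause is the isolation mechanism: forgetting it leaks data
    across customers, which is exactly the risk Ian weighs against
    using separate collections.
    """
    params = urllib.parse.urlencode({
        "q": user_query,
        "fq": f"customer:{customer_id}",  # hypothetical tenant field
        "rows": 10,
    })
    return f"{base_url}/solr/{collection}/select?{params}"

print(tenant_select_url("http://localhost:8983", "shared", "acme", "*:*"))
```

The trade-off discussed in the thread is that the fq approach keeps core count (and per-core overhead) constant, at the cost of relying on every query path to apply the filter correctly.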