I've tried (very simplistically) hitting a collection with a good variety of searches, watching the collection's heap memory, and working out the bytes/doc. I've seen results around 100 bytes/doc, and as low as 3 bytes/doc for collections with small docs. It's still a work in progress - I'm not sure whether it will scale with document count, or whether it's simply too simplistic.
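The estimate described above boils down to a simple division; here is a minimal sketch, assuming you can sample the collection's heap usage (e.g. via JMX or a heap dump) before and after warming it with a representative query mix - a rough figure, not an exact accounting:

```python
def bytes_per_doc(heap_after_bytes: int, heap_before_bytes: int, num_docs: int) -> float:
    """Naive per-document heap cost: heap growth divided by doc count.

    Ignores shared JVM overhead and GC timing, so treat the result as a
    ballpark figure only.
    """
    if num_docs <= 0:
        raise ValueError("need a non-empty collection")
    return (heap_after_bytes - heap_before_bytes) / num_docs

# e.g. ~1 GB of heap growth across 10M docs works out to ~107 bytes/doc,
# in the same ballpark as the ~100 bytes/doc observed above
print(bytes_per_doc(1_073_741_824, 0, 10_000_000))
```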
On 25 March 2015 at 17:49, Shai Erera <ser...@gmail.com> wrote:

> While it's hard to answer this question because, as others have said, "it
> depends", I think it would be good if we could quantify or assess the cost
> of running a SolrCore.
>
> For instance, let's say that a server can handle a load of 10M indexed
> documents (I omit search load on purpose for now) in a single SolrCore.
> Would the same server be able to handle the same number of documents if we
> indexed 1,000 docs per SolrCore, in a total of 10,000 SolrCores? If the
> answer is no, then it means there is some cost that comes with each
> SolrCore, and we may at least be able to give an upper bound -- on a
> server with X amount of storage, Y GB RAM and Z cores you can run up to
> maxSolrCores(X, Y, Z).
>
> Another way to look at it: if I were to create empty SolrCores, would I be
> able to create an infinite number of cores if storage was infinite? Or do
> even empty cores have their toll on CPU and RAM?
>
> I know from the Lucene side of things that each SolrCore (which carries a
> Lucene index) pays a per-index toll -- the lexicon, IW's RAM buffer,
> Codecs that store things in memory etc. For instance, one downside of
> splitting a 10M-doc core into 10,000 cores is that the cost of holding the
> total lexicon (dictionary of indexed words) goes up drastically, since now
> every word (just the byte[] of the word) is potentially represented in
> memory 10,000 times.
>
> What other RAM/CPU/storage costs does a SolrCore carry with it? There are
> the caches of course, which really depend on how many documents are
> indexed. Any other non-trivial or constant cost?
>
> So yes, there isn't a single answer to this question. It's just like
> asking how many documents a single Lucene index can handle efficiently.
> But if we can come up with basic numbers as I outlined above, it might
> help people doing rough estimates.
> That doesn't mean people shouldn't benchmark, as that upper bound may be
> waaaay too high for their data set, query workload and search needs.
>
> Shai
>
> On Wed, Mar 25, 2015 at 5:25 AM, Damien Kamerman <dami...@gmail.com>
> wrote:
>
> > From my experience on a high-end server (256GB memory, 40-core CPU)
> > testing collection numbers with one shard and two replicas, the maximum
> > that would work is 3,000 cores (1,500 collections). I'd recommend much
> > less (perhaps half of that), depending on your startup-time
> > requirements. (Though I have settled on a 6,000-collection maximum with
> > some patching. See SOLR-7191.) You could create multiple clouds after
> > that, and choose the cloud least used to create your collection.
> >
> > Regarding memory usage, I'd pencil in 6MB of Java heap overhead (no
> > docs) per collection.
> >
> > On 25 March 2015 at 13:46, Ian Rose <ianr...@fullstory.com> wrote:
> >
> > > First off, thanks everyone for the very useful replies thus far.
> > >
> > > Shawn - thanks for the list of items to check. #1 and #2 should be
> > > fine for us and I'll check our ulimit for #3.
> > >
> > > To add a bit of clarification, we are indeed using SolrCloud. Our
> > > current setup is to create a new collection for each customer. For
> > > now we allow SolrCloud to decide for itself where to locate the
> > > initial shard(s), but in time we expect to refine this such that our
> > > system will automatically choose the least loaded nodes according to
> > > some metric(s).
> > >
> > > > Having more than one business entity controlling the configuration
> > > > of a single (Solr) server is a recipe for disaster. Solr works well
> > > > if there is an architect for the system.
> > >
> > > Jack, can you explain a bit what you mean here? It looks like Toke
> > > caught your meaning but I'm afraid it missed me. What do you mean by
> > > "business entity"?
> > > Is your concern that with automatic creation of collections they will
> > > be distributed willy-nilly across the cluster, leading to uneven load
> > > across nodes? If it is relevant, the schema and solrconfig are
> > > controlled entirely by me and are the same for all collections. Thus
> > > we could theoretically use one single collection for all of our
> > > customers (adding a 'customer:<whatever>' type fq to all queries),
> > > but since we never need to query across customers it seemed more
> > > performant (as well as safer - less chance of accidentally leaking
> > > data across customers) to use separate collections.
> > >
> > > > Better to give each tenant a separate Solr instance that you spin
> > > > up and spin down based on demand.
> > >
> > > Regarding this, if by tenant you mean "customer", this is not viable
> > > for us from a cost perspective. As I mentioned initially, many of our
> > > customers are very small, so dedicating an entire machine to each of
> > > them would not be economical (or efficient). Or perhaps I am not
> > > understanding what your definition of "tenant" is?
> > >
> > > Cheers,
> > > Ian
> > >
> > > On Tue, Mar 24, 2015 at 4:51 PM, Toke Eskildsen
> > > <t...@statsbiblioteket.dk> wrote:
> > >
> > > > Jack Krupansky [jack.krupan...@gmail.com] wrote:
> > > > > I'm sure that I am quite unqualified to describe his hypothetical
> > > > > setup. I mean, he's the one using the term multi-tenancy, so it's
> > > > > for him to be clear.
> > > >
> > > > It was my understanding that Ian used them interchangeably, but of
> > > > course Ian is the only one who knows.
> > > >
> > > > > For me, it's a question of who has control over the config and
> > > > > schema and collection creation. Having more than one business
> > > > > entity controlling the configuration of a single (Solr) server is
> > > > > a recipe for disaster.
> > > > Thank you. Now your post makes a lot more sense. I will not argue
> > > > against that.
> > > >
> > > > - Toke Eskildsen
> >
> > --
> > Damien Kamerman

--
Damien Kamerman
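Shai's maxSolrCores(X, Y, Z) framing, combined with Damien's pencilled-in ~6MB of baseline heap per empty collection, suggests a quick back-of-the-envelope upper bound. The figures here are rough assumptions taken from this thread, not measured constants:

```python
PER_COLLECTION_HEAP_MB = 6  # Damien's pencilled-in overhead, no docs (assumed)

def max_empty_collections(heap_budget_mb: int) -> int:
    """Upper bound on collection count from baseline heap overhead alone.

    Real limits will be far lower once documents, caches and query load
    are added -- benchmark before trusting this number.
    """
    return heap_budget_mb // PER_COLLECTION_HEAP_MB

# e.g. a 32 GB heap budget gives an upper bound of ~5,400 empty collections
print(max_empty_collections(32 * 1024))
```

As Shai notes, this kind of bound is only a sanity check; the per-collection cost grows with the lexicon, caches and RAM buffers once real documents arrive.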
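The single-collection alternative Ian weighs above (one shared collection with a per-customer filter query) could look roughly like this; the 'customer' field name and URL layout are assumptions for illustration, not a description of his actual setup:

```python
import urllib.parse

def tenant_select_url(base_url: str, collection: str,
                      customer_id: str, user_query: str) -> str:
    """Build a Solr select URL that scopes every query to one tenant.

    The fq clause is the isolation mechanism: forgetting it leaks data
    across customers, which is exactly the risk Ian weighs against
    using separate collections.
    """
    params = urllib.parse.urlencode({
        "q": user_query,
        "fq": f"customer:{customer_id}",  # hypothetical tenant field
        "rows": 10,
    })
    return f"{base_url}/solr/{collection}/select?{params}"

print(tenant_select_url("http://localhost:8983", "shared", "acme", "*:*"))
```

The trade-off discussed in the thread is that the fq approach keeps core count (and per-core overhead) constant, at the cost of relying on every query path to apply the filter correctly.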