I'm glad Erick finally answered my question (I think I actually asked it on the original Jira) concerning the rough magnitude of "Lots" - it's hundreds/thousands, but not hundreds of thousands, millions, or tens of millions.

So, if an app needs "millions", I think that suggests a "MegaCores" capability distinct from "LotsOfCores".

A use case would be a web site or service with millions of users, each of whom would have an active Solr core while they are using the service, but an inactive one otherwise. Of course those cores would not all reside on one node, and ZooKeeper is out of the question for managing anything that is in the millions. This would be a true "cloud" or "data center" (even multi-data center) app, not a "cluster" app.

So, I imagine that the app's "cloud" would have ZooKeeper-like servers whose job is to know all the available servers in the cloud, what Solr cores are running on them, and how much spare capacity they have. If a request comes in to "find" a user's Solr core, the CloudKeeper would consult its database (probably a Solr core with "millions" of rows!) for the current location and status of the user's core. If the core is active, great, its location is returned. If not active, CK would check whether the node on which it resides has sufficient spare compute capacity. If so, the user's Solr core would be spun up there. If not, CK would find a machine with plenty of spare capacity and send a request to that node to pull the inactive core from the busy machine (or from a backup store of long-idle Solr cores). Once the new node has the user's Solr core up, the node notifies CK of its status and CK updates its database. Meanwhile, the original client request would have returned with an "in progress" status, and the client would periodically ping CK to see whether the spin-up had completed.

And then there would probably be an idle timeout that would cause a Solr core to spin down and notify CK that it is inactive.
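In rough code form (every name here - CloudKeeper, CoreRecord, findCore, and so on - is made up just to pin down the flow, nothing that exists in Solr):

import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the flow described above; nothing here exists in Solr.
class CloudKeeper {

    enum Status { ACTIVE, SPINNING_UP, INACTIVE }

    static final class CoreRecord {
        volatile String node;    // where the core lives (or last lived)
        volatile Status status;
        CoreRecord(String node, Status status) {
            this.node = node;
            this.status = status;
        }
    }

    // Stand-in for CK's database (which might itself be a Solr core
    // with "millions" of rows).
    private final Map<String, CoreRecord> registry = new ConcurrentHashMap<>();

    // Returns the node hosting the user's core, or empty ("in progress")
    // if a spin-up has been kicked off and the client should poll again.
    Optional<String> findCore(String userId) {
        CoreRecord rec = registry.computeIfAbsent(
                userId, id -> new CoreRecord(null, Status.INACTIVE));
        switch (rec.status) {
            case ACTIVE:
                return Optional.of(rec.node);  // great, return its location
            case SPINNING_UP:
                return Optional.empty();       // still in progress
            default:
                // Spin up in place if the current node has headroom;
                // otherwise pick a node with spare capacity to pull the
                // core over (or fetch it from the backup store).
                String node = hasSpareCapacity(rec.node)
                        ? rec.node : pickNodeWithCapacity();
                rec.node = node;
                rec.status = Status.SPINNING_UP;
                requestSpinUp(node, userId);   // node calls coreIsUp() when done
                return Optional.empty();
        }
    }

    // Callback from a node once the user's core is up and serving.
    void coreIsUp(String userId, String node) {
        CoreRecord rec = registry.get(userId);
        rec.node = node;
        rec.status = Status.ACTIVE;
    }

    // Idle-timeout path: a node reports that it spun a core down.
    void coreIsIdle(String userId) {
        registry.get(userId).status = Status.INACTIVE;
    }

    // Plumbing stubs for the sketch.
    private boolean hasSpareCapacity(String node) { return node != null; }
    private String pickNodeWithCapacity() { return "some-node-with-headroom"; }
    private void requestSpinUp(String node, String userId) { /* RPC to node */ }
}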

Or something like that.

This would be a lot more of a true "Solr Cloud" than the "cluster" support that we have today.

And the "CloudKeeper" itself might be a "traditional" SolrCloud cluster, except that it needs to be multi-data center.

-- Jack Krupansky

-----Original Message----- From: Aleksey
Sent: Thursday, June 06, 2013 8:06 PM
To: solr-user
Subject: Re: LotsOfCores feature

I would not try putting tens of millions of cores on one machine. My
question (and I think Jack's as well) was around having them across a
fleet, say if I need 1M then I'd get 100 machines appropriately sized
for 10K each. I was clarifying because there was some talk about
ZooKeeper only being able to store a small amount of configuration,
and there were concerns that it won't keep track of which core is
where when there are millions of them.

This question is still open in my mind, since I haven't yet
familiarized myself with how ZK works.




On Thu, Jun 6, 2013 at 3:23 PM, Erick Erickson <erickerick...@gmail.com> wrote:
Now Jack. You know "it depends" <G>.... Just answer
the questions "how many simultaneous cores can you
open on your hardware", and "what's the maximum percentage
of the cores you expect to be open at any one time".
Do some math and you have your answer.....
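(Making up numbers just to illustrate: if your hardware can keep
1,000 cores open at once and at most 10% of cores are ever open
at the same time, that works out to about 1,000 / 0.10 = 10,000
cores on that node.)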

The meta-data, essentially anything in the <core> tag
or the core.properties file, is kept in an in-memory structure. At
startup time, that structure has to be filled. I haven't measured
exactly, but it's relatively small (GUESS: 256 bytes) plus control
structures. So _theoretically_ you could put millions on a single
node. But you don't want to because:
1> if you're doing core discovery, you have to walk millions of
     directories every time you start up.
2> otherwise you're maintaining a huge solr.xml file (which will be
    going away anyway).
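(Taking that 256-byte guess at face value, even a million cores
would only be ~256 MB of metadata, so it's the two problems above
that bite first, not the memory for the metadata itself.)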

Aleksey's use case also calls for "less than a million" or so open
at once. I can't imagine fitting that many cores into memory
simultaneously on one machine.

The design goal is 10-15K cores on a machine. The theory
is that pretty soon you're going to have a big enough percentage
of them open that you'll blow memory up.

And this is always governed by the size of the transient cache.
If requests are coming in for more unique cores than the cache can
hold, pretty soon you'll be opening a core for each and every query.
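As a minimal sketch of the kind of eviction involved (class names
invented for illustration, not Solr's actual implementation), an
access-ordered LinkedHashMap gives LRU behavior almost for free:

import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for a loaded core; the real class would be Solr's SolrCore.
interface Core {
    void close();
}

// Keeps at most maxOpenCores cores open; the least-recently-used core
// is closed (not deleted) when the cap is exceeded, so the next request
// for it pays the cost of reopening.
class TransientCoreCache extends LinkedHashMap<String, Core> {
    private final int maxOpenCores;

    TransientCoreCache(int maxOpenCores) {
        super(16, 0.75f, true); // true = access order, i.e. LRU
        this.maxOpenCores = maxOpenCores;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Core> eldest) {
        if (size() > maxOpenCores) {
            eldest.getValue().close(); // evict: close the idle core
            return true;
        }
        return false;
    }
}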

So, as usual, it's a matter of the usage pattern to determine how
many cores you can put on the machine.

FWIW,
Erick

On Thu, Jun 6, 2013 at 4:13 PM, Jack Krupansky <j...@basetechnology.com> wrote:
So, is that a clear yes or a clear no for Aleksey's use case - tens of
millions of cores, not all active but each loadable on demand?

I asked this same basic question months ago and there was no answer
forthcoming.

-- Jack Krupansky

-----Original Message----- From: Erick Erickson
Sent: Thursday, June 06, 2013 3:53 PM
To: solr-user@lucene.apache.org
Subject: Re: LotsOfCores feature


100K is really not the limit; it's just hard to imagine
100K cores on a single machine unless some were
really rarely used. And it's per node, not cluster-wide.

The current state is that everything is in place, including
transient cores, auto-discovery, etc. So you should be
able to go ahead and try it out.

The next bit that will help with efficiency is sharing named
config sets. The intent here is that <solrhome>/configs will
contain sub-dirs like "conf1", "conf2" etc. Then your cores
can reference configName=conf1, and only one copy of
the configuration data will be used, rather than a copy being
re-loaded for each core as the core comes and goes.
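Concretely, the layout would be something like this (a sketch; the
exact file names and where core.properties lives are illustrative):

<solrhome>/
  configs/
    conf1/
      solrconfig.xml
      schema.xml
    conf2/
      ...
  core1/
    core.properties    (contains: configName=conf1)
  core2/
    core.properties    (contains: configName=conf1, so conf1 is
                        loaded once and shared by both cores)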

Do note that the _first_ query in to one of the not-yet-loaded
cores will be slow. The model here is that you can tolerate
some queries taking more time at first than you might like
in exchange for the hardware savings. This pre-supposes that
you simply cannot fit all the cores into memory at once.

The "won't fix" bits are there because, as we got farther into this
process, the approach changed and the functionality of the
won't fix JIRAs was subsumed by other changes by and large.

I've got to update that documentation sometime, but just haven't
had time yet. If you go down this route, we'll be happy to
add your name to the list of authorized wiki editors, if you'd
like.

Best
Erick

On Thu, Jun 6, 2013 at 3:08 PM, Aleksey <bitterc...@gmail.com> wrote:

I was looking at this wiki and linked issues:
http://wiki.apache.org/solr/LotsOfCores

They talk about a limit of 100K cores. Is that per server, or per
entire fleet, since ZooKeeper needs to manage that?

I was considering a use case where I have tens of millions of indices,
but less than a million need to be active at any time, so they need
to be loaded on demand and evicted when not used for a while.
Also, since the number one requirement is efficient loading, I
assume I will store a prebuilt index somewhere so Solr can just
download it and strap it in, right?

The root issue is marked as "won't fix" but some other important
subissues are marked as resolved. What's the overall status of the
effort?

Thank you in advance,

Aleksey


