Re: cores vs. instances vs. zookeeper vs. cloud vs ?

Erick Erickson Fri, 16 Dec 2016 11:02:25 -0800

bq: if i start using SolrCloud i could have my current multi-core setup
   (e.g. "transactions", "opportunities", etc.) exist within the appropriate
   collection.

I'd guess that cores == collections, but...
Are you reaching across from one core to another to
satisfy your use-case? I.e. using "cross core joins"
or anything similar? If so, you'll have to do some careful
placement of collections and co-locate the replicas
for each collection. This is quite do-able via the Collections
API.
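
For what it's worth, here is a rough sketch of what that placement could look like, assuming the Collections API CREATE action and its createNodeSet parameter. The host, the node name "x.x.x.x:8983_solr" and the configset name "shared_conf" are made up for illustration, and the code only builds the request URLs rather than talking to a cluster.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, nodes, config_name,
                          num_shards=1, replication_factor=1):
    """Build a Collections API CREATE URL that pins replicas to given nodes.

    `nodes` is a list of Solr node names (hypothetical values here); passing
    the same list for two collections asks Solr to place both on those nodes.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,  # assumed configset name
        "createNodeSet": ",".join(nodes),
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical single node that should host both collections.
    same_node = ["x.x.x.x:8983_solr"]
    for collection in ("transactions", "opportunities"):
        print(create_collection_url("http://x.x.x.x:8983/solr",
                                    collection, same_node, "shared_conf"))
```

Creating both collections with the same node list is what keeps their replicas together, which is the part that matters if you really do need cross core joins.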

bq: this seems to be the same with ZK, too?

Kind of ignore ZK. From what you're describing you won't
have 100s of nodes. Therefore ZK will just keep track of it
all for you. One common misconception is that ZK is involved
in indexing and querying. It's not, kind of. Once each Solr
instance gets the current state of the network, the Solr
instances don't need to reference ZK to update or query.
It's only when nodes change state (are shut down and the
like) that ZK gets involved in letting the Solr nodes know about
the state change.

bq: is the separate indexing
   happening out of the box w Cloud or something it's even capable of?

Totally. The unit of address is a "collection". So let's say you have
a collection named transactions_dev. You send updates to
http://any_solr_server:port/solr/transactions_dev/update.
Queries go similarly to
http://any_solr_server:port/solr/transactions_dev/query

Don't try to put cores in here, just think about collections. The
actual _core_ is something like transactions_dev_shard1_replica1
but you'll almost never reference it directly unless you're debugging or
something.
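
To make the addressing concrete, here is a small sketch that builds an update request and a query URL against a collection name only. The base URL and the sample document are placeholders, and I'm assuming JSON documents posted to /update with commit=true; adjust to whatever your setup actually uses.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def collection_url(base_url, collection, handler):
    """Address a collection, never an individual core."""
    return f"{base_url.rstrip('/')}/{collection}/{handler}"


def build_update_request(base_url, collection, docs):
    """Prepare a POST of JSON documents to <collection>/update."""
    url = collection_url(base_url, collection, "update")
    url += "?" + urlencode({"commit": "true"})
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


def build_query_url(base_url, collection, q="*:*"):
    """Build a <collection>/query URL for the given query string."""
    return collection_url(base_url, collection, "query") + "?" + urlencode({"q": q})


def send(request_or_url):
    """Actually perform the HTTP call; needs a running Solr node."""
    with urlopen(request_or_url) as response:
        return response.read()


if __name__ == "__main__":
    base = "http://any_solr_server:8983/solr"  # placeholder host and port
    req = build_update_request(base, "transactions_dev", [{"id": "1"}])
    print(req.get_method(), req.full_url)
    print(build_query_url(base, "transactions_dev"))
```

Note that the same code works for any collection and any node in the cluster; only the collection name changes.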

So what I'd recommend is just go through the getting started example
and at some point it'll all suddenly start to make sense ;).

The take-away is that much of what you're doing in terms of keeping
track of cores and all that just goes away.

Best,
Erick

On Fri, Dec 16, 2016 at 9:30 AM, John Blythe <j...@curvolabs.com> wrote:
> thanks, erick. this is helpful. a few questions for clarity's sake, but
> first: nope, not using SolrCloud as of yet.
>
>    - if i start using SolrCloud i could have my current multi-core setup
>    (e.g. "transactions", "opportunities", etc.) exist within the appropriate
>    collection. so instead of dev-transactions i'd have a 'dev' collection that
>    has a 'transactions' core inside of it?
>    - this seems to be the same with ZK, too?
>    - i'm totally fine w separate/diff indexing. the demo collection, for
>    instance, *has* to be separate from production bc the data has been
>    stitched together from various customers' accounts on prod and blinded so
>    that we avoid privacy issues and can have all the various goodies
>    under one demo account rather than separate ones. is the separate indexing
>    happening out of the box w Cloud or something it's even capable of?
>
> thanks again, erick!
>
> --
> *John Blythe*
> Product Manager & Lead Developer
>
> 251.605.3071 | j...@curvolabs.com
> www.curvolabs.com
>
> 58 Adams Ave
> Evansville, IN 47713
>
> On Fri, Dec 16, 2016 at 11:38 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> It's not quite clear to me whether you're using SolrCloud now or not, my
>> guess is not. My guess here is that you _should_ move to SolrCloud and
>> collections. Then, instead of thinking about "cores", you just think about
>> collections. Where the replicas live then isn't something you have to
>> manage in that case.
>>
>> There's a bit of a learning curve for Zookeeper, and a mental shift you
>> have to make to not worry about cores at all, just trust Solr. That said,
>> if you _want_ to explicitly manage where each and every core for each
>> and every collection lives, that's easy with the collections API. Once you
>> do make that shift, going back is painful ;)
>>
>> So the scenario is that you have three collections, prod, dev, demo. They
>> all happen to use the same configset (which you keep in ZK). You have one
>> zookeeper ensemble that the three collections reference. They can even
>> all share the same machine if that machine has sufficient capacity.
>>
>> The deal here is that these are really completely independent; you'll have
>> to index your content to each separately.
>>
>> But then your URL becomes x.x.x.x:8983/solr/prod, x.x.x.x:8983/solr/dev
>> and the like.
>>
>> FWIW,
>> Erick
>>
>> On Fri, Dec 16, 2016 at 5:26 AM, John Blythe <j...@curvolabs.com> wrote:
>> > good morning everyone.
>> >
>> > i've got a growing number of cores that various parts of our application
>> > are relying upon. i'm having difficulty figuring out the best way to
>> > continue expanding for both sake of scale and convenience.
>> >
>> > i need two extra versions of each core due to our demo instance and our
>> > development instance. when we had just one or two cores it wasn't the
>> > worst thing to have cores like X, demo-X, and dev-X. that has quickly become
>> > unnecessarily cumbersome.
>> >
>> > i've considered moving each instance to its own solr instance, perhaps
>> > just throwing it on a different port. for example, production could be
>> > x.x.x.x:8983, dev x.x.x.x:8993, and demo x.x.x.x:8938.
>> >
>> > i'm pretty helpless at this point with zookeeper and/or solrcloud. given
>> > the above info, i'd love to hear some quick overview ideas as to the best
>> > approach that i can then begin to explore online.
>> >
>> > thanks for any pointers!
>>
