Re: Poll: Master-Slave or SolrCloud?

Charlie Hull Fri, 28 Apr 2017 01:53:49 -0700

Like Sematext, we help clients with both ES and Solr. A particulardifference is that ES is easier to start with (lots of sensibledefaults) but then once you have got going (and perchance have thrownmany millions of items at it) you can run into trouble because you don'treally understand what's happening under the hood. Solr is still harderto get started with but as you're going to have to figure it out anywayto get anywhere, you shouldn't end up having to rethink or redoeverything later.


Charlie


On 28/04/2017 04:51, David Lee wrote:

As someone who moved from ES to Solr, I can say that one of the things
that makes ES so much easier to configure is that the majority of things
that need to be set for a specific environment are all in pretty much
one config file. Also, I didn't have to deal with the "magic stuff" that
many people have talked about where SolrCloud is concerned.

One of the problems is also do to documentation and user blogs that
discuss how to use SolrCloud. They all tell you how to create a config
to run SolrCloud on one system using the -e cloud flag, but then that's
it. They all seem to avoid discussions of what to do from there in terms
of best practices in distributing to other nodes. It's out there, but in
many cases the guides refer to older versions of Solr so sometimes it is
hard to know what versions people are writing about until you try their
solutions and nothing works, so you finally figure out they are talking
about a much older version.

I moved away from ES to Solr because I prefer the openness of Solr and
the community participation but I really haven't been very successful in
deploying this in a production environment at this point.

I'd say the two things I find that I'm battling with the most are the
cloud configuration and the work I'm having to do to get even the most
basic JSON documents indexed correctly (specifically where I need block
joins, etc.).

I'm hopeful that the V2 Api will help with the JSON issue, but it would
be nice to have some documentation that goes more in-depth on how to set
up additional nodes. Also, even though I use ZK for other parts of my
application, I have no problem with a version running specifically for
Solr if it makes this process more straight-forward.

David



On 4/27/2017 2:51 AM, Emir Arnautovic wrote:

I think creating poll for ES ppl with question: "How do you run master
nodes? A) on some data nodes B) dedicated node C) dedicated server"
would give some insight how big issue is having ZK and if hiding ZK
behind Solr would do any good.

Emir


On 25.04.2017 23:13, Otis Gospodnetić wrote:

Hi Erick,

Could one run *only* embedded ZK on some SolrCloud nodes, sans any data?
It would be equivalent of dedicated Elasticsearch nodes, which is the
current ES best practice/recommendation.  I've never heard of anyone
being
scared of running 3 dedicated master ES nodes, so if SolrCloud
offered the
same, perhaps even completely hiding ZK from users, that would
present the
same level of complexity (err, simplicity) ES users love about ES.
Don't
want to talk about SolrCloud vs. ES here at all, just trying to share
observations since we work a lot with both Elasticsearch and
Solr(Cloud) at
Sematext.

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


On Tue, Apr 25, 2017 at 4:03 PM, Erick Erickson
<[email protected]>
wrote:

bq: I read somewhere that you should run your own ZK externally, and
turn off SolrCloud

this is a bit confused. "turn off SolrCloud" has nothing to do with
running ZK internally or externally. SolrCloud requires ZK, whether
internal or external is irrelevant to the term SolrCloud.

On to running an external ZK ensemble. Mostly, that's administratively
by far the safest. If you're running the embedded ZK, then the ZK
instances are tied to your Solr instance. Now if, for any reason, your
Solr nodes hosting ZK go down, you lose ZK quorum, can't index.
etc....

Now consider a cluster with, say, 100 Solr nodes. Not talking replicas
in a collection here, I'm talking 100 physical machines. BTW, this is
not even close to the largest ones I'm aware of. Which three (for
example) are running ZK? If I want to upgrade Solr I better make
really sure not to upgrade to of the Solr instances running ZK at once
if I want my cluster to keep going....

And, ZK is sensitive to system resources. So putting ZK on a Solr node
then hosing, say, updates to my Solr cluster can cause ZK to be
starved for resources.

This is one of those deals where _functionally_, it's OK to run
embedded ZK, but administratively it's suspect.

Best,
Erick

On Tue, Apr 25, 2017 at 10:49 AM, Rick Leir <[email protected]> wrote:

All,
I read somewhere that you should run your own ZK externally, and turn

off SolrCloud. Comments please!

Rick

On April 25, 2017 1:33:31 PM EDT, "Otis Gospodnetić" <

[email protected]> wrote:

This is interesting - that ZK is seen as adding so much complexity
that
it
turns people off!

If you think about it, Elasticsearch users have no choice -- except
their
"ZK" is built-in, hidden, so one doesn't have to think about it, at
least
not initially.

I think I saw mentions (maybe on user or dev MLs or JIRA) about
potentially, in the future, there only being SolrCloud mode (and
dropping
SolrCloud name in favour of Solr).  If the above comment from Charlie
about
complexity is really true for Solr users, and if that's the reason
why
we
see so few people running SolrCloud today, perhaps that's a good
signal
for
Solr development/priorities in terms of ZK
hiding/automating/embedding/something...

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training -
http://sematext.com/


On Tue, Apr 25, 2017 at 4:50 AM, Charlie Hull <[email protected]>
wrote:

On 24/04/2017 15:58, Otis Gospodnetić wrote:

Hi,

I'm really really surprised here.  Back in 2013 we did a poll to
see

how

people were running Master-Slave (4.x back then) and SolrCloud
was a

bit

more popular than Master-Slave:
https://sematext.com/blog/2013/02/25/poll-solr-cloud-or-not/

Here is a fresh new poll with pretty much the same question -
How do

you

run your Solr?

<https://twitter.com/sematext/status/854927627748036608> -

and guess what?  SolrCloud is *not* at all a lot more prevalent
than
Master-Slave.

We definitely see a lot more SolrCloud used by Sematext Solr
consulting/support customers, so I'm a bit surprised by the results

of

this
poll so far.

I'm not particularly surprised. We regularly see clients either with
single nodes or elderly versions of Solr (or even Lucene). Zookeeper

is

still seen as a bit of a black art. Once you move from 'how do I run

search engine' to 'how do I manage a cluster of servers with scaling

for

performance/resilience/failover' you're looking at a completely new

set

of skills and challenges, which I think puts many people off.

Charlie

Is anyone else surprised by this?  See
https://twitter.com/sematext/
status/854927627748036608

Thanks,
Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training -

http://sematext.com/


---
This email has been checked for viruses by AVG.
http://www.avg.com

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com



---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus



--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk

Re: Poll: Master-Slave or SolrCloud?

Reply via email to