I wrote a Python program that:
1. Gets a cluster status.
2. Extracts the Zookeeper location from that.
3. Uploads solr.xml and config to Zookeeper (using kazoo library).
4. Sends an async reload command.
5. Polls for success until all the nodes have finished the reload.
6. Optionally rebuilds the
We use an AWS ALB for all of our Solr clusters. One is 40 instances.
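The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual program: the field name "zkHost" in the status response and the ZooKeeper config path are assumptions, and it uses the kazoo library mentioned above plus plain stdlib helpers.

```python
"""Sketch of the deploy/reload pipeline described above (illustrative only)."""
import json

def zk_host_from_status(status):
    # Assumed shape: the cluster status response carries the ZooKeeper
    # connect string under a "zkHost" key (field name is an assumption).
    return status["zkHost"]

def upload_solr_xml(zk_host, confname, solr_xml_bytes):
    # Uses kazoo, imported lazily; the /configs/<name> path is illustrative.
    from kazoo.client import KazooClient
    zk = KazooClient(hosts=zk_host)
    zk.start()
    try:
        base = "/configs/%s" % confname
        path = base + "/solr.xml"
        zk.ensure_path(base)
        if zk.exists(path):
            zk.set(path, solr_xml_bytes)
        else:
            zk.create(path, solr_xml_bytes)
    finally:
        zk.stop()

def reload_finished(node_statuses):
    # Step 5: the async RELOAD is done once every node reports "completed".
    return all(s == "completed" for s in node_statuses.values())

if __name__ == "__main__":
    status = json.loads('{"zkHost": "zk1:2181,zk2:2181/solr"}')
    print(zk_host_from_status(status))
    print(reload_finished({"node1": "completed", "node2": "completed"}))
```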
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Jun 29, 2018, at 8:33 PM, Sushant Vengurlekar
> wrote:
>
> What are some of the suggested loadbalancers for solrcloud? Can AWS ELB
What are some of the suggested load balancers for SolrCloud? Can AWS ELB be
used for load balancing?
On Fri, Jun 29, 2018 at 8:04 PM, Erick Erickson
wrote:
> In your setup, the load balancer prevents single points of failure.
>
> Since you're pinging a URL, what happens if that node dies or is tu
Adding to Shawn's comments.
You've pretty much nailed all the possibilities, it depends on
what you're most comfortable with I suppose.
The only thing I'd add is that you probably have dev and prod
environments and work out the correct schemas on dev then
migrate to prod (at least that's what par
Thanks for the detailed explanation, Erick. It really helped clear up my
understanding.
On Fri, Jun 29, 2018 at 8:04 PM, Erick Erickson
wrote:
> In your setup, the load balancer prevents single points of failure.
>
> Since you're pinging a URL, what happens if that node dies or is turned
> off?
> You
In your setup, the load balancer prevents single points of failure.
Since you're pinging a URL, what happens if that node dies or is turned off?
Your PHP program has no way of knowing what to do, but the load
balancer does.
Your understanding of Zookeeper's role shows a common misconception.
Zoo
Thanks for your reply. I have a follow-up question: why is a load balancer
needed? Isn't it ZooKeeper's job to load balance queries across Solr
nodes?
I was under the impression that you send the query to ZooKeeper and it handles
the rest and sends the response back. Can you please enlighten me?
You send your queries and updates directly to a Solr collection, e.g.
http://host:port/solr/. You can use any Solr node for
this request. If the node does not have the collection being queried, then
the request will be forwarded internally to a Solr instance which has that
collection.
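A minimal sketch of that point: any node can be the entry point, so the target host is arbitrary. The host, port, and collection name below are placeholders, not values from this thread.

```python
# Build a /select query URL against an arbitrary Solr node; any node can
# serve any collection (forwarding internally when needed).
import urllib.parse

def query_url(node, collection, q):
    params = urllib.parse.urlencode({"q": q, "wt": "json"})
    return "http://%s/solr/%s/select?%s" % (node, collection, params)

print(query_url("host:8983", "mycollection", "text"))
```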
ZooKeeper is u
I have a question regarding querying in SolrCloud.
I am working on PHP code to query SolrCloud for search results. Do I send
the query to ZooKeeper or to a particular Solr node? How does the
querying process work in general?
Thank you
On 6/29/2018 3:26 PM, Zimmermann, Thomas wrote:
> We're transitioning from Solr 4.10 to 7.x and working through our options
> around managing our schemas. Currently we manage our schema files in a git
> repository, make changes to the xml files,
Hopefully you've got the entire config in version
Hi,
We're transitioning from Solr 4.10 to 7.x and working through our options
around managing our schemas. Currently we manage our schema files in a git
repository, make changes to the xml files, and then push them out to our
zookeeper cluster via the zkcli and the upconfig command like:
/apps
The documentation does not say that Solr uses the zk client 3.4.11. It says,
"Solr currently uses Apache ZooKeeper v3.4.11." That is on the page titled
"Setting Up an External ZooKeeper Ensemble" in the section "Download Apache
ZooKeeper". Maybe that is supposed to mean "The Solr code uses the 3
Thanks Shawn - I misspoke when I said recommendation, should have said
"packaged with". I appreciate the feedback and the quick updates to the
Jira issue. We'll plan to proceed with 3.4.12 when we go live.
-TZ
On 6/29/18, 11:38 AM, "Shawn Heisey" wrote:
>On 6/28/2018 8:39 PM, Zimmermann, Thomas
This is truly puzzling then, I'm clueless. It's hard to imagine this
is lurking out there and nobody else notices, but you've eliminated
the custom code. And this is also very peculiar:
* it occurs only in our main text search collection, all other
collections are unaffected;
* despite what I said
Solr doesn't scale very well with ~2K collections, and yes, the bottleneck is
ZooKeeper itself.
ZooKeeper doesn't perform operations as quickly as expected on znodes with
many children.
In a scenario where you are in a recovery state (a node crash), this limitation
will hurt a lot, the que
Hello Erick,
The custom search handler doesn't interact with SolrIndexSearcher, this is
really all it does:
public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
throws Exception {
super.handleRequestBody(req, rsp);
if (rsp.getToLog().get("hits") instanceof I
bq. The only custom stuff left is an extension of SearchHandler that
only writes numFound to the response headers.
Well, one more to go ;). It's incredibly easy to overlook
innocent-seeming calls that increment the underlying reference count
of some objects but don't decrement them, usually throug
Hi,
In order to store time-series data and perform deletion easily, we create
several collections per day and then use aliases.
We are using Solr 7.3 and we have 2 questions:
Q1: In order to access the latest data quickly, would it be possible to load
cores in descending chronological
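The per-day-collections-plus-alias pattern described here can be sketched with the Collections API's CREATEALIAS action. The node address, alias name, and collection names below are illustrative placeholders.

```python
# Point one read alias at several per-day collections, newest first,
# via the Collections API CREATEALIAS action.
import urllib.parse

def create_alias_url(node, alias, collections):
    params = urllib.parse.urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),  # comma-separated target list
    })
    return "http://%s/solr/admin/collections?%s" % (node, params)

# Newest-first ordering, matching a descending-chronological access pattern.
days = ["logs_2018-06-29", "logs_2018-06-28", "logs_2018-06-27"]
print(create_alias_url("localhost:8983", "logs", days))
```

Deletion then becomes cheap: drop the oldest collection and re-point the alias, instead of issuing delete-by-query against one large index.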
Hello Yonik,
I took one node of the 7.2.1 cluster out of the load balancer so it would only
receive shard queries; this way I could kind of 'safely' disable our custom
components one by one, while keeping functionality in place by letting the
other 7.2.1 nodes continue on with the full configur
Ok. Will do. I saw the place in the code, but haven’t managed to get the code
to build, yet.
> On Jun 29, 2018, at 9:03 AM, Joel Bernstein wrote:
>
> Hi,
>
> Currently the nodes expression doesn't have this capability. Feel free to
> make a feature request on jira. This sounds like a fairly e
What _is_ your expectation? You haven't provided any examples of what
your input and expectations _are_.
You might review: https://wiki.apache.org/solr/UsingMailingLists
string types are case-sensitive for instance, so that's one thing that
could be happening. You
can also specify sortMissingFirs
On 6/29/2018 8:47 AM, Arturas Mazeika wrote:
Out of curiosity: some cores give info for both shards (through the
replication query) and some only for one (if you are still able to see the
previous post). I wonder why.
Adding to what Erick said:
If SolrCloud has initiated a replication on that core at
bq. It basically cuts down the search time in half in the usual case
for us, so it's an important 'feature'.
Wait. You mean that the "extra" call to get back 0 rows doubles your
query time? That's surprising, tell us more.
How many times does your "usual" use case call using CursorMark? My
off-th
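The cursorMark loop under discussion can be sketched as follows. This is an illustration of the documented stop condition (stop when the returned mark equals the one you sent), with `fetch_page` standing in for the real HTTP call to /select sorted on the uniqueKey.

```python
# Generic cursorMark paging loop; fetch_page(cursor) is a stand-in for
# the real Solr request and returns (docs, nextCursorMark).
def page_all(fetch_page):
    cursor = "*"  # the documented starting mark
    while True:
        docs, next_cursor = fetch_page(cursor)
        yield from docs
        if next_cursor == cursor:  # same mark returned: no more results
            return
        cursor = next_cursor

# Simulated pages: the final request returns no docs and repeats the mark.
pages = {"*": ([1, 2], "AAA"), "AAA": ([3], "BBB"), "BBB": ([], "BBB")}
result = list(page_all(pages.get))
print(result)
```

Stopping early when a page comes back with fewer rows than requested would save that final empty round trip, which is the optimization being debated; the trade-off is missing documents committed between requests.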
On 6/28/2018 8:39 PM, Zimmermann, Thomas wrote:
I was wondering if there was a reason Solr 7.4 is still recommending ZK 3.4.11
as the major version in the official changelog vs shipping with 3.4.12 despite
the known regression in 3.4.11. Are there any known issues with running 7.4
alongside ZK
Arturas:
Please make yourself a promise, "Only use the collections commands" ;)
At least for a while.
Trying to mix collection-level commands and core-level commands is
extremely confusing at the start. Under the covers, the Collections
API _uses_ the Core API, but in a very precise manner. Any s
Hi,
Currently the nodes expression doesn't have this capability. Feel free to
make a feature request on jira. This sounds like a fairly easy feature to
add.
Joel Bernstein
http://joelsolr.blogspot.com/
On Wed, Jun 27, 2018 at 5:21 PM, Heidi McClure <
heidi.mccl...@polarisalpha.com> wrote:
> H
Hi Shawn et al,
Thanks a lot for the clarification. It makes a lot of sense and explains
which functionality needs to be used to get the info :-).
Out of curiosity: some cores give info for both shards (through the
replication query) and some only for one (if you are still able to see the
previous post)
On 6/29/2018 7:53 AM, Arturas Mazeika wrote:
but the query reports infos on only one shard:
F:\solr_server\solr-7.2.1>curl -s
http://localhost:9996/solr/de_wiki_man/replication?command=details | grep
"indexPath\|indexSize"
"indexSize":"15.04 GB",
"indexPath":"F:\\solr_server\\solr-7.2.1\\e
Hi Solr-Team,
I am benchmarking Solr with the German Wikipedia pages on 4 nodes (running
on ports , 9998, 9997 and 9996; 4 shards; replication factor 2):
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -m 3g -cloud -p -s
"F:\solr_server\solr-7.2.1\example\cloud\node1\solr"
"F:\solr_serve
Thanks. I think that's a good point that it helps recognize port conflict at
start up. Although that scenario is unlikely in my case, I am going to try
to get it installed.
Hi Tushar,
You're right; the docs are a little out of date there.
Krb5HttpClientConfigurer underwent some refactoring recently and came
out with a different name: Krb5HttpClientBuilder.
The ref-guide should update the snippet you were referencing to
something more like:
System.setProperty("java.
You might also have luck using the "NoOpResponseParser"
https://opensourceconnections.com/blog/2015/01/08/using-solr-cloud-for-robustness-but-returning-json-format/
https://lucene.apache.org/solr/7_0_0/solr-solrj/org/apache/solr/client/solrj/impl/NoOpResponseParser.html
(Disclaimer: Didn't try th
Hi Jesse,
you are correct: the variable 'bestScore' used in
createQuery(PriorityQueue q) should be 'minScore'.
It is used to normalise the term scores:
tq = new BoostQuery(tq, boostFactor * myScore / bestScore);
e.g.
Queue -> Term1:100, Term2:50, Term3:20, Term4:10
The minScore will be 10
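Working through the normalisation on the queue above (boostFactor of 1.0 is an assumed value for illustration):

```python
# Each term's boost is boostFactor * myScore / minScore, where minScore
# is the smallest score in the queue (10 in the example above).
boost_factor = 1.0
scores = {"Term1": 100, "Term2": 50, "Term3": 20, "Term4": 10}
min_score = min(scores.values())
boosts = {t: boost_factor * s / min_score for t, s in scores.items()}
print(boosts)  # Term1 boosted 10x relative to the weakest term, Term4 1x
```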
Am 22.06.18 um 02:37 schrieb Chris Hostetter:
: the documentation of 'cursorMarks' recommends to fetch until a query returns
: the cursorMark that was passed in to a request.
:
: But that always requires an additional request at the end, so I wonder if I
: can stop already, if a request returns
Hello Erick,
title is a string field
On Wed, 27 Jun 2018, 9:21 pm Erick Erickson,
wrote:
> what kind of field is title? text_general or something? Sorting on a
> tokenized field is usually something you don't want to do. If a field
> has aardvard and zebra, how would it sort?
>
> There's usually
Hi,
It is probably best if you merge some of your collections (or all) and have a
discriminator field that will be used to filter out only the tenant's documents.
In case you go with multiple collections serving multiple tenants, you would
have to have logic on top of it to resolve tenant to colle
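The discriminator-field approach can be sketched as follows; the field name tenant_id and the query values are illustrative, not from this thread.

```python
# One shared collection; every request carries a filter query (fq)
# restricting results to the calling tenant's documents.
import urllib.parse

def tenant_query(q, tenant_id):
    return urllib.parse.urlencode({"q": q, "fq": "tenant_id:%s" % tenant_id})

print(tenant_query("title:solr", "acme"))
```

An fq is a good fit here because filter queries are cached independently of the main query, so the per-tenant restriction is cheap on repeated requests.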