Solr Cloud (6.6.3), Zookeeper(3.4.10) and ELB's
Our team is having problems with our production setup in AWS. Our current setup is:

- Dockerized Solr nodes behind an ELB
- Zookeeper with Exhibitor in a Docker container (3 of this set)
- Solr talks to Zookeeper through an ELB (should we even do this? We did it for recovery purposes, so if there are better ways to handle this, please describe them in your reply)
- There are scripts on the ZK nodes and Solr nodes to monitor and restart the Docker containers if they go down.

So in production, Solr nodes sometimes go down and get restarted by the scripts. During recovery, some shards won't have a leader, and because of that, indexing won't work. Adding replicas will also sometimes yield multiple replicas on the same node, a lot more than we want (we added one and got eight at one time).

So my question is: are we doing something wrong here?

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: SolrCloud Collection Backup - Solr 5.5.4
On 6/1/2018 7:23 AM, Greenhorn Techie wrote:
> We are running SolrCloud with version 5.5.4. As I understand, Solr
> Collection Backup and Restore API are only supported from version 6
> onwards. So wondering what is the best mechanism to get our collections
> backed-up on older Solr version.

That functionality was added in 6.1.

https://issues.apache.org/jira/browse/SOLR-5750

> When I ran backup command on a particular node (curl
> http://localhost:8983/solr/gettingstarted/replication?command=backup) it
> seems it only creates a snapshot for the collection data stored on that
> particular node. Does that mean, if I run this command for every node
> hosting my SolrCloud collection, I will be getting the required backup?
> Will this backup the metadata as well from ZK? I presume not.

If you provide a location parameter, it will write a new backup directory in that location.

https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html#standalone-mode-backups

I verified that this parameter is in the 5.5 docs too. I would suggest you download that version in PDF format if you want a full reference. It would probably be a good idea to create a separate directory for each core that you work on.

If the backup is done on all the right cores, you will get all the index data, but you will have no info from ZK. If the collection has more than one shard and uses the compositeId router, then you will need the info from the collection's clusterstate about hash shard ranges, and those would have to be verified and possibly adjusted on the new collection before you started putting the data back in. If the new collection uses different hash ranges than the one you backed up, then the restored collection would not function correctly.

> If so, what
> are the best possible approaches to get the same. Is there something made
> available by Solr for the same?

If you can do it, upgrading to the latest 6.x or 7.x version would be a good idea, to have full SolrCloud backup and restore functionality.

--

You asked me some questions via IRC when I wasn't around, then were logged off by the time I got back to IRC. I don't know when you might come back online there. Here's some info for those questions:

The reason that 'ant server' isn't working is that you're at the top level of the source. It should work if you change to the solr directory first.

Similar to what you've encountered, I can't get eclipse to work properly when using a downloaded 6.6.2 source package (solr-6.6.2-src.tgz). But if I use these commands instead, then import into eclipse, it works:

  git clone https://git-wip-us.apache.org/repos/asf/lucene-solr.git
  cd lucene-solr
  git checkout refs/tags/releases/lucene-solr/6.6.2
  ant clean clean-jars clean-eclipse eclipse

The clean targets are not strictly necessary with a fresh clone, but that works even when the tree isn't fresh. I've never had very good luck with the downloadable source packages. Some of the build system functionality *only* works when the source is obtained with git, so I prefer that.

Thanks,
Shawn
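[For reference, a per-core backup call with the location and name parameters Shawn mentions might look roughly like this. The directory path and snapshot name are hypothetical, and the directory must be writable by the Solr process:]

  # hypothetical backup location and snapshot name for one core
  curl "http://localhost:8983/solr/gettingstarted/replication?command=backup&location=/backups/gettingstarted&name=snapshot-20180601"

Repeating that for each core, each with its own directory, gives the separate per-core backups suggested above; ZK metadata still has to be handled separately.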
Re: SolrCloud Collection Backup - Solr 5.5.4
On 6/2/2018 1:50 AM, Shawn Heisey wrote:
> If you provide a location parameter, it will write a new backup
> directory in that location.
>
> https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html#standalone-mode-backups
>
> I verified that this parameter is in the 5.5 docs too. I would suggest
> you download that version in PDF format if you want a full reference.

A followup: I suspect that if you try to use the restore functionality on the replication handler and have multiple shard replicas, SolrCloud would not replicate things properly. I could be wrong about that, but I think that restoring from replication handler backups to SolrCloud could get a little messy.

Thanks,
Shawn
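[For completeness, the corresponding restore call on the replication handler in standalone mode looks roughly like this, reusing the same hypothetical location and snapshot name as the backup example above. As Shawn cautions, running this against a SolrCloud collection with multiple replicas may not behave as expected:]

  curl "http://localhost:8983/solr/gettingstarted/replication?command=restore&location=/backups/gettingstarted&name=snapshot-20180601"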
Re: Solr Cloud (6.6.3), Zookeeper(3.4.10) and ELB's
On 6/2/2018 1:49 AM, solrnoobie wrote:
> Our team is having problems with our production setup in AWS. Our
> current setup is:
>
> - Dockerized Solr nodes behind an ELB

Putting Solr behind a load balancer is a pretty normal thing to do.

> - Zookeeper with Exhibitor in a Docker container (3 of this set)

I don't know anything about Exhibitor. You'd need to discuss that with a zookeeper expert.

> - Solr talks to Zookeeper through an ELB (should we even do this? We
> did it for recovery purposes, so if there are better ways to handle
> this, please describe them in your reply)

Definitely not. ZK is designed for fault tolerance *without* a load balancer. Solr gets configured with all the ZK servers and will connect to all of them at the same time. Every ZK server in the ensemble is configured with a list of all ZK servers, and they communicate with each other directly. Putting a load balancer in the mix can *cause* problems; it won't solve them.

> - There are scripts on the ZK nodes and Solr nodes to monitor and
> restart the Docker containers if they go down.

In general it's probably not a good idea to restart Solr automatically if it goes down. Incidents where Solr crashes are EXTREMELY rare. I bet if you asked the zookeeper project about automatically restarting their software, they would tell you the same thing.

There is one relatively common scenario where Solr *will* stop running: if Java experiences an OutOfMemoryError exception. Solr is designed to kill itself when OOME is thrown, because program operation is completely unpredictable after OOME. Stopping all operation is the only safe thing to do.

This is why it's a bad idea to restart Solr after OOME: encountering that exception is caused by a resource shortage. Usually it's Java heap memory that's run out, but there are other resource shortages that lead to that exception. Chances are excellent that once Solr gets back online and begins handling load, the same resource shortage will happen again. It could happen repeatedly, leading to a constant restart cycle that becomes a stability nightmare. Instead of immediately restarting, the resource shortage must be fixed.

> So in production, Solr nodes sometimes go down and get restarted by
> the scripts. During recovery, some shards won't have a leader, and
> because of that, indexing won't work. Adding replicas will also
> sometimes yield multiple replicas on the same node, a lot more than
> we want (we added one and got eight at one time).

Often when one node dies because of a resource shortage, it can cause the other nodes to take on more load and then *also* die because of the same kind of resource shortage. Outages on multiple servers in quick succession can be one reason for recovery problems. One thing you might need to do when the cloud becomes unstable is to shut down all Solr servers and then start them back up one at a time, making sure that everything on each server has recovered before starting the next one.

One thing that's important to say again: except for when Solr is killed by its own OOM killer or by the OOM killer in the operating system, Solr basically NEVER crashes. I'm not saying that it can't happen, but I've only ever seen it in cases where the server hardware was failing. If the OOM killer in the OS is responsible for Solr stopping, then your Solr logfile will not record any exceptions. When the OS-level OOM killer is triggered, it's usually an indication that a serious mistake has been made in choosing Solr's max heap size.
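[As an illustration of the heap point, the max heap for the bin/solr scripts is normally set in solr.in.sh. The value below is only a placeholder; the right number depends entirely on your index size and query load:]

  # solr.in.sh -- placeholder value, size this from real measurements
  SOLR_HEAP="8g"

Setting it too low leads to the OOME-and-restart cycle described above; setting it far too high starves the OS page cache that Solr relies on for index performance.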
It's hard to say exactly why you might end up with shards that don't have a leader. Check your Solr logfiles for error messages. I will say that the automatic restarts you've described could be a big part of the problem.

Just so you know, Solr tends to want a LOT of memory available. The amount required for good performance is sometimes shocking to users. Here's a page that describes some common Solr performance problems and explains why memory is typically the resource at the root of most of them:

https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
Solr Cloud (6.6.3), Zookeeper(3.4.10) and ELB's
I need help with our production setup, so I will describe our setup and issues below.

Solr is constantly going down even if there is no load / traffic. We have recovery scripts for that, but sometimes recovery does not go well because the nodes can't elect a leader, so sometimes shards are unusable / not indexable. When we manually add replicas, sometimes it adds a lot more than we asked for (there was one time where we added 1 and got 8 after an hour).

So our setup is like this: We are deployed in AWS. We have a load balancer on top of the Solr nodes. We have a load balancer on top of Zookeeper / Exhibitor. We have scripts on the Solr and ZK nodes that will auto-restart the Docker containers if they go down.

So my question is: is it ok to have load balancers on top of Zookeeper, or should we use elastic IPs instead?

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Solr Cloud (6.6.3), Zookeeper(3.4.10) and ELB's
Thank you very much for this reply!

Thank you for pointing out our error in having an ELB on top of Zookeeper. We did this so that we could recover a node if it goes down without needing a rolling restart of the Solr nodes. I guess we will try an elastic IP instead, because part of our requirement is that it should automatically spawn an EC2 instance with a ZK node if for some reason one instance fails. I guess this way we still won't need to restart our Solr nodes and can still replace the ZK node(s) behind an elastic IP?

I'm guessing we are experiencing problems with leader election because the Solr nodes can't maintain a TCP connection with the ZK nodes, but I don't have a way of proving that, so our team can't really pitch this to our architect. I hope someone here can help me with this, since it has been a problem for a LONG time now and we are getting a lot of flak from the other stakeholders because of it.

Anyway, thank you for the reply! I hope we can solve this soon!!

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Solr Cloud (6.6.3), Zookeeper(3.4.10) and ELB's
On 6/2/2018 5:20 AM, solrnoobie wrote:
> Thank you for pointing out our error in having an ELB on top of
> Zookeeper. We did this so that we could recover a node if it goes down
> without needing a rolling restart of the Solr nodes. I guess we will
> try an elastic IP instead, because part of our requirement is that it
> should automatically spawn an EC2 instance with a ZK node if for some
> reason one instance fails. I guess this way we still won't need to
> restart our Solr nodes and can still replace the ZK node(s) behind an
> elastic IP?

ZK servers and clients make TCP connections to all of the servers in their config, and if things are working right, don't ever close those connections. If you put a load balancer in there, it can REALLY confuse the system making the connection.

If you have three ZK servers and one of them fails, all the clients and remaining servers should be able to deal with this, and when the server comes back, they should deal with that too. If that doesn't happen, it might be a bug in ZK, and the ZK project will treat it seriously.

ZK version 3.4.x, which is the current stable release and what Solr ships with, cannot dynamically add or remove servers, so spinning up a brand new ZK server is not going to work out. To add or remove a server in the ZK cluster, *EVERY* client and server is going to need to be manually reconfigured and restarted. Dynamic ensemble membership is available in ZK 3.5.x, which is currently in beta. If I had to guess about when Solr will upgrade, I would say it will happen on the second or third stable 3.5.x release, so there is enough time to be sure the software really is battle-tested. ZK has a *VERY* slow release cycle, so I am expecting this to take several months. The upgrade is not going to happen in Solr 6.x, though. Expect it in a later 7.x release or maybe 8.0.

> I'm guessing we are experiencing problems with leader election because
> the Solr nodes can't maintain a TCP connection with the ZK nodes, but
> I don't have a way of proving that, so our team can't really pitch
> this to our architect. I hope someone here can help me with this,
> since it has been a problem for a LONG time now and we are getting a
> lot of flak from the other stakeholders because of it.

I hope what I've said above is helpful. I think that eliminating the load balancer for ZK and the automatic service restarts will help.

If you ARE experiencing situations where the services die or stop responding, chances are really good that you are running into OOME. If that is what's happening, you will need to figure out which resource is short and make more of that resource available. It's usually Java heap memory, but it could be other things, like the inability to start a new thread because the OS has a low limit on the number of processes that a user is allowed to start. You'll have to check logs to see exactly what went wrong. If the logfile doesn't show anything, then the OS might have decided to kill the process for its own reasons, which should be in the system log.

The Solr 6.6.3 version is not a bad choice. The latest 6.x release is 6.6.4, but the problem fixed in 6.6.4 is probably not affecting you. There's a lot of work in 7.x for SolrCloud stability, but a major version upgrade is not something to treat lightly, and should only be something that you attempt if it is *ALREADY* what you plan to do.

Thanks,
Shawn
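[To illustrate the static membership Shawn describes: a ZK 3.4.x ensemble is defined by server.N lines that must be identical on every ZK server. The hostnames below are hypothetical. Adding or removing an entry means editing this file on every server, restarting them, and updating every client's connection string:]

  # zoo.cfg -- hypothetical hostnames, same file on all three servers
  tickTime=2000
  initLimit=10
  syncLimit=5
  dataDir=/var/lib/zookeeper
  clientPort=2181
  server.1=zk1.example.internal:2888:3888
  server.2=zk2.example.internal:2888:3888
  server.3=zk3.example.internal:2888:3888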
Re: Solr Cloud (6.6.3), Zookeeper(3.4.10) and ELB's
> On Jun 2, 2018, at 12:27 AM, solrnoobie wrote:
>
> So my question is: is it ok to have load balancers on top of Zookeeper,
> or should we use elastic IPs instead?

No and no. Use fixed hostnames. List all Zookeeper nodes in the zkhost spec.

Zookeeper is not a stateless service where any node can handle any request. It is the opposite. It has fault-tolerant algorithms for correctly handling state changes with failures.

Give each Zookeeper node an address and a name. Give all the names to all the Solr hosts.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
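[A sketch of what Walter describes: the zkhost spec lists every ZK node directly, with no load balancer in between. The hostnames and the optional /solr chroot below are hypothetical. It can be passed on the command line or set once in solr.in.sh:]

  # all three ZK nodes listed directly; the /solr chroot is optional
  bin/solr start -c -z "zk1.example.internal:2181,zk2.example.internal:2181,zk3.example.internal:2181/solr"

  # or equivalently in solr.in.sh
  ZK_HOST="zk1.example.internal:2181,zk2.example.internal:2181,zk3.example.internal:2181/solr"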
Re: search q via dynamic string depends on date
Yeah Shawn, you're right. But what I hoped to get was a hint on how to build this in an easy way. Maybe something like a double query with a check in between: first check which matchday is the highest (that will be the last matchday), then run the query filtered on that last matchday.

I will google for some PHP script that can do this. If you have a hint, feel free to share... :)

kindest regards
Francois

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
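[One way to sketch that "double query" idea is two plain Solr requests; the core name "matches" and the field name "matchday" below are assumptions based on the description. The first request only finds the highest matchday, and the second filters on it:]

  # 1) find the most recent matchday (sort descending, return one row)
  curl "http://localhost:8983/solr/matches/select?q=*:*&sort=matchday+desc&rows=1&fl=matchday&wt=json"

  # 2) run the real query filtered on that value (say it came back as 34)
  curl "http://localhost:8983/solr/matches/select?q=*:*&fq=matchday:34&wt=json"

A small PHP or shell wrapper can parse the matchday out of the first response and plug it into the fq of the second.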
Re: Setting up Solr Replica on different machine
Hi Shawn,

Yes, I am using SolrCloud. The multiple replicas on the same machine are only for testing.

Regards,
Edwin

On 1 June 2018 at 20:35, Shawn Heisey wrote:

> On 5/31/2018 11:38 PM, Zheng Lin Edwin Yeo wrote:
>
>> I am planning to set up Solr with replica on different machine. How should
>> I go about configuring the setup? Like for example, should the replica node
>> be started on the host machine, or on the replica machine?
>>
>> I will be setting this in Solr 7.3.1.
>
> What would you be trying to achieve by putting multiple replicas on the
> same machine? I don't think that makes any sense at all.
>
> Is this SolrCloud?
>
> In SolrCloud, there's no master and no slave. Also, no "host" and
> "replica." All copies of the index, including the leaders, are called
> replicas.
>
> For standalone Solr, you're almost certainly going to want master/slave
> replication, unless you want to update multiple independent copies. There
> are a few advantages to independently updating replicas, but the indexing
> software is necessarily more complex.
>
> Thanks,
> Shawn
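[For the SolrCloud case, replica placement is controlled through the Collections API rather than by which machine you issue the command from. A hedged example of putting a replica on a specific machine; the collection, shard, and node names are hypothetical:]

  # place a new replica of shard1 on a specific node (node name format is host:port_solr)
  curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=192.168.1.12:8983_solr"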
Re: Synonym(Graph)FilterFactory seems to ignore tokenizerFactory.* parameters.
Could anyone tell me whether this is a bug or intended behavior?

On Thu, May 31, 2018 at 13:53, Yasufumi Mizoguchi wrote:

> Hi, community.
>
> I want to use Synonym(Graph)Filter with JapaneseTokenizer and
> NGramTokenizer. But it turned out that Synonym(Graph)FilterFactory seems
> to ignore tokenizerFactory.* parameters such as
> "tokenizerFactory.maxGramSize", "tokenizerFactory.userDictionary" etc.
> when using managed-schema (ManagedIndexSchema class).
>
> Is this a bug?
>
> I found a similar issue in JIRA (
> https://issues.apache.org/jira/browse/SOLR-10010) and also found that
> Solr respects the parameters when the
> "informResourceLoaderAwareObjectsForFieldType" method called in
> "postReadInform" is commented out, as seen in the JIRA issue.
> (
> https://github.com/apache/lucene-solr/blob/a03d6bc8c27d3d97011bc5bdc2aeb94c4820628c/solr/core/src/java/org/apache/solr/schema/ManagedIndexSchema.java#L1153
> )
>
> Thanks,
> Yasufumi
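[For anyone following along, the kind of configuration being described -- passing tokenizerFactory.* arguments through to the synonym parser's tokenizer -- looks roughly like this; the field type name and gram sizes are only illustrative:]

  <fieldType name="text_ngram_syn" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="3"/>
      <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
              tokenizerFactory="solr.NGramTokenizerFactory"
              tokenizerFactory.minGramSize="2"
              tokenizerFactory.maxGramSize="3"/>
    </analyzer>
  </fieldType>

The report is that these tokenizerFactory.* arguments are honored with a classic schema but ignored when the field type is loaded through managed-schema.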