Solr Cloud (6.6.3), Zookeeper(3.4.10) and ELB's

2018-06-02 Thread solrnoobie
Our team is having problems with our production setup in AWS.

Our current setup is:
- Dockerized Solr nodes behind an ELB
- ZooKeeper with Exhibitor in a Docker container (3 of these)
- Solr talks to ZooKeeper through an ELB (should we even do this? We did
this for recovery purposes, so if there are better ways to handle it,
please describe them in your reply)
- There are scripts on the ZK nodes and Solr nodes to monitor and restart the
Docker containers if they go down.

So in production, Solr nodes sometimes go down and will be restarted by the
scripts. During recovery, some shards won't have a leader, and because of
that, indexing won't work. Adding replicas will also sometimes yield multiple
replicas on the same node, far more than we want (we added one and got eight
at one time).

So my question is, are we doing something wrong here?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: SolrCloud Collection Backup - Solr 5.5.4

2018-06-02 Thread Shawn Heisey
On 6/1/2018 7:23 AM, Greenhorn Techie wrote:
> We are running SolrCloud with version 5.5.4. As I understand, Solr
> Collection Backup and Restore API are only supported from version 6
> onwards. So wondering what is the best mechanism to get our collections
> backed-up on older Solr version.

That functionality was added in 6.1.

https://issues.apache.org/jira/browse/SOLR-5750

> When I ran backup command on a particular node (curl
> http://localhost:8983/solr/gettingstarted/replication?command=backup) it
> seems it only creates a snapshot for the collection data stored on that
> particular node. Does that mean, if I run this command for every node
> hosting my SolrCloud collection, I will be getting the required backup?
> Will this backup the metadata as well from ZK? I presume not.

If you provide a location parameter, it will write a new backup
directory in that location.

https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html#standalone-mode-backups

I verified that this parameter is in the 5.5 docs too; I would suggest
you download that version of the reference guide in PDF format if you
want a full reference.

It would probably be a good idea to create a separate directory for each
core that you work on.
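Something like this for each core, as a rough sketch; the location path,
core name, and snapshot name here are only placeholders:

curl "http://localhost:8983/solr/gettingstarted/replication?command=backup&location=/backups/solr/gettingstarted&name=20180602"

If I remember right, the name parameter just controls the snapshot.<name>
directory created under the location, and command=details on the same
handler will report the status of the most recent backup.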

If the backup is done on all the right cores, you will get all the index
data, but you will have no info from ZK.  If the collection has more
than one shard and uses the compositeId router, then you will need the
info from the collection's clusterstate about hash shard ranges, and
those would have to be verified and possibly adjusted on the new
collection before you started putting the data back in.  If the new
collection uses different hash ranges than the one you backed up, then
the restored collection would not function correctly.
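If you need to check those hash ranges, one way (the collection name below
is a placeholder) is the Collections API CLUSTERSTATUS call, which includes
the range value for every shard:

curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=gettingstarted&wt=json"

Compare the range values between the old and new collections before you
start restoring data.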

> If so, what
> are the best possible approaches to get the same. Is there something made
> available by Solr for the same?

If you can do it, upgrading to the latest 6.x or 7.x version would be a
good idea, to have full SolrCloud backup and restore functionality.

--

You asked me some questions via IRC when I wasn't around, and you had
logged off by the time I got back to IRC.  I don't know when you might
come back online there.  Here's some info for those questions:

The reason that 'ant server' isn't working is that you're at the top
level of the source.  It should work if you change to the solr directory
first.
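For example, from the top level of the checkout:

cd solr
ant server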

Similar to what you've encountered, I can't get eclipse to work properly
when using a downloaded 6.6.2 source package (solr-6.6.2-src.tgz).  But
if I use these commands instead, then import into eclipse, it works:

git clone https://git-wip-us.apache.org/repos/asf/lucene-solr.git
cd lucene-solr
git checkout refs/tags/releases/lucene-solr/6.6.2
ant clean clean-jars clean-eclipse eclipse

The clean targets are not strictly necessary with a fresh clone, but
including them works even when the tree isn't fresh.

I've never had very good luck with the downloadable source packages. 
Some of the build system functionality *only* works when the source is
obtained with git, so I prefer that.

Thanks,
Shawn



Re: SolrCloud Collection Backup - Solr 5.5.4

2018-06-02 Thread Shawn Heisey

On 6/2/2018 1:50 AM, Shawn Heisey wrote:

If you provide a location parameter, it will write a new backup
directory in that location.

https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html#standalone-mode-backups

I verified that this parameter is in the 5.5 docs too; I would suggest
you download that version of the reference guide in PDF format if you
want a full reference.


A followup:

I suspect that if you try to use the restore functionality on the 
replication handler while you have multiple shard replicas, SolrCloud 
would not replicate things properly.  I could be wrong about that, but I 
think that restoring from replication handler backups into SolrCloud 
could get a little messy.


Thanks,
Shawn



Re: Solr Cloud (6.6.3), Zookeeper(3.4.10) and ELB's

2018-06-02 Thread Shawn Heisey

On 6/2/2018 1:49 AM, solrnoobie wrote:

Our team is having problems with our production setup in AWS.

Our current setup is:
- Dockerized solr nodes behind an ELB


Putting Solr behind a load balancer is a pretty normal thing to do.


- ZooKeeper with Exhibitor in a Docker container (3 of these)


I don't know anything about exhibitor.  You'd need to discuss that with 
a zookeeper expert.



- Solr talks to ZooKeeper through an ELB (should we even do this? We did
this for recovery purposes, so if there are better ways to handle it,
please describe them in your reply)


Definitely not.  ZK is designed for fault tolerance *without* a load 
balancer.  Solr gets configured with all the ZK servers and will connect 
to all of them at the same time.  Every ZK server in the ensemble is 
configured with a list of all ZK servers, and they communicate with each 
other directly.  Putting a load balancer in the mix can *cause* 
problems; it won't solve them.
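As a sketch, the ZK connection string given to Solr should simply list
every ensemble member directly, with no load balancer in between.  The
hostnames below are placeholders; the same string can also go into the
ZK_HOST variable in solr.in.sh:

bin/solr start -c -z "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"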



- There are scripts on the ZK nodes and Solr nodes to monitor and restart the
Docker containers if they go down.


In general it's probably not a good idea to restart Solr automatically 
if it goes down.  Instances where Solr crashes are EXTREMELY rare.  I 
bet if you asked the zookeeper project about automatically restarting 
their software, they would tell you the same thing.


There is one relatively common scenario where Solr *will* stop running:  
If Java experiences an OutOfMemoryError exception. Solr is designed to 
kill itself when OOME is thrown, because program operation is completely 
unpredictable after OOME. Stopping all operation is the only safe thing 
to do.
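On Linux installs, the bin/solr script normally arranges this with a JVM
option along these lines (the port and paths shown here are only
illustrative):

-XX:OnOutOfMemoryError="/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs"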


This is why it's a bad idea to restart Solr after OOME: Encountering 
that exception is caused by a resource shortage. Usually it's Java heap 
memory that's run out, but there are other resource shortages that lead 
to that exception.  Chances are excellent that once Solr gets back 
online and begins handling load, the same resource shortage will happen 
again.  It could happen repeatedly, leading to a 
constant restart cycle that becomes a stability nightmare.  Instead of 
immediately restarting, the resource shortage problem must be fixed.



So in production, Solr nodes sometimes go down and will be restarted by the
scripts. During recovery, some shards won't have a leader, and because of
that, indexing won't work. Adding replicas will also sometimes yield multiple
replicas on the same node, far more than we want (we added one and got eight
at one time).


Often when one node dies because of a resource shortage, it can cause 
the other nodes to take on more load and then *also* die because of the 
same kind of resource shortage.  Outages on multiple servers in quick 
succession can be one reason for recovery problems.  One thing you might 
need to do when the cloud becomes unstable is to shut down all Solr 
servers and then start them back up one at a time, making sure that 
everything on each server has recovered before starting the next one.
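A very rough sketch of that procedure; the hostnames and ZK string are
placeholders, and in practice you would repeat the status check until
every replica on the node shows state=active:

# on every Solr host, stop Solr first
bin/solr stop -all

# then, on one host at a time:
bin/solr start -c -z "zk1:2181,zk2:2181,zk3:2181"
curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json"
# wait for this node's replicas to go active before starting the next host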


One thing that's important to say again: Except for when Solr is killed 
by its own OOM killer or by the OOM killer in the operating system, Solr 
basically NEVER crashes.  I'm not saying that it can't happen, but I've 
only ever seen it in cases where the server hardware was failing.


If the OOM killer in the OS is responsible for Solr stopping, then your 
Solr logfile will not record any exceptions. When the OS-level OOM 
killer is triggered, it's usually an indication that a serious mistake 
has been made in choosing Solr's max heap size.
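If you suspect the OS-level OOM killer, the kernel log is where the
evidence ends up.  Something like this, keeping in mind that log file
locations vary by distribution:

dmesg | grep -i "out of memory"
grep -i "oom" /var/log/syslog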


It's hard to say exactly why you might end up with shards that don't 
have a leader.  Check your solr logfiles for error messages.  I will say 
that the automatic restarts you've described could be a big part of the 
problem.


Just so you know, Solr tends to want there to be a LOT of memory 
available.  The amount required for good performance is sometimes 
shocking to users.  Here's a page that describes some Solr performance 
problems, and tries to explain that memory is typically the resource 
that's at the root of most of those problems:


https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Solr Cloud (6.6.3), Zookeeper(3.4.10) and ELB's

2018-06-02 Thread solrnoobie
I need help with our production setup, so I will describe our setup and the
issues below.

Solr is constantly going down even when there is no load / traffic. We have
recovery scripts for that, but sometimes recovery does not go well because the
nodes can't elect a leader, so sometimes shards are unusable / not indexable.
When we manually add replicas, sometimes Solr adds a lot more than we asked
for (there was one time where we added 1 and got 8 after an hour).

So our setup is like this:

We are deployed in AWS.

We have a load balancer on top of the solr nodes. We have a load balancer on
top of zookeeper / exhibitor.
We have scripts on the Solr and ZK nodes that will automatically restart the
Docker containers if they go down.

So my question is: is it OK to have load balancers in front of ZooKeeper, or
should we use elastic IPs instead?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr Cloud (6.6.3), Zookeeper(3.4.10) and ELB's

2018-06-02 Thread solrnoobie
Thank you very much for this reply!

Thank you for pointing out our error in having an ELB in front of ZooKeeper.
We did this so that we could recover a node if it goes down without needing a
rolling restart of the Solr nodes. I guess we will try an elastic IP instead,
because part of our requirement is that it should automatically spawn an EC2
instance with a ZK node if for some reason one instance fails. I guess this
way we still won't need to restart our Solr nodes and can still replace the
ZK node(s) behind an elastic IP?

I'm guessing we are experiencing problems with leader election because the
Solr nodes can't maintain a TCP connection with the ZK nodes, but I don't have
a way of proving that, so our team can't really pitch this to our architect.
I hope someone here can help me with this, since it has been a problem for a
LONG time now and we are getting a lot of flak from the other stakeholders
because of it.

Anyway thank you for the reply! I hope we can solve this soon!!



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr Cloud (6.6.3), Zookeeper(3.4.10) and ELB's

2018-06-02 Thread Shawn Heisey

On 6/2/2018 5:20 AM, solrnoobie wrote:

Thank you for pointing out our error in having an ELB in front of ZooKeeper.
We did this so that we could recover a node if it goes down without needing a
rolling restart of the Solr nodes. I guess we will try an elastic IP instead,
because part of our requirement is that it should automatically spawn an EC2
instance with a ZK node if for some reason one instance fails. I guess this
way we still won't need to restart our Solr nodes and can still replace the
ZK node(s) behind an elastic IP?


ZK servers and clients make TCP connections to all of the servers in 
their config, and if things are working right, don't ever close those 
connections.  If you put a load balancer in there, it can REALLY confuse 
the system making the connection.


If you have three ZK servers and one of them fails, all the clients and 
remaining servers should be able to deal with this, and when the server 
comes back, they should deal with that too. If that doesn't happen, it 
might be a bug in ZK and the ZK project will treat it seriously.


ZK version 3.4.x, which is the current stable release and what Solr 
ships with, cannot dynamically add or remove servers, so spinning up a 
brand new ZK server is not going to work out.  To add or remove a server 
in the ZK cluster, *EVERY* client and server is going to need to be 
manually reconfigured and restarted.
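As an illustration (the hostnames are placeholders), every zoo.cfg in a
three-node 3.4.x ensemble carries the full static server list, and
replacing a member means editing that list on every ZK server, updating
the zkHost string on every client, and restarting all of them:

server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888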


Dynamic ensemble membership is available in ZK 3.5.x, which is currently 
in beta.  If I had to guess about when Solr will upgrade, I would say it 
will happen on the second or third stable 3.5.x release, so there is 
enough time to be sure the software really is battle-tested.  ZK has a 
*VERY* slow release cycle, so I am expecting this to take several 
months.  The upgrade is not going to happen in Solr 6.x, though.  Expect 
it in a later 7.x release or maybe 8.0.



I'm guessing we are experiencing problems with leader election because the
Solr nodes can't maintain a TCP connection with the ZK nodes, but I don't have
a way of proving that, so our team can't really pitch this to our architect.
I hope someone here can help me with this, since it has been a problem for a
LONG time now and we are getting a lot of flak from the other stakeholders
because of it.


I hope what I've said above is helpful.  I think that eliminating load 
balancer usage for ZK and automatic service restart will help.  If you 
ARE experiencing situations where the services die or stop responding, 
chances are really good that you are running into OOME.  If that is 
what's happening, you will need to figure out what resource is short and 
make more of that resource available.  It's usually Java heap memory, 
but it could be other things like the inability to start a new thread 
because the OS has a low limit on the number of processes that a user is 
allowed to start.  You'll have to check logs to see exactly what went 
wrong.  If the logfile doesn't show anything, then the OS might have 
decided to kill the process for its own reasons, which should be in the 
system log.
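A couple of hedged starting points for that check; the log path below is
only an example and yours may be different:

grep -i "OutOfMemoryError" /var/solr/logs/solr.log
ulimit -u    # per-user process limit, which also caps the number of Java threads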


The Solr 6.6.3 version is not a bad choice.  The latest 6.x release is 
6.6.4, but the problem fixed in 6.6.4 is probably not affecting you.  
There's a lot of work in 7.x for SolrCloud stability, but a major 
version upgrade is not something to treat lightly, and should only be 
something that you attempt if it is *ALREADY* what you plan to do.


Thanks,
Shawn



Re: Solr Cloud (6.6.3), Zookeeper(3.4.10) and ELB's

2018-06-02 Thread Walter Underwood
> On Jun 2, 2018, at 12:27 AM, solrnoobie  wrote:
> 
> So my question is: is it OK to have load balancers in front of ZooKeeper, or
> should we use elastic IPs instead?

No and no. Use fixed hostnames. List all Zookeeper nodes in the zkhost spec.

Zookeeper is not a stateless service where any node can handle any request.
It is the opposite. It has fault-tolerant algorithms for correctly handling 
state
changes with failures.

Give each Zookeeper node an address and a name. Give all the names to all
the Solr hosts.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: search q via dynamic string depends on date

2018-06-02 Thread servus01
Yeah Shawn,

you're right. But what I hoped to get was a hint on how to build this in an
easy way. Maybe something like a double query with a check in between:

check which matchday is the highest (that will be the last matchday), then
query with that last matchday.
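As a rough sketch of that two-step idea with curl; the collection and
field names are made up, so adapt them to your schema:

# step 1: find the most recent matchday
curl "http://localhost:8983/solr/matches/select?q=*:*&sort=matchday+desc&rows=1&fl=matchday&wt=json"

# step 2: query again, filtered to the matchday value returned by step 1
curl "http://localhost:8983/solr/matches/select?q=*:*&fq=matchday:34&wt=json"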

I will Google for some PHP script that can do this. If you have a hint,
feel free to share it... :)


kindest regards

Francois



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Setting up Solr Replica on different machine

2018-06-02 Thread Zheng Lin Edwin Yeo
Hi Shawn,

Yes, I am using SolrCloud.

The multiple replicas on the same machine is only for testing.

Regards,
Edwin

On 1 June 2018 at 20:35, Shawn Heisey  wrote:

> On 5/31/2018 11:38 PM, Zheng Lin Edwin Yeo wrote:
>
>> I am planning to set up Solr with a replica on a different machine. How
>> should I go about configuring the setup? For example, should the replica
>> node be started on the host machine, or on the replica machine?
>>
>> I will be setting this in Solr 7.3.1.
>>
>
> What would you be trying to achieve by putting multiple replicas on the
> same machine?  I don't think that makes any sense at all.
>
> Is this SolrCloud?
>
> In SolrCloud, there's no master and no slave.  Also, no "host" and
> "replica."  All copies of the index, including the leaders, are called
> replicas.
>
> For standalone Solr, you're almost certainly going to want master/slave
> replication, unless you want to update multiple independent copies.  There
> are a few advantages to independently updating replicas, but the indexing
> software is necessarily more complex.
>
> Thanks,
> Shawn
>
>


Re: Synonym(Graph)FilterFactory seems to ignore tokenizerFactory.* parameters.

2018-06-02 Thread Yasufumi Mizoguchi
Can anyone tell me whether this is a bug or intended behavior?

2018年5月31日(木) 13:53 Yasufumi Mizoguchi :

> Hi, community.
>
> I want to use Synonym(Graph)Filter with JapaneseTokenizer and
> NGramTokenizer.
> But it turned out that Synonym(Graph)FilterFactory seemed to ignore
> tokenizerFactory.* parameters such as "tokenizerFactory.maxGramSize",
> "tokenizerFactory.userDictionary" etc... when using
> managed-schema(ManagedIndexSchema class).
>
> Is this a bug?
>
> I found a similar issue in JIRA (
> https://issues.apache.org/jira/browse/SOLR-10010) and also found that
> Solr respects the parameters when the call to the
> "informResourceLoaderAwareObjectsForFieldType" method inside the
> "postReadInform" method is commented out, as seen in the JIRA issue.
> (
> https://github.com/apache/lucene-solr/blob/a03d6bc8c27d3d97011bc5bdc2aeb94c4820628c/solr/core/src/java/org/apache/solr/schema/ManagedIndexSchema.java#L1153
> )
>
> Thanks,
> Yasufumi
>
>
>