Re: data consistency in solrcloud cluster deployed in aws

Luis Carlos Guerrero Covo Fri, 14 Jun 2013 15:05:17 -0700

Thank you for your reply, Otis. I found two open issues that may be related
to this problem:


https://issues.apache.org/jira/browse/SOLR-4924

https://issues.apache.org/jira/browse/SOLR-4260

We recently changed some settings so that commits happen periodically
(every 5 minutes or 25000 docs). Before that, we ran a commit after every
import from DIH, so commits were more frequent and we were not
experiencing this issue. I don't think this is related to availability
zones: the generation number on the lagging replica does change every once
in a while, but it never catches up to a recent version and stays 50 or 60
versions behind the leader and the DIH node. If this were due to network
latency, the replica would only be slightly behind.
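
A minimal sketch of how the generation numbers on each node could be
compared, to confirm how far a replica is behind. It assumes each node
exposes the ReplicationHandler at /replication and that
command=indexversion returns a JSON body with a "generation" field; the
host names and the core name "collection1" are hypothetical placeholders,
not values from this cluster.

```python
import json
from urllib.request import urlopen


def indexversion_url(base_url, core):
    """Build the ReplicationHandler URL that reports index version and generation."""
    return f"{base_url.rstrip('/')}/{core}/replication?command=indexversion&wt=json"


def parse_generation(body):
    """Extract the generation number from the JSON response body (assumed field name)."""
    return int(json.loads(body)["generation"])


def generation_lag(generations, reference):
    """Return how many generations each node is behind the reference node."""
    ref = generations[reference]
    return {node: ref - gen for node, gen in generations.items() if node != reference}


def fetch_generation(base_url, core, timeout=10):
    """Query one node over HTTP and return its current generation."""
    with urlopen(indexversion_url(base_url, core), timeout=timeout) as resp:
        return parse_generation(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical node addresses: the DIH node and the two serving replicas.
    nodes = {
        "dih": "http://solr-dih.example.com:8983/solr",
        "replica-same-az": "http://solr-a.example.com:8983/solr",
        "replica-other-az": "http://solr-b.example.com:8983/solr",
    }
    generations = {name: fetch_generation(url, "collection1") for name, url in nodes.items()}
    for node, lag in generation_lag(generations, "dih").items():
        print(f"{node}: {lag} generations behind the DIH node")
```

Running this at intervals shorter than the 5 minute commit window would show
whether the lagging replica falls behind gradually or jumps in steps.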


On Fri, Jun 14, 2013 at 4:51 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Yes, sounds like it's because of the second node being in a different
> AZ.  In AWS, AZ really means a DC (Data Center), so the node that is
> in a different AZ/DC is naturally going to replicate more slowly.
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
>
>
>
>
>
> On Fri, Jun 14, 2013 at 8:50 AM, Luis Carlos Guerrero Covo
> <lcguerreroc...@gmail.com> wrote:
> > Hi,
> >
> > I currently have solrcloud setup with single shards and two nodes behind a
> > load balancer in aws. I also have an additional node in the cluster which
> > is outside the load balancer (not receiving any client requests) importing
> > data into the cluster using data import handler. So that takes my cluster
> > to 3 nodes, 2 receiving user requests and the single data import node.
> >
> > I'm experiencing several data replication issues that could be caused by
> > the irregular setup. The one node that is in the same availability zone as
> > the data import node (My two nodes are in two different aws availability
> > zones) is replicating correctly and is never far away from the import
> > node's generation number. The node that is in a different availability zone
> > is always lagging behind in terms of index replication. I'm mentioning
> > availability zones because I see that as the only thing that could be
> > causing this issue. Am I correct in assuming this? What are further steps
> > that I could take to verify what could be the cause of the index not
> > replicating fast enough to all nodes?
> >
> > thanks in advance for any help provided,
> >
> > Luis Guerrero
>



-- 
Luis Carlos Guerrero Covo
M.S. Computer Engineering
(57) 3183542047
