Hello,

Reloading and restarting don't seem to help here. Only occasionally do the 
replicas finally replicate some files; the next few commits are then simply 
ignored.

I did finally find some errors.

On the leader:
2019-08-23 01:11:10.989 ERROR (qtp367746789-4669) [c:nutch s:shard1 
r:core_node40 x:collection_shard1_replica_t39] o.a.s.h.ReplicationHandler 
Unable to get file names for indexCommit generation:
 1205 => java.nio.file.NoSuchFileException: 
/..../data/collection_shard1_replica_t39/data/index/_27m_8b.liv
        at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
java.nio.file.NoSuchFileException: 
/app/data/nutch_shard1_replica_t39/data/index/_27m_8b.liv
        at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) 
~[?:1.8.0_222]
        at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) 
~[?:1.8.0_222]
        at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) 
~[?:1.8.0_222]
        at 
sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
 ~[?:1.8.0_222]
        at 
sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
 ~[?:1.8.0_222]
        at 
sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
 ~[?:1.8.0_222]

On the slave:
No files to download for index generation: 1205

So it seems obvious now. The replica won't replicate because of the error on 
the leader. Is this a known error? New Jira?
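
For anyone who wants to check the same thing on their own cluster, here is a 
minimal sketch, not a definitive tool. It builds ReplicationHandler URLs and 
compares the index generation of the leader with that of a replica. The host 
names, core names and helper function names are hypothetical. The assumption 
that the JSON output of command=details carries the generation under 
"details" should be checked against your Solr version.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(base_url, core, command, **params):
    """Build a ReplicationHandler URL: <base_url>/<core>/replication?command=..."""
    query = urlencode({"command": command, "wt": "json", **params})
    return f"{base_url.rstrip('/')}/{core}/replication?{query}"


def fetch_generation(base_url, core, timeout=10):
    """Ask one core for its index generation (needs a reachable Solr node).

    Assumption: the JSON response has the shape {"details": {"generation": N}}.
    """
    with urlopen(replication_url(base_url, core, "details"), timeout=timeout) as resp:
        return int(json.load(resp)["details"]["generation"])


def replica_is_behind(leader_generation, replica_generation):
    """True when the replica reports an older generation than the leader."""
    return replica_generation < leader_generation
```

The file list request that fails on the leader above can be repeated by hand 
with replication_url(base, core, "filelist", generation=1205), which should 
make it easier to see whether the leader still trips over the missing .liv 
file.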

Regards,
Markus
 
-----Original message-----
> From:Ere Maijala <ere.maij...@helsinki.fi>
> Sent: Friday 23rd August 2019 11:24
> To: solr-user@lucene.apache.org
> Subject: Re: 8.2.0 After changing replica types, state.json is wrong and 
> replication no longer takes place
> 
> Hi,
> 
> We've had PULL replicas stop replicating a couple of times in Solr 7.x.
> Restarting Solr has got it going again. No errors in logs, and I've been
> unable to reproduce the issue at will. At least once it happened when I
> reloaded a collection, but other times that hasn't caused any issues.
> 
> I'll make a note to check state.json next time we encounter the
> situation to see if I can see what you reported.
> 
> Regards,
> Ere
> 
> Markus Jelsma wrote on 22.8.2019 at 16:36:
> > Hello,
> > 
> > There is a newly created 8.2.0 all-NRT cluster in which I replaced 
> > each NRT replica with a TLOG replica. Now the replicas no longer 
> > replicate when the leader receives data. The situation is odd, because some 
> > shard replicas kept replicating until eight hours ago, another one (same 
> > collection, same node) until seven hours ago, and yet another until four 
> > hours ago!
> > 
> > I inspected state.json to see what might be wrong, and compared it with 
> > another fully working, but much older, 8.2.0 all-TLOG collection.
> > 
> > The faulty one still lists, probably from when it was created:
> >     "nrtReplicas":"2",
> >     "tlogReplicas":"0",
> >     "pullReplicas":"0",
> >     "replicationFactor":"2",
> > 
> > The working collection only has:
> >     "replicationFactor":"1",
> > 
> > What could actually cause this new collection to start replicating when I 
> > delete the data directory, but later stop replicating at some random 
> > time, which is different for each shard?
> > 
> > Is there something I should change in state.json, and can it just be 
> > reuploaded to ZK?
> > 
> > Thanks,
> > Markus
> > 
> 
> -- 
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
> 
