Hi,

The file descriptor count is always quite low. At the moment, after heavy usage for a few days, file descriptor counts are between 100-150 and I don't have any errors in the logs. My worry at the moment is around all the CLOSE_WAIT connections I am seeing. This is particularly true on the boxes marked as leaders; the replicas have a few, but nowhere near as many.
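In case it's useful for tracking this over time: the figure the "File Descriptor Count" bar on the dashboard reports can also be read straight from the JVM over JMX, so it can be logged alongside the CLOSE_WAIT counts (the CLOSE_WAIT numbers themselves come from netstat/lsof at the OS level). Below is a minimal sketch - it assumes a HotSpot/OpenJDK JVM on a Unix-like OS, it only reports the counts of the JVM it runs in (so it would need to run inside, or attach to, the Solr process to show Solr's numbers), and the class name is just illustrative:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdCountCheck {
    public static void main(String[] args) {
        // The platform MXBean; on Unix HotSpot/OpenJDK it also exposes FD counts.
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            // Open vs. maximum file descriptors for this JVM process.
            System.out.println("open fds: " + unix.getOpenFileDescriptorCount());
            System.out.println("max fds:  " + unix.getMaxFileDescriptorCount());
        } else {
            System.out.println("FD counts not available on this JVM/platform");
        }
    }
}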
Thanks for the response.

-----Original Message-----
From: Andre Bois-Crettez [mailto:andre.b...@kelkoo.com]
Sent: 05 December 2012 17:57
To: solr-user@lucene.apache.org
Subject: Re: FW: Replication error and Shard Inconsistencies..

Not sure, but maybe you are running out of file descriptors? On each Solr instance, look at the "Dashboard" admin page; there is a bar with "File Descriptor Count". However, if that were the case, I would expect to see lots of errors in the Solr logs...

André

On 12/05/2012 06:41 PM, Annette Newton wrote:
> Sorry to bombard you - final update of the day...
>
> One thing that I have noticed is that we have a lot of connections between the Solr boxes sitting in CLOSE_WAIT, and they hang around for ages.
>
> -----Original Message-----
> From: Annette Newton [mailto:annette.new...@servicetick.com]
> Sent: 05 December 2012 13:55
> To: solr-user@lucene.apache.org
> Subject: FW: Replication error and Shard Inconsistencies..
>
> Update:
>
> I did a full restart of the Solr Cloud setup: stopped all the instances, cleared down ZooKeeper and started them up individually. I then removed the index from one of the replicas, restarted Solr and it replicated OK. So I'm wondering whether this is something that happens over a period of time.
>
> Also, just to let you know, I changed the schema a couple of times and reloaded the cores on all instances prior to the problem. I don't know if this could have contributed to the problem.
>
> Thanks.
>
> -----Original Message-----
> From: Annette Newton [mailto:annette.new...@servicetick.com]
> Sent: 05 December 2012 09:04
> To: solr-user@lucene.apache.org
> Subject: RE: Replication error and Shard Inconsistencies..
>
> Hi Mark,
>
> Thanks so much for the reply.
>
> We are using the release version of 4.0.
>
> It's very strange: replication appears to be underway, but no files are being copied across. I have attached both the log from the new node that I tried to bring up and the schema and config we are using.
>
> I think it's probably something weird with our config, so I'm going to play around with it today. If I make any progress I'll send an update.
>
> Thanks again.
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: 05 December 2012 00:04
> To: solr-user@lucene.apache.org
> Subject: Re: Replication error and Shard Inconsistencies..
>
> Hey Annette,
>
> Are you using Solr 4.0 final? A version of 4x or 5x?
>
> Do you have the logs for when the replica tried to catch up to the leader?
>
> Stopping and starting the node is actually a fine thing to do. Perhaps you can try it again and capture the logs.
>
> If a node is not listed as live but is in the clusterstate, that is fine. It shouldn't be consulted. To remove it, you either have to unload it with the core admin API or you can manually delete its registered state under the node states node that the Overseer looks at.
>
> Also, it would be useful to see the logs of the new node coming up. There should be info about what happens when it tries to replicate.
>
> It almost sounds like replication is just not working for your setup at all and that you have to tweak some configuration. You shouldn't see these nodes as active then, though - so we should get to the bottom of this.
>
> - Mark
>
> On Dec 4, 2012, at 4:37 AM, Annette Newton <annette.new...@servicetick.com> wrote:
>
>> Hi all,
>>
>> I have a quite weird issue with Solr Cloud.
>> I have a 4 shard, 2 replica setup. Yesterday one of the nodes lost communication with the cloud, which resulted in it trying to run replication; this failed, which has left me with a shard (Shard 4) that has 2,833,940 documents on the leader and 409,837 on the follower - obviously a big discrepancy, and this leads to queries returning differing results depending on which of these nodes the data comes from. There is no indication of a problem on the admin site other than the big discrepancy in the number of documents. They are all marked as active etc.
>>
>> So I thought that I would force replication to happen again by stopping and starting Solr (probably the wrong thing to do), but this resulted in no change. So I turned off that node and replaced it with a new one. In ZooKeeper, live nodes doesn't list that machine, but it is still being shown as active in clusterstate.json - I have attached images showing this. This means the new node hasn't replaced the old node but is now a replica on Shard 1! Also, that node doesn't appear to have replicated Shard 1's data anyway; it never got marked as replicating or anything.
>>
>> How do I clear the ZooKeeper state without taking down the entire Solr Cloud setup? How do I force a node to replicate from the others in the shard?
>>
>> Thanks in advance.
>>
>> Annette Newton
>>
>>
>> <LiveNodes.zip>
>
> --
> André Bois-Crettez
>
> Search technology, Kelkoo
> http://www.kelkoo.com/
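Following up on Mark's suggestion above about removing the stale entry by unloading it with the core admin API: here is a rough SolrJ 4.0 sketch of what I understand that to mean. The base URL and core name are made up (substitute whatever clusterstate.json shows for the dead node), and I haven't verified that this actually clears the registered state, so treat it as a starting point rather than a recipe:

import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class UnloadStaleCore {
    public static void main(String[] args) throws SolrServerException, IOException {
        // Point at the node that owns the stale core (hypothetical host/port).
        HttpSolrServer server = new HttpSolrServer("http://old-node:8983/solr");

        // Unload the stale core; per Mark's note the Overseer should then be able
        // to drop it from the cluster state. The core name here is made up.
        CoreAdminRequest.unloadCore("collection1_shard4_replica2", server);

        // A core can also be reloaded after a schema change without a restart:
        // CoreAdminRequest.reloadCore("collection1_shard4_replica2", server);

        server.shutdown();
    }
}

The commented-out reload call uses the same request class, which might be relevant given the schema changes and core reloads mentioned earlier in the thread.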