Re: replication, disk space

Jonathan Rochkind Thu, 19 Jan 2012 09:43:35 -0800

Okay, I do have an index.properties file too, and THAT one does contain the name of an index directory.

But it's got the name of the timestamped index directory! Not sure how that happened; could it have been Solr trying to recover from running out of disk space in the middle of a replication? I certainly never did that intentionally.

But okay, can someone confirm whether this plan makes sense for restoring things without downtime:

1. rm the 'index' directory, which seems to be an old copy of the index at this point

2. 'mv index.20120113121302 index'

3. Manually edit index.properties to have index=index, not index=index.20120113121302

4. Send reload core command.
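
The four steps above can be sketched as follows. This is only a rough illustration in Python, run against a throwaway temp directory rather than a real Solr data dir. It assumes index.properties holds a single index=<dirname> line; the base URL and the core name "core0" are hypothetical placeholders, and the sketch only builds the CoreAdmin RELOAD URL without sending it. Whether the swap is safe while the slave is serving queries is exactly the open question here, so the sketch does not settle that.

```python
import shutil
import tempfile
from pathlib import Path
from urllib.parse import urlencode


def swap_index_dirs(data_dir: Path, stamped: str) -> None:
    """Steps 1-3: remove stale 'index', rename the stamped dir, fix index.properties."""
    stale = data_dir / "index"
    live = data_dir / stamped
    if not live.is_dir():
        raise FileNotFoundError(live)
    if stale.exists():
        shutil.rmtree(stale)              # step 1: rm the old 'index'
    live.rename(stale)                    # step 2: mv index.<stamp> index
    props = data_dir / "index.properties"
    lines = props.read_text().splitlines()
    # step 3: point the index= entry back at plain 'index', keep other lines
    fixed = ["index=index" if ln.strip().startswith("index=") else ln
             for ln in lines]
    props.write_text("\n".join(fixed) + "\n")


def reload_url(base: str, core: str) -> str:
    """Step 4: URL for a CoreAdmin RELOAD request (base and core are assumed)."""
    query = urlencode({"action": "RELOAD", "core": core})
    return base.rstrip("/") + "/admin/cores?" + query


# Dry run in a temp directory that mimics the layout described in this thread.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    (data / "index").mkdir()
    (data / "index.20120113121302").mkdir()
    (data / "index.properties").write_text("index=index.20120113121302\n")
    swap_index_dirs(data, "index.20120113121302")
    print(sorted(p.name for p in data.iterdir()))
    print((data / "index.properties").read_text().strip())

print(reload_url("http://localhost:8983/solr", "core0"))
```

Running the dry run first on a copy of the data directory would show whether the rename and the properties edit behave as expected before touching the live slave.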

Does this make sense? (I just experimentally tried a reload core command, and even though it's not supposed to, it DID result in about 20 seconds of unresponsiveness from my Solr server. Not sure why; it could just be a lack of CPU or RAM on the server to do what's being asked of it. But if that's the best I can do, 20 seconds of unavailability, I'll take it.)


On 1/19/2012 12:37 PM, Jonathan Rochkind wrote:

Hmm, I don't have a "replication.properties" file, I don't think. Oh wait, yes I do, there it is! I guess the replication process makes this file?
Okay....
I don't see an index directory in the replication.properties file at all though. Below is my complete replication.properties.
So I'm still not sure how to properly recover from this situation without downtime. It _looks_ to me like the timestamped directory is actually the live/recent one. Its files have a more recent timestamp, and it's the one that /admin/replication/index.jsp mentions.
replication.properties:

#Replication details
#Wed Jan 18 10:58:25 EST 2012
confFilesReplicated=[solrconfig.xml, schema.xml]
timesIndexReplicated=350
lastCycleBytesDownloaded=6524299012
replicationFailedAtList=1326902305288,1326406990614,1326394654410,1326218508294,1322150197956,1321987735253,1316104240679,1314371534794,1306764945741,1306678853902
replicationFailedAt=1326902305288
timesConfigReplicated=1
indexReplicatedAtList=1326902305288,1326825419865,1326744428192,1326645554344,1326569088373,1326475488777,1326406990614,1326394654410,1326303313747,1326218508294
confFilesReplicatedAt=1316547200637
previousCycleTimeInSeconds=295
timesFailed=54
indexReplicatedAt=1326902305288
~
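
The long numbers in the file above appear to be milliseconds since the Unix epoch: decoding indexReplicatedAt gives 15:58:25 UTC on 18 Jan 2012, which matches the "10:58:25 EST" header comment. A minimal Python sketch for reading them, using a trimmed copy of the file; the parser is deliberately simplified and ignores the escaping rules of real Java .properties files, and the function names are made up for illustration:

```python
from datetime import datetime, timezone

# Trimmed copy of the replication.properties shown above.
SAMPLE = """\
#Replication details
#Wed Jan 18 10:58:25 EST 2012
timesIndexReplicated=350
replicationFailedAt=1326902305288
timesFailed=54
indexReplicatedAt=1326902305288
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def millis_to_utc(ms):
    """Convert epoch milliseconds (string or int) to a UTC datetime, whole seconds."""
    return datetime.fromtimestamp(int(ms) // 1000, tz=timezone.utc)


props = parse_properties(SAMPLE)
print(millis_to_utc(props["indexReplicatedAt"]).isoformat())
# 2012-01-18T15:58:25+00:00
print(props["indexReplicatedAt"] == props["replicationFailedAt"])
# True
```

The last replication timestamp and the last failure timestamp are identical, which may mean the most recent replication attempt was recorded as a failure.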


On 1/18/2012 1:41 PM, Dyer, James wrote:
I've seen this happen when the configuration files change on the master and replication deems it necessary to do a core reload on the slave. In this case, replication copies the entire index to the new directory, then does a core reload to make the new config files and new index directory go live. Because it keeps the old searcher running while the new searcher is being started, both index copies have to exist until the swap is complete. I remember having the same concern about restarts, but I believe I tested this and Solr will look at the "replication.properties" file on startup and determine the correct index dir to use from that. So (if my memory is correct) you can safely delete "index" so long as "replication.properties" points to the other directory.
I wasn't familiar with SOLR-1781. Maybe replication is supposed to clean up the extra directories and sometimes doesn't? In any case, I've found that whenever it happens it's OK to go out and delete the one(s) not being used, even if that means deleting "index".
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
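
A rough Python sketch of the check described in the message above: work out which index directory is live and which ones are left over. It assumes the pointer is an index=<dirname> line in index.properties, as reported elsewhere in this thread (the message above names replication.properties from memory), and it assumes plain "index" is used when no such file exists. The helper names are hypothetical, and nothing is deleted.

```python
import tempfile
from pathlib import Path


def live_index_dir(data_dir: Path) -> Path:
    """Return the directory named by index.properties, else data_dir/'index' (assumed default)."""
    name = "index"
    props = data_dir / "index.properties"
    if props.is_file():
        for line in props.read_text().splitlines():
            line = line.strip()
            if line.startswith("index="):
                name = line.split("=", 1)[1].strip()
    return data_dir / name


def unused_index_dirs(data_dir: Path) -> list:
    """List 'index' / 'index.*' directories that are not the live one."""
    live = live_index_dir(data_dir)
    return sorted(
        p for p in data_dir.iterdir()
        if p.is_dir()
        and (p.name == "index" or p.name.startswith("index."))
        and p != live
    )


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


# Demo on a temp directory that mimics the layout in this thread.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    (data / "index").mkdir()
    (data / "index" / "old.bin").write_bytes(b"x" * 10)
    (data / "index.20120113121302").mkdir()
    (data / "index.properties").write_text("index=index.20120113121302\n")
    print(live_index_dir(data).name)
    for d in unused_index_dirs(data):
        print(d.name, dir_size(d), "bytes reclaimable")
```

Listing the leftover directories and their sizes before removing anything keeps the decision manual, which seems prudent given the uncertainty about which file Solr actually consults.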

-----Original Message-----
From: Artem Lokotosh [mailto:arco...@gmail.com]
Sent: Wednesday, January 18, 2012 12:24 PM
To: solr-user@lucene.apache.org
Subject: Re: replication, disk space

Which OS are you using?
Maybe related to this Solr bug
https://issues.apache.org/jira/browse/SOLR-1781
On Wed, Jan 18, 2012 at 6:32 PM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
So Solr 1.4. I have a solr master/slave, where it actually doesn't poll for
replication, it only replicates irregularly when I issue a replicate command
to it.

After the last replication, the slave, in solr_home, has a data/index
directory as well as a data/index.20120113121302 directory.

The /admin/replication/index.jsp admin page reports:

Local Index
Index Version: 1326407139862, Generation: 183
Location: /opt/solr/solr_searcher/prod/data/index.20120113121302
So does this mean the index.XXXX file is actually the one currently being
used live, not the straight 'index'? Why?
I can't afford the disk space to leave both of these around indefinitely.
After replication completes and is committed, why would two index dirs be
left? And how can I restore this to one index dir, without downtime? If
it's really using the "index.XXXXX" directory, then I could just delete the
"index" directory, but that's a bad idea, because next time the server
starts it's going to be looking for "index", not "index.XXXX". And if it's
using the timestamped index file now, I can't delete THAT one now either.
If I was willing to restart the tomcat container, then I could delete one,
rename the other, etc. But I don't want downtime.
I really don't understand what's going on or how it got in this state. Any
ideas?

Jonathan
