Unfortunately, when I went back to look at the logs this morning, the log
file had been blown away, which puts a major damper on my debugging
capabilities - sorry about that.  As a double whammy, we optimize
nightly, so the old index files have completely changed at this point as well.

I do not remember seeing an exception / stack trace in the logs associated
with the "SEVERE *Unable to move file*" entry, but we were grepping the
logs, so if it was output on another line it could have been there.  I
wouldn't really expect to see one, though, based on the code in
SnapPuller.java:

  /**
   * Copy a file by the File#renameTo() method. If it fails, it is
   * considered a failure
   * <p/>
   * Todo may be we should try a simple copy if it fails
   */
  private boolean copyAFile(File tmpIdxDir, File indexDir, String fname,
                            List<String> copiedfiles) {
    File indexFileInTmpDir = new File(tmpIdxDir, fname);
    File indexFileInIndex = new File(indexDir, fname);
    boolean success = indexFileInTmpDir.renameTo(indexFileInIndex);
    if (!success) {
      LOG.error("Unable to move index file from: " + indexFileInTmpDir
              + " to: " + indexFileInIndex);
      for (String f : copiedfiles) {
        File indexFile = new File(indexDir, f);
        if (indexFile.exists())
          indexFile.delete();
      }
      delTree(tmpIdxDir);
      return false;
    }
    return true;
  }
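
As an aside, renameTo() just returns false on failure; it never throws, so
there would be no stack trace to find.  If the Todo in that comment were
ever pursued, a fallback copy might look roughly like this.  This is just a
sketch of my own, not anything from the Solr source; the helper name
copyFileFallback is made up, and it assumes it lives in SnapPuller so that
LOG is in scope and java.io.* is imported:

  // Sketch of the "simple copy if it fails" idea from that Todo; my own
  // code, not Solr's. Streams the bytes over when renameTo() fails, then
  // deletes the source to mimic a move.
  private boolean copyFileFallback(File src, File dst) {
    InputStream in = null;
    OutputStream out = null;
    try {
      in = new FileInputStream(src);
      out = new FileOutputStream(dst);
      byte[] buf = new byte[8192];
      int len;
      while ((len = in.read(buf)) != -1) {
        out.write(buf, 0, len);
      }
      out.close();
      out = null;
      return src.delete(); // mimic a move: drop the source on success
    } catch (IOException e) {
      LOG.error("Fallback copy failed from: " + src + " to: " + dst, e);
      return false;
    } finally {
      if (in != null) try { in.close(); } catch (IOException ignored) {}
      if (out != null) try { out.close(); } catch (IOException ignored) {}
    }
  }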

In terms of whether this is a one-off case: this is the first occurrence I
have seen in the logs.  We tried to reproduce the conditions under which
the error occurred, but were unable to.  I'll send along some more useful
info if this happens again.

In terms of the behavior we saw: it appears that a replication ran, hit the
"Unable to move file" error, and as a result the ENTIRE index was
subsequently replicated again into a temporary directory (several times,
over and over).

The end result was that we had multiple full copies of the index in
temporary index folders on the slave, and the original still couldn't be
updated (the move to ./index wouldn't work).  Does Solr ever hold files open
in a manner that would prevent a file in the index directory from being
overwritten?
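
Part of what makes this hard to diagnose is that renameTo() is
platform-dependent and silent about why it fails.  A standalone probe like
the following (my own throwaway test, nothing to do with the Solr source)
is how I'd start checking whether an open handle on the target could make
the rename fail:

import java.io.*;

// Throwaway probe: can File.renameTo() replace a destination that
// already exists and is still held open by another stream? On most
// POSIX filesystems the rename succeeds anyway; on Windows it
// typically returns false, with no exception and no reason given.
public class RenameProbe {
  public static void main(String[] args) throws IOException {
    File src = File.createTempFile("probe-src", ".fnm");
    File dst = File.createTempFile("probe-dst", ".fnm");
    FileInputStream holder = new FileInputStream(dst); // keep dst open
    try {
      System.out.println("renameTo over an open, existing target: "
          + src.renameTo(dst));
    } finally {
      holder.close();
    }
  }
}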


2010/1/21 Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com>

> is it a one-off case? do you observe this frequently?
>
> On Thu, Jan 21, 2010 at 11:26 AM, Otis Gospodnetic
> <otis_gospodne...@yahoo.com> wrote:
> > It's hard to tell without poking around, but one of the first things I'd
> do would be to look for /home/solr/cores/core8/index.20100119103919/_6qv.fnm
> - does this file/dir really exist?  Or, rather, did it exist when the error
> happened.
> >
> > I'm not looking at the source code now, but is that really the only error
> you got?  No exception stack trace?
> >
> >  Otis
> > --
> > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> >
> >
> >
> > ----- Original Message ----
> >> From: Trey <solrt...@gmail.com>
> >> To: solr-user@lucene.apache.org
> >> Sent: Wed, January 20, 2010 11:54:43 PM
> >> Subject: Replication Handler Severe Error: Unable to move index file
> >>
> >> Does anyone know what would cause the following error?:
> >>
> >> 10:45:10 AM org.apache.solr.handler.SnapPuller copyAFile
> >>
> >>      SEVERE: *Unable to move index file* from:
> >> /home/solr/cores/core8/index.20100119103919/_6qv.fnm to:
> >> /home/solr/cores/core8/index/_6qv.fnm
> >> This occurred a few days back and we noticed that several full copies of
> the
> >> index were subsequently pulled from the master to the slave, effectively
> >> evicting our live index from RAM (the linux os cache), and killing our
> query
> >> performance due to disk io contention.
> >>
> >> Has anyone experienced this behavior recently?  I found an old thread
> about
> >> this error from early 2009, but it looks like it was patched almost a
> year
> >> ago:
> >>
> http://old.nabble.com/%22Unable-to-move-index-file%22-error-during-replication-td21157722.html
> >>
> >>
> >> Additional Relevant information:
> >> -We are using the Solr 1.4 official release + a field collapsing patch
> from
> >> mid December (which I believe should only affect query side, not
> indexing /
> >> replication).
> >> -Our Replication PollInterval for slaves checking the master is very
> small
> >> (15 seconds)
> >> -We have a multi-box distributed search with each box possessing
> multiple
> >> cores
> >> -We issue a manual (rolling) optimize across the cores on the master
> once a
> >> day (occurred ~ 1-2 hours before the above timeline)
> >> -maxWarmingSearchers is set to 1.
> >
> >
>
>
>
> --
> -----------------------------------------------------
> Noble Paul | Systems Architect| AOL | http://aol.com
>
