Trunk may actually still hide the issue (possibly), but something really
funky seems to have gone on and I can't find it yet. Do you have any
custom code interacting with solr?
None whatsoever...I am using out-of-the-box solr 1.3 (build of 10/23). I am
using my C# app to http requests to my solr instance.
Is there something you want me to try at my end that might give you a clue?
Let me know and I can try to help out.
Best,
- Bill
--------------------------------------------------
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Saturday, November 15, 2008 10:59 AM
To: <solr-user@lucene.apache.org>
Subject: Re: Fatal exception in solr 1.3+ replication
Havn't given up, but this has really got me so far. For every path,
FSDirectory allows just one instance of FSDirectory to exist, and it keeps
a ref count of how many have been returned from openDirectory for a given
path. An FSDirectory will not actually be closed unless all references to
it are released (it doesn't actually even do anything in close, other than
drop the reference). So pretty much, the only way to get into trouble is
to call close enough times to equal how many times you called
openDirectory, and then try to use the FSDirectory again. This is what
your stack trace indicates is happening.
So we can get the call hierarchy for directory.close() in solr, and we
find that everything is pretty matched up...at worst it looks that a
reference might not be closed - but that doesn't hurt
anything...FSDirectory will just think there is something out there
holding onto a reference for that directory and allow you to continue
using it, even though no such reference might exist. Its only when enough
closes are called that an instance will be marked as closed (a further
openDirectory will return a new open instance). So to get your error,
something has to close a Directory that it did not get from openDirectory.
We know that the IndexReader that is trying to be reopened must have
called openDirectory (or something called openDirectory for it) and we
know that it hasn't called close (the IndexReader has already held up to
an ensureOpen call on itself in that stack trace). Something else must
have closed it. I can't find this happening. Nothing else calls close on
Directory unless it called openDirectory (that I can find using all of
Eclipses magical goodness).
So how does the refcount on the Directory hit 0? I can't find or duplicate
yet...
Trunk may actually still hide the issue (possibly), but something really
funky seems to have gone on and I can't find it yet. Do you have any
custom code interacting with solr?
- Mark
William Pierce wrote:
Mark,
Thanks for your help and lucid exposition of what may be going on....I
will await to hear from you before I plough ahead with the latest trunk
bits....
Best regards,
-- Bill
--------------------------------------------------
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Saturday, November 15, 2008 7:29 AM
To: <solr-user@lucene.apache.org>
Subject: Re: Fatal exception in solr 1.3+ replication
Hang on. The more I look at this, the more I am thinking that was not
the problem. Directories are pretty strictly managed by Lucene at the
moment, and it should actually be pretty difficult to have one closed
out from under you. They are singletons and reference counted. The
IndexReader would have to have been closed if it was responsible for
closing the Directory, and in that case, we would not be trying to
reopen it. The Searcher we get the IndexReader from has been inc ref'd
to ensure it won't close. All I can think is that something is
grabbing/stealing a Directory that didn't directly ask for it from
FSDirectory.getDirectory, and is then closing it. I'm trying to hunt
where that could be happening now. Hope to spot something, but appears
pretty mysterious at the moment. I suppose another option is a nasty
little race condition - I've been trying to repeat the error by sending
lots of update/search requests from multiple threads with no luck
though. Looks like I may have to throw in snap puller code (havn't look
too heavily into any of that before).
At worst, if/when a fix is discovered, you will probably be able to
apply just the fix to the revision your working with.
- Mark
William Pierce wrote:
Mark,
Thanks for your response --- I do appreciate all you volunteers working
to provide such a nice system!
Anyway, I will try the trunk bits as you said. The only problem is
that the later the trunk I use from 1.3, the more of post 1.3
capability I get. And I feel somewhat exposed running these bits in our
production environment...
Not sure if there's another approach?
Thanks,
-Bill
--------------------------------------------------
From: "Mark Miller" <[EMAIL PROTECTED]>
Sent: Friday, November 14, 2008 8:43 PM
To: <solr-user@lucene.apache.org>
Subject: Re: Fatal exception in solr 1.3+ replication
Hey William, sorry about the trouble. I have to look at this further,
but I think the issue is fixed if you grab the latest trunk build.
Solr-465 should inadvertently fix things - before that patch, a
deprecated constructor for solrsearcher was being called - this
constructor caused the underlying IndexReader to close its own
Directory, and since IndexReaders are reopened, we don't want that.
Mark Miller wrote:
Looks like there might be an issue with the reopen - I'm not seeing
what it could be offhand though. Have to find what could be closing a
Directory unexpectedly...I'll try to take a further look over the
weekend.
- Mark
William Pierce wrote:
Folks:
I am using the nightly build of 1.3 as of Oct 23 so as to use the
replication handler. I am running on windows 2003 server with
tomcat 6.0.14. Everything was running fine until I noticed that
certain updated records were not showing up on the slave. Further
investigation showed me that the failures have indeed been occurring
since early this morning with a fatal exception....here is a segment
of the tomcat log:
INFO: Total time taken for download : 0 secs
Nov 14, 2008 5:34:24 AM org.apache.solr.handler.SnapPuller
fetchLatestIndex
INFO: Conf files are not downloaded or are in sync
Nov 14, 2008 5:34:24 AM
org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit(optimize=false,waitFlush=true,waitSearcher=true)
Nov 14, 2008 5:34:24 AM org.apache.solr.handler.ReplicationHandler
doSnapPull
SEVERE: SnapPull failed org.apache.solr.common.SolrException:
Snappull failed : at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:278)
at
org.apache.solr.handler.ReplicationHandler.doSnapPull(ReplicationHandler.java:208)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:121)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException:
org.apache.lucene.store.AlreadyClosedException: this Directory is
closed
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1037)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:350)
at
org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:353)
at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:265)
... 11 more
Caused by: org.apache.lucene.store.AlreadyClosedException: this
Directory is closed
at
org.apache.lucene.store.Directory.ensureOpen(Directory.java:220)
at org.apache.lucene.store.FSDirectory.list(FSDirectory.java:320)
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:533)
at
org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:366)
at
org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndexReader.java:188)
at
org.apache.lucene.index.DirectoryIndexReader.reopen(DirectoryIndexReader.java:124)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1016)
... 14 more
Nov 14, 2008 5:38:52 AM
org.apache.solr.update.DirectUpdateHandler2 commit
Any ideas, anyone?
-- Bill