On 11/2/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
The one thing I'm worried about is closing the writer while documents are being added to it. IndexWriter is nominally thread-safe, but I'm not sure what happens to documents that are being added at the time. Looking at IndexWriter.java, it seems that if addDocument() has been entered but hasn't yet reached the synchronized block when close() is called, the document could be lost or an exception raised.
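A toy sketch of that window, assuming a hypothetical ToyWriter (not Lucene's IndexWriter) that does some unsynchronized work before taking its lock. The latches are test hooks that force addDocument() to pause outside the lock while close() runs. In this toy the late add is rejected with an IllegalStateException, which is one of the two outcomes mentioned above; a real writer could just as well drop the document silently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical writer used only to illustrate the race; NOT Lucene's IndexWriter.
class ToyWriter {
    private final List<String> docs = new ArrayList<>();
    private boolean closed = false;
    // Test hooks: when set, addDocument() pauses just before taking the lock.
    CountDownLatch reachedGap;
    CountDownLatch resume;

    void addDocument(String doc) throws InterruptedException {
        String prepared = doc.trim();     // unsynchronized work (stand-in for analysis)
        if (reachedGap != null) {
            reachedGap.countDown();       // signal: inside addDocument(), outside the lock
            resume.await();               // wait while close() runs
        }
        synchronized (this) {
            if (closed) {
                throw new IllegalStateException("writer closed; not added: " + doc);
            }
            docs.add(prepared);
        }
    }

    synchronized void close() { closed = true; }
}

class RaceDemo {
    // Forces the interleaving: addDocument() entered, then close(), then the
    // add resumes. Returns "added" or "rejected".
    static String run() {
        ToyWriter w = new ToyWriter();
        w.reachedGap = new CountDownLatch(1);
        w.resume = new CountDownLatch(1);
        String[] outcome = {"interrupted"};
        Thread adder = new Thread(() -> {
            try {
                w.addDocument("doc1");
                outcome[0] = "added";
            } catch (IllegalStateException e) {
                outcome[0] = "rejected";
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        adder.setDaemon(true);
        try {
            adder.start();
            w.reachedGap.await();   // adder is now between entry and the lock
            w.close();              // close() wins the race
            w.resume.countDown();
            adder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return outcome[0];
    }
}
```

RaceDemo.run() returns "rejected": close() always gets in first, because the adder is parked on the latch and not holding the lock.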
This seems harder to address in "user code" while still maintaining parallelism. Perhaps a Lucene patch would be more appropriate? Perhaps IndexWriter should have a closed flag, and addDocument() should return a boolean indicating whether the document was added. Then we could move addDocument() outside the sync block and put a big do while(!addDocument()) loop around the whole thing. There is still another case to consider: if a commit happens between adding the id to the pset and adding the document to the index, and the add succeeds, the id will no longer be in the pset, so we will end up with a duplicate after the next commit.
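A minimal sketch of the closed-flag idea, assuming hypothetical FlaggedWriter and RetryingHandler classes (these are not Lucene or Solr APIs). addDocument() returns false once the writer has been closed, only fetching the current writer is synchronized, and the do/while loop retries against whatever writer is current after the commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical writer with the proposed contract: once closed, addDocument()
// returns false instead of losing the document or throwing.
class FlaggedWriter {
    private final List<String> docs = new ArrayList<>();
    private boolean closed = false;

    synchronized boolean addDocument(String doc) {
        if (closed) return false;
        docs.add(doc);
        return true;
    }

    synchronized void close() { closed = true; }
    synchronized int docCount() { return docs.size(); }
}

// Hypothetical update handler: only obtaining the writer is synchronized;
// the add itself happens outside the handler's lock.
class RetryingHandler {
    private FlaggedWriter writer;   // opened lazily after each commit
    private int writersOpened = 0;

    synchronized FlaggedWriter currentWriter() {
        if (writer == null) {
            writer = new FlaggedWriter();
            writersOpened++;
        }
        return writer;
    }

    // A commit closes the current writer; the next add opens a fresh one.
    synchronized void commit() {
        if (writer != null) {
            writer.close();
            writer = null;
        }
    }

    synchronized int writersOpened() { return writersOpened; }

    void addDoc(String doc) {
        FlaggedWriter w;
        do {
            w = currentWriter();
        } while (!w.addDocument(doc));   // false: a commit closed w under us, so retry
    }
}
```

This only covers the lost-document case. It does nothing about the pset duplicate case, since the sketch doesn't track pending ids at all.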
I was going to try to put in some basic autoCommit logic while I was mucking about here. One question: did you intend for maxCommitTime to trigger deterministically (regardless of whether any events occur)?
I hadn't thought through the whole thing, but it seems like it should only trigger if it would make a difference.
I had in mind checking these constraints only when documents are added, but this could result in maxCommitTime elapsing without a commit.
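To make that gap concrete, here is a minimal sketch of checking the constraints only when documents are added, assuming a hypothetical AddTimeAutoCommit class with maxDocs and maxTimeMillis limits. The current time is passed in so the behavior can be checked without sleeping.

```java
// Hypothetical policy: maxDocs/maxTimeMillis are only evaluated inside add().
class AddTimeAutoCommit {
    private final int maxDocs;
    private final long maxTimeMillis;
    private int pending = 0;            // documents added since the last commit
    private long firstPendingAt = -1;   // time of the oldest uncommitted add
    private int commits = 0;

    AddTimeAutoCommit(int maxDocs, long maxTimeMillis) {
        this.maxDocs = maxDocs;
        this.maxTimeMillis = maxTimeMillis;
    }

    // 'now' is supplied by the caller (e.g. System.currentTimeMillis()).
    void add(long now) {
        if (pending == 0) firstPendingAt = now;
        pending++;
        if (pending >= maxDocs || now - firstPendingAt >= maxTimeMillis) {
            commit();
        }
    }

    private void commit() {
        commits++;
        pending = 0;
        firstPendingAt = -1;
    }

    int pending() { return pending; }
    int commits() { return commits; }
}
```

With maxTimeMillis = 1000, one add at t=0 and the next at t=5000, the first document sits uncommitted for the full 5 seconds, because nothing re-checks the deadline between adds.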
If there is nothing to commit, that should be fine. I think the kind of guarantee we should make is that if you add a document, it will be committed within a certain period of time (setting aside variance due to autowarming time, etc.).
-Yonik
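One way to get that guarantee is sketched below, assuming a hypothetical DeadlineAutoCommit helper built on java.util.concurrent's ScheduledExecutorService: the first uncommitted add arms a one-shot timer for maxTimeMillis, later adds leave it alone, and an explicit commit cancels it. Nothing is scheduled while there is nothing to commit, so the timer never fires on an idle index.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: aims to commit within maxTimeMillis of the oldest
// uncommitted add, and schedules nothing when no documents are pending.
class DeadlineAutoCommit {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "autocommit-timer");
            t.setDaemon(true);   // don't keep the JVM alive just for the timer
            return t;
        });
    private final long maxTimeMillis;
    private final Runnable doCommit;      // the actual commit, supplied by the caller
    private ScheduledFuture<?> deadline;  // non-null while uncommitted docs exist

    DeadlineAutoCommit(long maxTimeMillis, Runnable doCommit) {
        this.maxTimeMillis = maxTimeMillis;
        this.doCommit = doCommit;
    }

    // Call after each successful add. Only the first pending add arms the
    // timer, so the deadline is measured from the oldest uncommitted document.
    synchronized void added() {
        if (deadline == null) {
            deadline = timer.schedule(this::onDeadline, maxTimeMillis, TimeUnit.MILLISECONDS);
        }
    }

    // Explicit commit: cancels any armed deadline, then commits.
    synchronized void commit() {
        if (deadline != null) {
            deadline.cancel(false);
            deadline = null;
        }
        doCommit.run();
    }

    // Timer callback: skip if an explicit commit already cleared the deadline.
    private synchronized void onDeadline() {
        if (deadline != null) commit();
    }

    void shutdown() { timer.shutdownNow(); }
}
```

Measuring from the oldest pending add rather than the latest is what bounds how long any single document can wait; autowarming time after the commit is not accounted for here.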