On 11/2/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 11/1/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
> > DUH2.doDeletions() would also highly benefit from sorting the id terms
> > before looking them up in these types of cases (as it would trigger
> > optimizations in lucene as well as being kinder to the os' read-ahead
> > buffers).
>
> Hmmm, good point. I wonder how simply using a TreeMap instead of a
> HashMap would work.
Definitely.
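
For what it's worth, the shape I had in mind is roughly the following
(just a sketch; "pendingDeletes", "idFieldName", and "dir" are made-up
names, not the actual DUH2 fields):

// Sketch only: keep the pending deletes in a TreeMap so the id terms
// come back in sorted order, then walk them with a single reader.
TreeMap<String,Integer> pendingDeletes = new TreeMap<String,Integer>();
// ... populated the same way the HashMap is today ...

IndexReader reader = IndexReader.open(dir);
try {
  for (String id : pendingDeletes.keySet()) {
    // terms arrive in ascending order, so lucene's term seeks and the
    // OS read-ahead both move forward through the index
    reader.deleteDocuments(new Term(idFieldName, id));
  }
} finally {
  reader.close();
}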
> > I thought so, too, but it turns out that there isn't a huge amount of
> > concurrent updating that can occur, if I am reading the code
> > correctly. DUH2.addDoc() calls exactly one of addConditionally,
> > overwriteBoth, or allowDups, each of which adds the document in a
> > synchronized(this) block.
>
> Good catch.
> And with the way that deletes are deferred, moving the add outside of
> the sync block should work OK I think... then the analysis of
> documents can be done in parallel.
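
Concretely, I picture something along these lines (paraphrasing from
memory rather than quoting the actual DUH2 code; openWriter() stands in
for whatever state juggling the real method does under the lock):

// today (roughly): analysis happens while holding the handler lock
synchronized (this) {
  openWriter();
  writer.addDocument(cmd.doc);
}

// sketch of the change: only pin the writer under the lock, and let
// addDocument() -- where the analysis work is -- run outside it
IndexWriter w;
synchronized (this) {
  openWriter();
  w = writer;
}
w.addDocument(cmd.doc);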
The one thing I'm worried about is closing the writer while documents
are being added to it. IndexWriter is nominally thread-safe, but I'm
not sure what happens to documents that are being added at the time.
Looking at IndexWriter.java, it seems that if addDocument() has been
entered but hasn't yet reached its synchronized block when close() is
called, the document could be lost or an exception raised.
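
If it comes to that, one way to guard it (just a sketch, not anything
that exists in Solr today) would be a read/write lock so close() waits
for in-flight adds:

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// in-flight adds hold the read lock; closing takes the write lock,
// so close() can't run while an addDocument() is still in progress
private final ReadWriteLock writerGuard = new ReentrantReadWriteLock();

void addDoc(Document doc) throws IOException {
  writerGuard.readLock().lock();
  try {
    writer.addDocument(doc);
  } finally {
    writerGuard.readLock().unlock();
  }
}

void closeWriter() throws IOException {
  writerGuard.writeLock().lock();
  try {
    writer.close();
    writer = null;
  } finally {
    writerGuard.writeLock().unlock();
  }
}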
> I'm tempted to say "mixing overwriting & non-overwriting adds for the
> same documents has undefined behavior". Thoughts?
I believe that is reasonable.
I was going to try to put in some basic autoCommit logic while I was
mucking about here. One question: did you intend for maxCommitTime to
trigger deterministically (regardless of any events occurring or not)?
I had in mind checking these constraints only when documents are
added, but this could result in maxCommitTime elapsing without a
commit.
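
To make the difference concrete, the add-time check I had in mind is
the first half below; deterministic behavior would need something like
the timer in the second half (maxCommitTime, lastCommitTime, commit(),
and the java.util.Timer field are placeholders, nothing is settled):

// add-time variant: the timeout is only noticed when an add arrives,
// so an idle handler can sit past maxCommitTime without committing
void maybeAutoCommit() throws IOException {
  long now = System.currentTimeMillis();
  if (now - lastCommitTime >= maxCommitTime) {
    commit();
    lastCommitTime = now;
  }
}

// deterministic variant: arm a timer on the first uncommitted add
timer.schedule(new TimerTask() {
  public void run() {
    try { commit(); } catch (IOException e) { /* log and move on */ }
  }
}, maxCommitTime);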
regards,
-Mike