On 11/1/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
DUH2.doDeletions() would also benefit greatly from sorting the id terms before looking them up in these types of cases (as it would trigger optimizations in Lucene as well as being kinder to the OS's read-ahead buffers).
Hmmm, good point. I wonder how simply using a TreeMap instead of a HashMap would work.
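
To make the TreeMap idea concrete, here is a minimal sketch. It is not the actual DUH2 code: the class name, the method name, and the use of an Integer count as the map value are all assumptions made for illustration. It only shows that a TreeMap hands back its keys in sorted order, whereas HashMap makes no ordering promise.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the pending-delete bookkeeping; not the real DUH2 code.
class PendingDeletes {
    // Returns the ids in the order the map's iterator yields them,
    // i.e. the order in which a deletion pass would look them up.
    static List<String> iterationOrder(Map<String, Integer> pending) {
        return new ArrayList<>(pending.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> hashed = new HashMap<>();
        Map<String, Integer> sorted = new TreeMap<>();
        for (String id : new String[] {"doc300", "doc100", "doc200"}) {
            hashed.put(id, 1); // value is an assumed per-id count
            sorted.put(id, 1);
        }
        // HashMap iteration order is unspecified; TreeMap iterates in key order.
        System.out.println("HashMap: " + iterationOrder(hashed));
        System.out.println("TreeMap: " + iterationOrder(sorted));
    }
}
```

The trade-off is that TreeMap insertions and lookups are O(log n) rather than HashMap's expected O(1), so whether this is a net win would need measuring.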
> If you have a multi-CPU server, you could increase indexing
> performance by using a multithreaded client to keep all the CPUs on
> the server busy.

I thought so, too, but it turns out that there isn't a huge amount of concurrent updating that can occur, if I am reading the code correctly. DUH2.addDoc() calls exactly one of addConditionally, overwriteBoth, or allowDups, each of which adds the document in a synchronized(this) block.
Good catch. And with the way that deletes are deferred, moving the add outside of the sync block should work OK I think... then the analysis of documents can be done in parallel.

Hmmm, but it may not work well in a mixed-overwriting environment. Thread 1 overwrites doc 100, Thread 2 adds doc 100 (allowing duplicates).

With add synchronization the index has two possible states:
  Index contains doc_from_thread1
  OR index contains both docs

Without sync around the adds, an additional possible state is added:
  Index contains doc_from_thread2

Even though synchronized behavior != unsynchronized behavior, this is only a problem if someone actually desires to mix overwriting & non-overwriting on the same document ids, and is OK with the two possible states in the synchronized case.

I'm tempted to say "mixing overwriting & non-overwriting adds for the same documents has undefined behavior".

Thoughts?

-Yonik
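
For reference, a toy model of the locking shape discussed above. None of these names come from Solr or Lucene: SketchUpdateHandler, analyze, and indexConcurrently are hypothetical, and the "index" is just a list. In Lucene the analysis happens inside the writer's add call, so the real change would move the whole add out of the sync block; the sketch separates a stand-in analyze step only to show expensive per-document work running unsynchronized while the shared-state update stays serialized, driven by a multithreaded client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model only: none of these names come from Solr or Lucene.
class SketchUpdateHandler {
    private final List<String> index = new ArrayList<>();

    // Stand-in for analysis: touches no shared state, so it needs no lock.
    static String analyze(String rawDoc) {
        return rawDoc.trim().toLowerCase();
    }

    void addDoc(String rawDoc) {
        String analyzed = analyze(rawDoc); // may run in parallel across threads
        synchronized (this) {              // only the shared-state update is serialized
            index.add(analyzed);
        }
    }

    synchronized int size() {
        return index.size();
    }

    // Feeds `docs` documents to one handler from a pool of `threads` client threads
    // and returns how many documents ended up in the toy index.
    static int indexConcurrently(int threads, int docs) {
        SketchUpdateHandler handler = new SketchUpdateHandler();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < docs; i++) {
            final String raw = "  Doc " + i + " ";
            pool.submit(() -> handler.addDoc(raw));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return handler.size();
    }

    public static void main(String[] args) {
        System.out.println("docs indexed: " + indexConcurrently(4, 1000));
    }
}
```

This toy has no notion of overwriting or deferred deletes, so it says nothing about the mixed-overwriting states listed above; it only illustrates how much work can sit outside the lock.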