On 10/31/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:

Bigger batches before a commit will be more efficient in general...
the only state that Solr keeps around before a commit is a
HashTable<String,Integer> entry per unique id deleted or overwritten.
You might be able to do your entire collection.

Note that _some_ care should be taken here as well.  I recently tried
to commit 3.9m documents in one go to an index that already contained
every document (thus needing to delete them all) and ended up in a
strange situation where the CPU was spinning for over a day with the
Java heap maxed out (1.1GB).  If you attempt less insane feats it will
go better.
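One way to avoid that situation is to commit in chunks instead of once at the
very end.  Here's a rough sketch of a batched client -- the update URL, the
batch size of 100,000, and the loadDocsAsXml() helper are all assumptions, not
anything Solr gives you:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class BatchedPoster {
    // Assumed local Solr update URL and batch size; adjust for your setup.
    static final String UPDATE_URL = "http://localhost:8983/solr/update";
    static final int COMMIT_EVERY = 100000;

    static void postXml(String xml) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(UPDATE_URL).openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        OutputStream out = conn.getOutputStream();
        out.write(xml.getBytes("UTF-8"));
        out.close();
        if (conn.getResponseCode() != 200)
            throw new RuntimeException("update failed: " + conn.getResponseCode());
    }

    public static void main(String[] args) throws Exception {
        int sinceCommit = 0;
        for (String doc : loadDocsAsXml()) {     // hypothetical source of <doc>...</doc> fragments
            postXml("<add>" + doc + "</add>");
            if (++sinceCommit >= COMMIT_EVERY) {
                postXml("<commit/>");            // commit per batch instead of once at the end
                sinceCommit = 0;
            }
        }
        postXml("<commit/>");                    // final commit
    }

    static java.util.List<String> loadDocsAsXml() {
        return java.util.Collections.emptyList(); // placeholder; supply your own documents
    }
}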

DUH2.doDeletions() would also benefit greatly from sorting the id terms
before looking them up in cases like this (it would trigger
optimizations in Lucene as well as being kinder to the OS's read-ahead
buffers).
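For what it's worth, here's roughly the idea -- just a sketch, assuming the
buffered deletes look like the map Yonik described and that the unique key
field is named "id":

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

class SortedDeletes {
    // Delete the buffered ids in sorted order so the term lookups walk the
    // term dictionary sequentially instead of seeking all over the index.
    static void deleteBuffered(IndexReader reader, Map<String,Integer> deleted) throws Exception {
        List<String> ids = new ArrayList<String>(deleted.keySet());
        Collections.sort(ids);                           // terms are stored in sorted order
        for (String id : ids) {
            reader.deleteDocuments(new Term("id", id));  // "id" = assumed unique key field
        }
    }
}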

If you have a multi-CPU server, you could increase indexing
performance by using a multithreaded client to keep all the CPUs on
the server busy.

I thought so, too, but it turns out that there isn't much concurrent
updating that can actually occur, if I'm reading the code correctly.
DUH2.addDoc() calls exactly one of addConditionally, overwriteBoth, or
allowDups, each of which adds the document inside a synchronized(this)
block.
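Paraphrasing the shape of it (not the actual DirectUpdateHandler2 source, just
an illustration of the locking):

import org.apache.lucene.document.Document;

// Every add path ends up holding the handler's monitor, so N client
// threads still add documents one at a time on the server side.
class UpdateHandlerSketch {
    void addDoc(Document doc) {
        synchronized (this) {
            // the real code borrows the IndexWriter, deletes any older copy of
            // the unique key if overwriting, and calls writer.addDocument(doc)
            // -- all while holding this one lock
        }
    }
}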

This shouldn't be too hard to fix.  I'm going to take a look at doing so.

-Mike
