: 1) We found the indexing speed starts dipping once the index grows to a
: certain size - in our case around 50G. We don't optimize, but we have
: to maintain a consistent index speed. The only way we could do that
: was to keep creating new cores (on the same box, though we do use

Hmmm... it seems like ConcurrentMergeScheduler should make it possible to 
maintain semi-constant indexing speed by doing merges in background 
threads ... the only other issue would be making sure that an individual 
segment never got too big ... but that seems like it should be manageable 
with the config options 

(i'm just hypothesizing, i don't normally worry about indexes of this 
size, and when i do i'm not incrementally adding to them as time goes on 
... i guess what i'm asking is if you guys ever looked into these ideas 
and dismissed them for some reason)
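
for concreteness, here's roughly the kind of setup i'm picturing -- just a 
minimal sketch against the Lucene 2.x-style API (the index path and the 
size cap are made up, and exact constructors/setters vary a bit between 
releases):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.ConcurrentMergeScheduler;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.LogByteSizeMergePolicy;
  import org.apache.lucene.store.FSDirectory;

  public class SteadyIndexer {
    public static void main(String[] args) throws Exception {
      // hypothetical index location
      IndexWriter writer = new IndexWriter(
          FSDirectory.getDirectory("/path/to/index"),
          new StandardAnalyzer(), true);

      // do merges in background threads so addDocument() calls
      // don't stall while a big merge is running
      writer.setMergeScheduler(new ConcurrentMergeScheduler());

      // cap how big any merged segment can get, so no single merge
      // ever has to rewrite a huge slice of the index
      LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
      mp.setMaxMergeMB(2048.0);   // made-up cap: ~2GB per merged segment
      mp.setMergeFactor(10);
      writer.setMergePolicy(mp);

      // ... writer.addDocument(doc) as documents arrive ...

      writer.close();
    }
  }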

: 2) Be able to drop the whole core for pruning purposes. We didn't want

that makes a lot of sense ... removing older cores is one of the only 
reasons i could think of for this model to really make sense from a 
performance standpoint.

: > One problem is the IT logistics of handling the file set. At 200 million
: > records you have at least 20G of data in one Lucene index. It takes hours to
: > optimize this, and 10s of minutes to copy the optimized index around to
: > query servers.

i get that full optimizes become ridiculous at that point, but you could 
still do partial optimizes ... and isn't the total disk space with this 
strategy still the same?  Aren't you still ultimately copying the same 
amount of data around?
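
(by "partial optimize" i mean something like the Lucene 2.x 
optimize(int maxNumSegments) call -- squash down to a handful of segments 
instead of one, which is a lot cheaper than a full optimize; the number 
below is made up, and the writer is the one from the sketch above):

  // merge down to at most 10 segments rather than a single segment;
  // much cheaper than a full optimize on a 20G+ index
  writer.optimize(10);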



-Hoss
