These are the reasons why we are thinking of splitting an index via multi-core:
First of all, we have a news index whose size is about 9G. As
we will keep aggregating news forever and ever and let users do free
text search on our system, we think that it will be easier for IT
crowd to man
: 1) We found the indexing speed starts dipping once the index grows to a
: certain size - in our case around 50G. We don't optimize, but we have
: to maintain a consistent index speed. The only way we could do that
: was to keep creating new cores (on the same box, though we do use
Hmmm... it seems
There were two main reasons we went with a multi-core solution,
1) We found the indexing speed starts dipping once the index grows to a
certain size - in our case around 50G. We don't optimize, but we have
to maintain a consistent index speed. The only way we could do that
was to keep creating new cores
One problem is the IT logistics of handling the file set. At 200 million
records you have at least 20G of data in one Lucene index. It takes hours to
optimize this, and tens of minutes to copy the optimized index around to
query servers.
Another problem is that indexing speed drops off after the ind
: We're doing a similar thing with multi-core - when a core reaches
: capacity (in our case 200 million records) we start a new core. We are
: doing this via a web service call (Create web service),
this whole thread perplexes me ... while i can understand not wanting to
let an index grow without
Lici,
We're doing a similar thing with multi-core - when a core reaches
capacity (in our case 200 million records) we start a new core. We are
doing this via a web service call (Create web service),
http://wiki.apache.org/solr/CoreAdmin
This is all done in java code - before writing we check the
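
For anyone following along, here is a rough sketch of what such a rollover check might look like. It is not the poster's actual code: the class name, the "news" core prefix, the numbering scheme and the localhost base URL are all made up for illustration. The only pieces taken from the thread are the 200 million record capacity and the use of the CoreAdmin CREATE action from the wiki page above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decides when to start a new core and builds the
// CoreAdmin CREATE request URL. It does not send the request itself.
final class CoreRollover {
    // Capacity mentioned in the thread: 200 million records per core.
    static final long MAX_DOCS_PER_CORE = 200_000_000L;

    // True once the current core has reached its capacity.
    static boolean shouldRollOver(long docCount, long maxDocs) {
        return docCount >= maxDocs;
    }

    // Assumed naming scheme: prefix plus a running number (news1, news2, ...).
    static String nextCoreName(String prefix, int currentIndex) {
        return prefix + (currentIndex + 1);
    }

    // Builds a CoreAdmin CREATE URL using the action, name and instanceDir
    // parameters. The base URL is whatever your Solr install uses.
    static String createCoreUrl(String solrBaseUrl, String coreName, String instanceDir) {
        return solrBaseUrl + "/admin/cores?action=CREATE"
                + "&name=" + URLEncoder.encode(coreName, StandardCharsets.UTF_8)
                + "&instanceDir=" + URLEncoder.encode(instanceDir, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        long docCount = 200_000_000L; // in practice, ask the current core for its doc count
        int currentIndex = 3;         // hypothetical: we are writing to "news3"
        if (shouldRollOver(docCount, MAX_DOCS_PER_CORE)) {
            String next = nextCoreName("news", currentIndex);
            // Issue an HTTP GET against this URL to create the core.
            System.out.println(createCoreUrl("http://localhost:8983/solr", next, next));
        }
    }
}
```

The check would run before each write (or each batch), and new documents go to the freshly created core once the CREATE call has succeeded. How you get the current document count is up to you; it is left out here on purpose.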