On Tue, 2015-06-16 at 09:54 -0700, Shenghua(Daniel) Wan wrote:
> Hi, Toke,
> Did you try MapReduce with solr? I think it should be a good fit for your
> use case.
Thanks for the suggestion. Improved logistics, such as starting build of
a new shard while the previous shard is optimizing, would work
Hi, Toke,
Did you try MapReduce with solr? I think it should be a good fit for your
use case.
On Tue, Jun 16, 2015 at 5:02 AM, Toke Eskildsen wrote:
> Shenghua(Daniel) Wan wrote:
> > Actually, I am currently interested in how to boost the merging/optimizing
> > performance of a single Solr instance.
Shenghua(Daniel) Wan wrote:
> Actually, I am currently interested in how to boost the merging/optimizing
> performance of a single Solr instance.
We have the same challenge (we build static 900GB shards one at a time and the
final optimization takes 8 hours with only 1 CPU core at 100%). I know that
I think your advice on future incremental updates is very useful. I will
keep an eye on that.
Actually, I am currently interested in how to boost the merging/optimizing
performance of a single Solr instance.
Parallelism at MapReduce level does not help merging/optimizing much,
unless Solr/Lucene internally
Ah, OK. For very slowly changing indexes, optimizing can make sense.
Do note, though, that if you incrementally index after the full build, and
especially if you update documents, you're laying a trap for the future. Let's
say you optimize down to a single segment. The default TieredMergePolicy
trie
Hi, Erick,
First, thanks for sharing the ideas. I am giving more context below.
1. Why optimize? I have done some experiments to compare the query response
time, and there is some difference. In addition, the searcher will be
customer-facing. I think any performance boost will be
The first question is why you're optimizing at all. It's not recommended
unless you can demonstrate that an optimized index is giving you enough
of a performance boost to be worth the effort.
And why are you using embedded solr server? That's kind of unusual
so I wonder if you've gone down a wrong
Hi,
Do you have any suggestions to improve the performance for merging and
optimizing index?
I have been using embedded solr server to merge and optimize the index. I
am looking for the right parameters to tune. My use case has about 300
fields plus 250 copyfields, and moderate doc size (about 65K