On 11/2/2018 5:00 PM, Wei wrote:
> After a recent schema change, it takes almost 40 minutes to optimize the
> index. The schema change enabled docValues for all sort/facet fields,
> which increased the index size from 12G to 14G. Before the change it
> took only 5 minutes to do the optimization.
An optimize is not just a straight data copy. Lucene is actually
completely recalculating the index data structures. It will never
proceed at the full data rate your disks are capable of achieving.
I do not know exactly how docValues are handled during a segment merge,
but given how that data relates to the inverted index, merging it is
probably even more complicated than merging the rest of the data
structures in a Lucene index.
On one of the systems I used to manage, back in March of 2017, I was
seeing a 50GB index take 1.73 hours to optimize. I do not recall
whether I had docValues at that point, but I probably did.
http://lucene.472066.n3.nabble.com/What-is-the-bottleneck-for-an-optimise-operation-tt4323039.html#a4323140
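To put the numbers in this thread side by side, here is a rough sketch in Python that turns index size and optimize time into an effective rate. The function name is made up for illustration, and it assumes "G" means GiB and that the final index size is a fair stand-in for the amount of data rewritten, which is only approximately true.

```python
def optimize_rate_mb_per_s(index_size_gb: float, minutes: float) -> float:
    """Effective optimize rate in MiB/s: final index size over wall-clock time.

    Assumption: index_size_gb is in GiB (1 GiB = 1024 MiB).
    """
    return (index_size_gb * 1024) / (minutes * 60)


# Figures quoted in this thread.
cases = {
    "12G index, 5 min (before docValues)": (12, 5),
    "14G index, 40 min (after docValues)": (14, 40),
    "50GB index, 1.73 hours (March 2017)": (50, 1.73 * 60),
}

for label, (size_gb, minutes) in cases.items():
    print(f"{label}: {optimize_rate_mb_per_s(size_gb, minutes):.1f} MiB/s")
```

By this crude measure, the 40-minute optimize and the 1.73-hour one both work out to single-digit MiB/s, far below what the disks can do for a plain copy, while the 5-minute run is the outlier.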
There's not much you can do to make this go faster. Putting massively
faster CPUs in the machine MIGHT make a difference, but it probably
wouldn't be a BIG difference. I'm talking about clock speed, not core
count.
Thanks,
Shawn