On 11/2/2018 5:00 PM, Wei wrote:
After a recent schema change, it takes almost 40 minutes to optimize the
index.  The schema change is to enable docValues for all sort/facet fields,
which increases the index size from 12G to 14G. Before the change it only
takes 5 minutes to do the optimization.

An optimize is not just a straight data copy.  Lucene is actually completely recalculating the index data structures.  It will never proceed at the full data rate your disks are capable of achieving.

I do not know how docValues actually work during a segment merge, but given how closely that information is tied to the inverted index, handling it is probably even more complicated than handling the rest of the data structures in a Lucene index.

On one of the systems I used to manage, back in March of 2017, I was seeing a 50GB index take 1.73 hours to optimize.  I do not recall whether I had docValues at that point, but I probably did.
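To put the timings in this thread side by side, here is a rough sketch that converts each one into an average processing rate. It assumes 1 GB = 1024 MB and simply divides the final index size by the wall-clock time. It ignores the fact that a merge both reads and writes data, so it is only a ballpark comparison, not a measurement of actual disk throughput.

```python
# Rough average rates for the optimize timings quoted in this thread.
# Assumptions: 1 GB = 1024 MB, and the rate is just final index size
# divided by wall-clock time (reads and writes are not counted separately).

def avg_rate_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average MB/s implied by processing size_gb in the given minutes."""
    return size_gb * 1024 / (minutes * 60)

cases = {
    "12G index, before docValues (5 min)": avg_rate_mb_per_s(12, 5),
    "14G index, after docValues (40 min)": avg_rate_mb_per_s(14, 40),
    "50GB index, March 2017 (1.73 h)": avg_rate_mb_per_s(50, 1.73 * 60),
}

for label, rate in cases.items():
    print(f"{label}: {rate:.1f} MB/s")
```

This works out to roughly 41, 6 and 8 MB/s respectively. All three rates are far below what a disk can stream sequentially, and the per-byte rate fell by a factor of about seven after the schema change. The extra 2G of data alone does not account for that drop.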

http://lucene.472066.n3.nabble.com/What-is-the-bottleneck-for-an-optimise-operation-tt4323039.html#a4323140

There's not much you can do to make this go faster. Putting massively faster CPUs in the machine MIGHT make a difference, but it probably wouldn't be a BIG difference.  I'm talking about clock speed, not core count.

Thanks,
Shawn
