On 9/22/2016 1:01 PM, vsolakhian wrote:
> Our index is in HDFS, but we did not change any configuration after we
> deleted 35% of records and optimized.
>
> The relatively slow commit (soft commit and warming up took 1.5 minutes) is
> OK for our use case (adding hundreds of thousands and even millions of
> records and then committing).
>
> The question is why it takes much longer after optimization, when disk
> caches, network and other configuration remained the same and the index is
> smaller?

When you optimize an index down to one segment, you are reading one
entire copy of the index and creating a second copy of the index.  This
is going to greatly change which data is sitting in the disk cache.

Presumably you do not have enough caching memory to hold anywhere near
the entire 300GB index.  Memory sizes that large are possible, but not
common.  With HDFS, I think the amount of memory used for caching is
configurable.  I do not know if both HDFS clients and servers can do
caching, or if that's just a server-side option.  With a 300GB index,
150 to 250GB of memory should be available for caching if you want to
have stellar performance.  If you can get the entire 300GB to fit, then
you'd nearly be guaranteed good performance.
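
To put rough numbers on the sizing above, here is a minimal sketch that
computes how much of an index fits in cache memory at once.  The 300GB
index and the 150, 250 and 300GB cache sizes come from this message; the
64GB entry and the function name are hypothetical, added only for
contrast.

```python
def cache_coverage(index_gb: float, cache_gb: float) -> float:
    """Fraction of the index that can sit in the disk cache at once (capped at 1.0)."""
    if index_gb <= 0:
        raise ValueError("index_gb must be positive")
    return min(cache_gb / index_gb, 1.0)


INDEX_GB = 300  # index size discussed in this thread

# 64 is a hypothetical smaller server; the other sizes are from the text above.
for cache_gb in (64, 150, 250, 300):
    share = cache_coverage(INDEX_GB, cache_gb)
    print(f"{cache_gb:>3} GB cache -> {share:.0%} of the index")
```

By this measure the "stellar performance" range works out to roughly half
to five sixths of the index held in cache.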

Assuming I'm right about the amount of caching memory available relative
to the index size, when the optimize is finished, chances are very good
that the particular data sitting in the disk cache is completely useless
for queries, so the first few warming and user queries will need to
actually read the *disk*, and put different data in the cache.  When
enough queries have been processed, eventually the disk cache will be
populated with enough relevant data that subsequent queries will be fast.
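
The effect described above can be sketched with a toy model.  This is
only an illustration, assuming a simple LRU cache and made-up page
counts; it is not how the OS or HDFS actually manages its cache, and the
class and function names are hypothetical.

```python
from collections import OrderedDict


class PageCache:
    """Toy LRU cache of (file, page) keys, standing in for the disk cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def touch(self, key):
        """Access one page; return True on a cache hit, False on a miss."""
        hit = key in self.pages
        if hit:
            self.pages.move_to_end(key)
        else:
            self.pages[key] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict least recently used
        return hit


def hit_rate(cache, keys):
    """Access every key once and return the fraction that were hits."""
    keys = list(keys)
    return sum(cache.touch(k) for k in keys) / len(keys)


cache = PageCache(capacity=100)              # cache far smaller than the index
hot_old = [("old", p) for p in range(50)]    # pages queries need before optimize
hot_new = [("new", p) for p in range(50)]    # pages queries need after optimize

hit_rate(cache, hot_old)                     # cold start fills the cache
warm_before = hit_rate(cache, hot_old)       # queries are now served from cache

# Optimize: stream the whole old index through the cache, then the new copy.
for p in range(300):
    cache.touch(("old", p))
for p in range(200):
    cache.touch(("new", p))

first_after = hit_rate(cache, hot_new)       # warming queries go to disk
second_after = hit_rate(cache, hot_new)      # the same queries, repeated

print(warm_before, first_after, second_after)  # prints 1.0 0.0 1.0
```

In this model the first round of queries after the optimize misses the
cache entirely, and only the repeat round is fast again.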

If there are other programs or Solr indexes competing for the same
caching memory, then the problem might be even worse.

You might want to refrain from optimizing indexes this large, at least
on a frequent basis, and just rely on normal index merging to handle
your deletes.
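
One way to decide whether an optimize is worth its cost is to look at
how much of the index is actually taken up by deleted documents.  A
minimal sketch, assuming you feed it the numDocs and maxDoc values that
Solr reports for the core; the 50% threshold and the function names are
arbitrary assumptions for illustration, not a Solr recommendation.

```python
def deleted_fraction(num_docs: int, max_doc: int) -> float:
    """Share of document slots in the index occupied by deleted documents."""
    if max_doc <= 0:
        return 0.0
    return (max_doc - num_docs) / max_doc


def worth_optimizing(num_docs: int, max_doc: int, threshold: float = 0.5) -> bool:
    """True when deleted documents reach the (hypothetical) threshold."""
    return deleted_fraction(num_docs, max_doc) >= threshold


# Made-up counts matching the 35% deletion mentioned in this thread.
print(deleted_fraction(650_000, 1_000_000))
print(worth_optimizing(650_000, 1_000_000))
```

Below whatever threshold you pick, normal merging reclaims the space
over time without rewriting the whole index in one pass.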

Optimizing is a special case when it comes to cache memory: because the
old and new copies of the index both exist while it runs, you need even
more memory than in the general case.  There's a special note about
optimizes here:

https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache

Thanks,
Shawn
