Thanks for the reply, Shawn.

Currently, each Solr instance has a 22GB heap.
Is that big enough?
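One way to judge whether 22GB is enough is to measure how much wall-clock time the Solr JVM spends in garbage collection while indexing, since that is the cost described in the quoted reply. The sketch below is a hypothetical helper (`GcOverhead` is an illustrative name, not part of Solr). It uses the standard `java.lang.management` MXBeans to take two snapshots of heap usage and cumulative GC time and prints the GC share of the interval between them. It assumes remote JMX has been enabled for the Solr JVM (how to do that depends on the Solr version and startup script), and the JMX URL is a placeholder. The collector MXBeans report approximate accumulated collection time, and concurrent collector phases may not be counted, so treat the result as a rough indicator only.

```java
import java.io.IOException;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.RuntimeMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical helper: estimates how much wall-clock time a JVM spends in GC. */
public class GcOverhead {

    /** One snapshot of heap size and cumulative GC activity. */
    static final class Report {
        final long maxHeapBytes;  // -1 if the JVM reports no defined maximum
        final long usedHeapBytes;
        final long gcCount;       // summed over all collectors
        final long gcMillis;      // accumulated collection time, summed over all collectors
        final long uptimeMillis;

        Report(long maxHeapBytes, long usedHeapBytes, long gcCount, long gcMillis, long uptimeMillis) {
            this.maxHeapBytes = maxHeapBytes;
            this.usedHeapBytes = usedHeapBytes;
            this.gcCount = gcCount;
            this.gcMillis = gcMillis;
            this.uptimeMillis = uptimeMillis;
        }
    }

    /** Reads heap and GC counters from the JVM behind the given connection. */
    static Report measure(MBeanServerConnection conn) throws IOException {
        MemoryUsage heap = ManagementFactory.getPlatformMXBean(conn, MemoryMXBean.class).getHeapMemoryUsage();
        RuntimeMXBean runtime = ManagementFactory.getPlatformMXBean(conn, RuntimeMXBean.class);
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getPlatformMXBeans(conn, GarbageCollectorMXBean.class)) {
            // A value of -1 means this collector does not report the metric.
            long c = gc.getCollectionCount();
            long t = gc.getCollectionTime();
            if (c > 0) count += c;
            if (t > 0) millis += t;
        }
        return new Report(heap.getMax(), heap.getUsed(), count, millis, runtime.getUptime());
    }

    /** Takes two snapshots intervalMs apart and prints the GC share of that window. */
    static void report(MBeanServerConnection conn, long intervalMs) throws Exception {
        Report before = measure(conn);
        Thread.sleep(intervalMs);
        Report after = measure(conn);
        long gcMs = after.gcMillis - before.gcMillis;
        long wallMs = after.uptimeMillis - before.uptimeMillis;
        double pct = wallMs == 0 ? 0.0 : 100.0 * gcMs / wallMs;
        System.out.printf("max heap: %d MB, used: %d MB%n",
                after.maxHeapBytes >> 20, after.usedHeapBytes >> 20);
        System.out.printf("GCs in window: %d, GC time: %d ms of %d ms (%.1f%%)%n",
                after.gcCount - before.gcCount, gcMs, wallMs, pct);
    }

    public static void main(String[] args) throws Exception {
        long intervalMs = 10_000; // arbitrary 10-second sampling window
        if (args.length == 0) {
            // No URL given: inspect this JVM itself (useful only as a smoke test).
            report(ManagementFactory.getPlatformMBeanServer(), intervalMs);
            return;
        }
        // Placeholder example: service:jmx:rmi:///jndi/rmi://solrhost:18983/jmxrmi
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(args[0]))) {
            report(connector.getMBeanServerConnection(), intervalMs);
        }
    }
}
```

Run it with a JMX service URL as the only argument while an indexing job is in progress. Comparing two snapshots, rather than the lifetime totals, isolates GC cost during indexing from whatever happened earlier in the JVM's life. If the GC share stays high while indexing, a larger heap or GC tuning may help, keeping in mind that heap memory comes out of what is left for the OS disk cache.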

Regards,
Edwin


On 13 October 2016 at 23:56, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/13/2016 9:20 AM, Zheng Lin Edwin Yeo wrote:
> > I would like to find out whether indexing into a collection with a
> > very large index will be much slower than indexing into one that is
> > empty or still small. This assumes that the configuration, indexing
> > code and files to be indexed are the same. Currently, I have one
> > setup in which the collection is still empty, and I managed to
> > achieve an indexing speed of more than 7GB/hr. I also have another
> > setup in which the collection has an index size of 1.6TB, and when I
> > tried to index new documents into it, the indexing speed was less
> > than 0.7GB/hr.
>
> I have noticed this phenomenon myself.  As the amount of index data
> already present increases, indexing slows down.  Best guess as to the
> cause: more frequent and longer-lasting garbage collections.
>
> Indexing involves a LOT of memory allocation.  Most of the memory chunks
> that get allocated are quickly discarded because they do not need to be
> retained.
>
> If you understand how Java memory management works, then you know that
> this means there will be a lot of garbage collection.  Each GC will tend
> to take longer when a large number of the allocated objects are NOT
> garbage.
>
> When the index is large, Lucene/Solr must allocate and retain a larger
> amount of memory just to ensure that everything works properly.  This
> leaves less free memory, so indexing will cause more frequent garbage
> collections ... and because the amount of retained memory is
> correspondingly larger, each garbage collection will take longer than it
> would with a smaller index.  A ten-to-one difference in speed does seem
> extreme, though.
>
> You might want to increase the heap allocated to each Solr instance, so
> GC is less frequent.  This can take memory away from the OS disk cache,
> though.  If the amount of OS disk cache drops too low, general
> performance may suffer.
>
> Thanks,
> Shawn
>
>
