Thanks for the reply, Shawn. Currently, the heap allocated to each Solr instance is 22GB. Is that big enough?

Regards,
Edwin

On 13 October 2016 at 23:56, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/13/2016 9:20 AM, Zheng Lin Edwin Yeo wrote:
> > Would like to find out, will the indexing speed in a collection with a
> > very large index size be much slower than one which is still empty or
> > a very small index size? This is assuming that the configurations,
> > indexing code and the files to be indexed are the same. Currently, I
> > have a setup in which the collection is still empty, and I managed to
> > achieve an indexing speed of more than 7GB/hr. I also have another
> > setup in which the collection has an index size of 1.6TB, and when I
> > tried to index new documents to it, the indexing speed is less than
> > 0.7GB/hr.
>
> I have noticed this phenomenon myself. As the amount of index data
> already present increases, indexing slows down. Best guess as to the
> cause: more frequent and longer-lasting garbage collections.
>
> Indexing involves a LOT of memory allocation. Most of the memory chunks
> that get allocated are quickly discarded because they do not need to be
> retained.
>
> If you understand how the Java memory model works, then you know that
> this means there will be a lot of garbage collection. Each GC will tend
> to take longer if there are a large number of objects allocated that are
> NOT garbage.
>
> When the index is large, Lucene/Solr must allocate and retain a larger
> amount of memory just to ensure that everything works properly. This
> leaves less free memory, so indexing will cause more frequent garbage
> collections ... and because the amount of retained memory is
> correspondingly larger, each garbage collection will take longer than it
> would with a smaller index. A ten to one difference in speed does seem
> extreme, though.
>
> You might want to increase the heap allocated to each Solr instance, so
> GC is less frequent. This can take memory away from the OS disk cache,
> though. If the amount of OS disk cache drops too low, general
> performance may suffer.
>
> Thanks,
> Shawn
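
The GC explanation quoted above is described as a best guess, so it may be worth measuring before resizing the heap. Below is a minimal sketch, not Solr code, that uses the standard java.lang.management API to count collections and total GC time while a loop allocates short-lived garbage and a fixed amount of data stays live. The class name GcProbe, the method churn, and the sizes used are all hypothetical and chosen only for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Solr or Lucene.
public class GcProbe {

    // Total number of collections reported by all collectors in this JVM.
    // A collector may report -1 (undefined); such values are skipped.
    public static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds, all collectors.
    public static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Keeps retainedMb megabytes live (standing in for memory a large index
    // forces the JVM to retain) while allocating short-lived 64 KB buffers
    // (standing in for indexing garbage).
    // Returns {collections, gcMillis, retainedChunks} observed during the loop.
    public static long[] churn(int retainedMb, int iterations) {
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < retainedMb; i++) {
            retained.add(new byte[1024 * 1024]);
        }
        long countBefore = totalGcCount();
        long timeBefore = totalGcTimeMillis();
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] garbage = new byte[64 * 1024];
            garbage[0] = (byte) i;
            sink += garbage[0]; // touch the buffer so the allocation is used
        }
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // never true in practice; keeps sink live
        }
        return new long[] {
            totalGcCount() - countBefore,
            totalGcTimeMillis() - timeBefore,
            retained.size()
        };
    }

    public static void main(String[] args) {
        // Compare a small and a larger retained set with the same garbage load.
        for (int retainedMb : new int[] {8, 128}) {
            long[] r = churn(retainedMb, 200_000);
            System.out.println("retained=" + retainedMb + "MB collections=" + r[0]
                    + " gcMillis=" + r[1]);
        }
    }
}
```

The numbers depend heavily on the collector, the heap size and the JIT, so this only illustrates how the two counters can be read. For an actual Solr instance, the GC logs of the Solr JVM would be the more direct evidence of whether collections become more frequent and longer as the index grows.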