On 11/27/2012 11:39 AM, Jack Krupansky wrote:
> So, if I understand your scenario correctly, you are doing a lot of deletes, but since they are occurring against "cold" data, there isn't usually much, if any, query traffic for that old/cold data.
>
> In short, it sounds like the reason you are optimizing is to keep the memory footprint from growing in a very memory-limited environment.
>
> It also looks like you have frequent garbage collections.

With 64GB of RAM, I'm not sure I'd classify my situation as memory-limited. It's true that I don't have enough RAM to cache all my index data, but over 8GB of each 22GB index is stored fields (.fdt files), so I have the important bits. I'm sure I can increase my heap size without drastically affecting performance, but so far I have not needed to. If we start using more Solr functionality like facets, I'm sure I will have to increase the heap.
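
(If anyone wants to run the same back-of-the-envelope check on their own index, here is a rough Java sketch that totals Lucene index file sizes by extension, so you can see how much of an index is .fdt stored fields versus the data that really benefits from the OS disk cache. The directory path is just a placeholder, not my actual layout.)

import java.io.File;
import java.util.Map;
import java.util.TreeMap;

// Rough sketch: total up Lucene index file sizes by extension so you can see
// how much of the index is stored fields (.fdt) versus everything else.
// The index directory path below is only a placeholder.
public class IndexSizeByExtension {
  public static void main(String[] args) {
    File indexDir = new File("/index/solr/data/cold0/index");
    File[] files = indexDir.listFiles();
    if (files == null) {
      System.err.println("Not a directory: " + indexDir);
      return;
    }
    Map<String, Long> totals = new TreeMap<String, Long>();
    long grandTotal = 0;
    for (File f : files) {
      if (!f.isFile()) continue;
      String name = f.getName();
      int dot = name.lastIndexOf('.');
      String ext = (dot < 0) ? "(none)" : name.substring(dot);
      Long current = totals.get(ext);
      totals.put(ext, (current == null ? 0L : current) + f.length());
      grandTotal += f.length();
    }
    for (Map.Entry<String, Long> e : totals.entrySet()) {
      System.out.printf("%-8s %,15d bytes%n", e.getKey(), e.getValue());
    }
    System.out.printf("%-8s %,15d bytes%n", "total", grandTotal);
  }
}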

This is a distributed index; every query hits every shard. A large chunk of the data that gets returned comes from the hot shard, but users do page down into old results fairly often. Only data added in the last 3.5 to 7 days lives in the hot shard.
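
(To make that concrete, a distributed request from SolrJ looks roughly like the sketch below. Hostnames and core names are made up, and HttpSolrServer is the Solr 4 class; on 3.x the equivalent is CommonsHttpSolrServer.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Sketch of a distributed query: the request fans out to the hot shard plus
// every cold shard. Hostnames and core names here are placeholders.
public class DistributedQueryExample {
  public static void main(String[] args) throws SolrServerException {
    SolrServer server = new HttpSolrServer("http://idxhost:8983/solr/broker");
    SolrQuery query = new SolrQuery("example search terms");
    query.set("shards",
        "idxhost:8983/solr/hot,"
      + "idxhost:8983/solr/cold0,"
      + "idxhost:8983/solr/cold1,"
      + "idxhost:8983/solr/cold2");
    query.setStart(0);  // paging deeper pulls mostly from the cold shards
    query.setRows(20);
    QueryResponse rsp = server.query(query);
    System.out.println("numFound: " + rsp.getResults().getNumFound());
  }
}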

As far as frequent garbage collections go, I would agree with you if I were restarting Solr often. This JVM has nearly 22 days of uptime, which works out to roughly five minutes between collections on average:

https://dl.dropbox.com/u/97770508/solr-jconsole-summary.png
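
(Those jconsole numbers come from the standard GarbageCollectorMXBeans, so they can also be pulled programmatically. Here is a rough sketch that prints per-collector counts and the average interval between collections; it reports on the JVM it runs in, so for a live Solr instance you would read the same beans over a remote JMX connection, which is what jconsole does.)

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: print GC counts and times and compute the average interval between
// collections, the same numbers jconsole summarizes for this JVM.
public class GcSummary {
  public static void main(String[] args) {
    long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
    long totalCollections = 0;
    long totalGcMillis = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      long count = gc.getCollectionCount();
      long time = gc.getCollectionTime();
      System.out.printf("%-20s count=%d time=%dms%n", gc.getName(), count, time);
      if (count > 0) {
        totalCollections += count;
        totalGcMillis += time;
      }
    }
    if (totalCollections > 0) {
      System.out.printf("uptime=%dms, avg interval=%dms, total GC time=%dms%n",
          uptimeMillis, uptimeMillis / totalCollections, totalGcMillis);
    }
  }
}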

When (or if) a configuration option becomes available that lets me do a commit that expunges deletes even when there are only a few deleted documents, or if I can figure out how to add that option myself, I will be able to eliminate full optimization entirely.
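
(For reference, issuing the expunging commit itself is already easy from SolrJ; the sketch below is roughly what it looks like, with a placeholder URL and core name. The limitation is what the commit does afterwards: as I understand it, the merge policy only rewrites segments whose deleted-document percentage exceeds a threshold, and with TieredMergePolicy I believe that threshold is forceMergeDeletesPctAllowed, which defaults to 10 percent. That threshold is the configuration option I am hoping for, but I may be misreading the merge policy code.)

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

// Sketch: issue a commit with expungeDeletes=true instead of a full optimize.
// URL and core name are placeholders. As far as I can tell, this currently
// merges only the segments whose deleted-document percentage exceeds the
// merge policy's threshold, which is the limitation described above.
public class ExpungeDeletesCommit {
  public static void main(String[] args) throws SolrServerException, IOException {
    SolrServer server = new HttpSolrServer("http://idxhost:8983/solr/cold0");
    UpdateRequest req = new UpdateRequest();
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); // waitFlush, waitSearcher
    req.setParam("expungeDeletes", "true");
    req.process(server);
  }
}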

Thanks,
Shawn
