On 11/27/2012 11:39 AM, Jack Krupansky wrote:
> So, if I understand your scenario correctly, you are doing a lot of
> deletes, but since they are occurring against "cold" data, there
> isn't usually much, if any, query traffic for that old/cold data.
> In short, it sounds like the reason you are optimizing is to keep the
> memory footprint from growing in a very memory-limited environment.
> It also looks like you have frequent garbage collections.
With 64GB of RAM, I'm not sure I'd classify my situation as
memory-limited. It's true that I don't have enough RAM to cache all my
index data, but over 8GB of each 22GB index is stored fields (.fdt
files), so I have the important bits. I'm sure I can increase my heap
size without drastically affecting performance, but so far I have not
needed to. If we start using more Solr functionality like facets, I'm
sure I will have to increase the heap.
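Just to illustrate what I mean about facets driving up heap requirements: as I understand it, a facet query builds per-field structures in the JVM heap. A rough SolrJ sketch, untested, with a made-up field name ("source") and a client class that varies by SolrJ version:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.FacetField;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FacetSketch {
      public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient.Builder(
            "http://idxa1:8983/solr/hot").build();
        SolrQuery q = new SolrQuery("*:*");
        // Turning faceting on is what builds the in-memory per-field data.
        q.setFacet(true);
        q.addFacetField("source");  // hypothetical field name
        QueryResponse rsp = client.query(q);
        for (FacetField.Count c : rsp.getFacetField("source").getValues()) {
          System.out.println(c.getName() + ": " + c.getCount());
        }
        client.close();
      }
    }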
This is a distributed index; every query hits every shard. A large chunk
of the data that gets returned comes from the hot shard, but users do
page down into old results fairly often. Only data added in the last 3.5
to 7 days lives in the hot shard.
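In case it helps to picture it: each query is an ordinary Solr query with a shards parameter listing every shard, something like the following SolrJ sketch. It is untested, the host and core names are made up, and the client class depends on the SolrJ version (newer releases use HttpSolrClient):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class DistributedQuerySketch {
      public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient.Builder(
            "http://idxa1:8983/solr/live").build();
        SolrQuery q = new SolrQuery("some search terms");
        // Every core listed in the shards parameter participates in the query:
        // the hot shard plus all of the cold shards.
        q.set("shards",
            "idxa1:8983/solr/hot,idxa1:8983/solr/cold0,idxa2:8983/solr/cold1");
        QueryResponse rsp = client.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
        client.close();
      }
    }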
As for frequent garbage collections, I would agree with you if I were
restarting Solr often. This JVM has nearly 22 days of uptime, so on
average there are about five minutes between collections:
https://dl.dropbox.com/u/97770508/solr-jconsole-summary.png
When/if a configuration option becomes available so I can do a commit
that expunges deletes even when there are only a few deleted documents,
or if I can figure out how to add that option myself, I will be able to
eliminate full optimization entirely.
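If it helps to see what I mean: the commit itself can already be sent with
expungeDeletes=true, something like the untested SolrJ sketch below. As I
understand it, though, the merge policy decides which segments actually get
their deletes merged away, and that is the part I can't currently control
when a segment has only a few deleted documents.

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.UpdateRequest;

    public class ExpungeDeletesCommit {
      public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient.Builder(
            "http://idxa1:8983/solr/cold0").build();
        UpdateRequest req = new UpdateRequest();
        // A hard commit (waitFlush=true, waitSearcher=true) that also asks
        // the merge policy to merge away deleted documents.
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        req.setParam("expungeDeletes", "true");
        req.process(client);
        client.close();
      }
    }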
Thanks,
Shawn