On 5/1/2013 8:42 AM, Annette Newton wrote:
It was a single delete with a date range query. We have 8 machines, each
with 35GB of memory, of which 10GB is allocated to the JVM. Garbage
collection has always been a problem for us, with the heap not clearing
on full garbage collection. I don't know what is being held in memory
that refuses to be collected.
I have seen your Java heap configuration in previous posts and it's very
similar to ours, except that we are not currently using LargePages (I
don't know how much difference that has made to your memory usage).
We have tried various Java configurations, including the G1 collector
(which was awful), but every setting seems to leave the old generation at
least 50% full, so it quickly fills up again.
-Xms10240M -Xmx10240M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled -XX:NewRatio=2 -XX:+CMSScavengeBeforeRemark
-XX:CMSWaitDuration=5000 -XX:+CMSClassUnloadingEnabled
-XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly
If I could only figure out what keeps the heap at its current level, I
feel we would be in a better place with Solr.
With a single delete request, it was probably the commit that was very
slow and caused the problem, not the delete itself. That has been my
experience with my own large indexes.
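For reference, a delete like that is just an update request with a range
query. Something like this (the field name, core name, and cutoff date
are only examples, adjust for your schema):

curl "http://localhost:8983/solr/collection1/update?commit=true" \
  -H "Content-Type: text/xml" \
  --data-binary "<delete><query>timestamp:[* TO 2013-04-01T00:00:00Z]</query></delete>"

The delete only marks documents as deleted. The commit=true at the end
is what opens a new searcher and triggers cache autowarming, and that is
usually where the time goes. If you send the delete without commit=true
and issue the commit separately, you can see which step is actually slow.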
My attempts with the G1 collector were similarly awful. The idea seems
sound on paper, but Oracle needs to do some work to make it better for
large heaps. Because my GC tuning was not very disciplined, I do not
know how much impact UseLargePages is having.
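If you haven't done so already, GC logging will show exactly what each
collection reclaims and how much of the old generation survives. These
are standard HotSpot options (the log path is just an example):

-verbose:gc -Xloggc:/var/log/solr/gc.log
-XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime

Running the resulting log through a tool like GCViewer makes the
old-generation floor you are describing easy to see.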
Your overall RAM allocation should be good. If these machines aren't
being used for other software, then you have 24-25GB of memory left
(35GB total, minus the 10GB heap and a little for the OS) available for
caching your index, which should work very well for the 26GB of index on
that machine.
Looking over your message history, I see that you're using Amazon EC2.
Solr performs much better on bare metal, although the EC2 instance
you're using is probably very good.
SolrCloud is optimized for machines that are on the same Ethernet LAN.
Communication between EC2 VMs (especially if they are not located in
nearby data centers) will have some latency and a potential for dropped
packets. I'm going to proceed with the idea that EC2 and virtualization
are not the problems here.
I'm not really surprised to hear that, with an index of your size, so
much of a 10GB heap is retained. There may be things that could reduce
your memory usage. Could you share your solrconfig.xml and schema.xml
on a paste site that does XML highlighting (pastie.org being a good
example), and give us an idea of how often you update and commit? Feel
free to search/replace sensitive information, as long as those changes
are consistent and you don't remove the information entirely. Armed
with that information, we can have a discussion about your needs and
how to achieve them.
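In the meantime, a class histogram from the running JVM will show what
is actually occupying the heap. With a standard JDK, run this against
the Solr process (replace PID with the real process id; note that :live
forces a full GC first):

jmap -histo:live PID | head -n 30

If most of the retained memory is in Lucene FieldCache and Solr cache
classes, that points at solrconfig.xml and schema.xml settings rather
than a leak.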
Do you know how long cache autowarming is taking? The cache statistics
should tell you how long it took on the last commit.
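You can also pull those numbers without the admin UI. A request like
this (host and core name are examples) returns statistics for all the
caches, including warmupTime in milliseconds for the most recently
opened searcher:

curl "http://localhost:8983/solr/collection1/admin/mbeans?stats=true&cat=CACHE&wt=json"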
Some examples of typical real-world queries would be helpful too.
Examples should be relatively complex for your setup, but not
worst-case. An example query for my setup that meets this requirement
would probably be 4-10KB in size ... some of them are 20KB!
Not really related - a question about one of your old messages that
never seemed to get resolved: Are you still seeing a lot of CLOSE_WAIT
connections in your TCP table? A later message from you mentioned
4.2.1, so I'm wondering specifically about that version.
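A quick way to count them, nothing Solr-specific:

netstat -ant | grep CLOSE_WAIT | wc -l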
Thanks,
Shawn