On 5/1/2013 8:42 AM, Annette Newton wrote:
It was a single delete with a date range query.  We have 8 machines, each
with 35GB of memory; 10GB is allocated to the JVM.  Garbage collection has
always been a problem for us, with the heap not clearing on a full garbage
collection.  I don't know what is being held in memory that refuses to be
collected.

I have seen your Java heap configuration in previous posts and it's very
similar to ours, except that we are not currently using LargePages (I don't
know how much difference that has made to your memory usage).

We have tried various configurations around Java including the G1 collector
(which was awful) but all settings seem to leave the old generation at
least 50% full, so it quickly fills up again.

-Xms10240M -Xmx10240M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled -XX:NewRatio=2 -XX:+CMSScavengeBeforeRemark
-XX:CMSWaitDuration=5000  -XX:+CMSClassUnloadingEnabled
-XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly

If I could only figure out what keeps the heap at its current level, I feel
we would be in a better place with Solr.

With a single delete request, it was probably the commit that was very slow and caused the problem, not the delete itself. That has been my experience with my own large indexes.
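For reference, a delete like that is normally sent as an update message, and the commit is a separate step. Here's a minimal sketch of such a message; the "timestamp" field name and the range values are placeholders, not your actual query:

```xml
<!-- Sketch of a delete-by-query update message sent to /update.
     The field name and date range here are placeholders. -->
<delete>
  <query>timestamp:[2013-01-01T00:00:00Z TO 2013-04-01T00:00:00Z]</query>
</delete>
```

If the request carried commit=true, the slow part would be the commit and searcher reopen that follow the delete; leaving the commit parameter off and letting autoCommit handle it spreads that cost out.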

My attempts with the G1 collector were similarly awful. The idea seems sound on paper, but Oracle needs to do some work in making it better for large heaps. Because my GC tuning was not very disciplined, I do not know how much impact UseLargePages is having.

Your overall RAM allocation should be good. If these machines aren't being used for other software, then you have 24-25GB of memory available for caching your index, which should work very well for the 26GB of index on each machine.

Looking over your message history, I see that you're using Amazon EC2. Solr performs much better on bare metal, although the EC2 instance you're using is probably very good.

SolrCloud is optimized for machines that are on the same Ethernet LAN. Communication between EC2 VMs (especially if they are not located in nearby data centers) will have some latency and a potential for dropped packets. I'm going to proceed with the idea that EC2 and virtualization are not the problems here.

I'm not really surprised to hear that with an index of your size, so much of a 10GB heap is retained. There may be things that could reduce your memory usage, so could you share your solrconfig.xml and schema.xml on a paste site that does XML highlighting (pastie.org being a good example), and give us an idea of how often you update and commit? Feel free to search/replace sensitive information, as long as that work is consistent and you don't entirely remove it. Armed with that information, we can have a discussion about your needs and how to achieve them.
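As an illustration of one of the settings I'll be looking for: the autoCommit block in solrconfig.xml controls how often hard commits happen and whether they open a new searcher. A sketch, with example values rather than recommendations for your setup:

```xml
<!-- Example autoCommit block in solrconfig.xml. The values shown
     here are illustrative only, not tuned recommendations. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>300000</maxTime>           <!-- hard commit at most every 5 minutes -->
    <openSearcher>false</openSearcher>  <!-- flush to disk without opening a new searcher -->
  </autoCommit>
</updateHandler>
```

With openSearcher set to false, the hard commit flushes data to disk but doesn't invalidate caches or trigger autowarming, which keeps those commits cheap.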

Do you know how long cache autowarming is taking? The cache statistics should tell you how long it took on the last commit.
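For context, autowarming time is driven mostly by the autowarmCount on each cache defined in solrconfig.xml. A sketch of what those definitions look like; the sizes and counts are only examples:

```xml
<!-- Example cache definitions from solrconfig.xml. The size and
     autowarmCount values are illustrative; large autowarmCount
     values can make every commit expensive. -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="64"/>
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="32"/>
```

Each autowarmed entry re-executes a query against the new searcher, so commit-time pain often traces back to these numbers.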

Some examples of typical real-world queries would be helpful too. Examples should be relatively complex for your setup, but not worst-case. An example query for my setup that meets this requirement would probably be 4-10KB in size ... some of them are 20KB!

Not really related - a question about one of your old messages that never seemed to get resolved: Are you still seeing a lot of CLOSE_WAIT connections in your TCP table? A later message from you mentioned 4.2.1, so I'm wondering specifically about that version.
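In case it helps with checking, here's a quick way to count those connections. netstat's options vary a little between operating systems, so treat this as a sketch:

```shell
# Count TCP connections currently in the CLOSE_WAIT state.
# -an (all sockets, numeric addresses) is widely supported;
# grep -c prints 0 when there are no matches.
netstat -an 2>/dev/null | grep -c CLOSE_WAIT
```

A steadily growing count usually means something is failing to close its end of connections that the remote side has already closed.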

Thanks,
Shawn
