On 2/8/2014 11:02 AM, Roman Chyla wrote:
> I would be curious what the cause is. Samarth says that it worked for over
> a year /and supposedly docs were being added all the time/. Did the index
> grew considerably in the last period? Perhaps he could attach visualvm
> while it is in the 'black hole' state to see what is actually going on. I
> don't know if the instance is used also for searching, but if its only
> indexing, maybe just shorter commit intervals would alleviate the problem.
> To add context, our indexer is configured with 16gb heap, on machine with
> 64gb ram, but busy one, so sometimes there is no cache to spare for os. The
> index is 300gb (out of which 140gb stored values), and it is working just
> 'fine' - 30doc/s on average, but our docs are large /0.5mb on avg/ and
> fetched from two databases, so the slowness is outside solr. I didnt see
> big improvements with bigger heap, but I don't remember exact numbers. This
> is solr4.

For this discussion, refer to this image, or the Google Books link where I originally found it:

https://dl.dropboxusercontent.com/u/97770508/performance-dropoff-graph.png

http://books.google.com/books?id=dUiNGYCiWg0C&pg=PA33#v=onepage&q&f=false

Computer systems have had a long history of performance curves like this. Everything goes really well, possibly for a really long time, until you cross some threshold where a resource cannot keep up with the demands being placed on it. That threshold is usually something you can't calculate in advance. Once it is crossed, even by a tiny amount, performance drops VERY quickly.

I do recommend that people closely analyze their GC characteristics, but jconsole, jvisualvm, and other tools like that are actually not very good at this task. You can only get summary info -- how many GCs occurred and total amount of time spent doing GC, often with a useless granularity -- jconsole reports the time in minutes on a system that has been running for any length of time.

I *was* having occasional super-long GC pauses (15 seconds or more), but I did not know it, even though I had religiously looked at GC info in jconsole and jstat. I discovered the problem indirectly, and had to find additional tools to quantify it. After discovering it, I tuned my garbage collection and have not had the problem since.

If you have detailed GC logs enabled, this is a good free tool for offline analysis:

https://code.google.com/p/gclogviewer/

I have also had good results with this free tool, but it requires a little more work to set up:

http://www.azulsystems.com/jHiccup

Azul Systems has an alternate Java implementation for Linux that virtually eliminates GC pauses, but it isn't free. I do not have any information about how much it costs. We found our own solution, but for those who can throw money at the problem, I've heard good things about it.

Thanks,
Shawn
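
P.S. For anyone who wants a rough check for long pauses before setting up the tools above, here is a minimal sketch of one way to look for them from inside a JVM. A thread sleeps for a short, fixed interval and records how far each sleep overshoots; a sleep that takes seconds instead of a millisecond means the whole JVM (or the machine) stalled. This is only an illustration: the class name PauseMeter and its method are made up for this example, it is not code from jHiccup or any other tool, and it cannot tell a GC pause apart from swapping or CPU starvation. Normal scheduler jitter also shows up as a few milliseconds of overshoot.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of Solr, jHiccup, or the JDK.
public class PauseMeter {

    /**
     * Sleeps repeatedly for intervalMillis until roughly runMillis have passed
     * and returns the largest overshoot, in milliseconds, of any single sleep.
     * A large value means the thread was stalled far longer than it asked for.
     */
    public static long maxOvershootMillis(long runMillis, long intervalMillis) {
        long intervalNanos = TimeUnit.MILLISECONDS.toNanos(intervalMillis);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(runMillis);
        long maxOvershootNanos = 0;

        while (System.nanoTime() - deadline < 0) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop measuring early.
                Thread.currentThread().interrupt();
                break;
            }
            long overshoot = (System.nanoTime() - start) - intervalNanos;
            if (overshoot > maxOvershootNanos) {
                maxOvershootNanos = overshoot;
            }
        }
        return TimeUnit.NANOSECONDS.toMillis(maxOvershootNanos);
    }

    public static void main(String[] args) {
        // Report the worst stall seen in each of five one-second windows.
        for (int window = 1; window <= 5; window++) {
            long worst = maxOvershootMillis(1000, 1);
            System.out.println("window " + window + ": worst stall ~" + worst + " ms");
        }
    }
}
```

To be useful, something like this has to run inside the same JVM as the application being watched, since a pause in one JVM is invisible from another. Unlike the cumulative totals in jconsole, it reports the single worst stall per window, which is the number that matters when hunting for 15-second pauses. Detailed GC logs are still needed to confirm that a stall really was a collection.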