On Tue, Jan 25, 2011 at 2:06 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> On Tuesday 25 January 2011 11:54:55 Martin Grotzke wrote:
> > Hi,
> >
> > recently we're experiencing OOMEs (GC overhead limit exceeded) in our
> > searches. Therefore I want to get some clarification on heap and cache
> > configuration.
> >
> > This is the situation:
> > - Solr 1.4.1 running on tomcat 6, Sun JVM 1.6.0_13 64bit
> > - JVM Heap Params: -Xmx8G -XX:MaxPermSize=256m -XX:NewSize=2G
> >   -XX:MaxNewSize=2G -XX:SurvivorRatio=6 -XX:+UseParallelOldGC
> >   -XX:+UseParallelGC
>
> Consider switching to HotSpot JVM, use the -server as the first switch.

The jvm options I mentioned were not all; we're running the jvm with -server (of course).

> > - The machine has 32 GB RAM
> > - Currently there are 4 processors/cores in the machine, this shall be
> >   changed to 2 cores in the future.
> > - The index size in the filesystem is ~9.5 GB
> > - The index contains ~5.500.000 documents
> > - 1.500.000 of those docs are available for searches/queries, the rest
> >   are inactive docs that are excluded from searches (via a flag/field),
> >   but they're still stored in the index as they need to be available by
> >   id (solr is the main document store in this app)
>
> How do you exclude them? It should use filter queries.

The docs are indexed with a field "findable" on which we do a filter query.

> I also remember (but i just cannot find it back so please correct me if
> i'm wrong) that in 1.4.x sorting is done before filtering. It should be
> an improvement if filtering is done before sorting.

Hmm, I cannot imagine a case where it makes sense to sort before filtering. Can't believe that solr does it like this. Can anyone shed some light on this?

> If you use sorting, it takes up a huge amount of RAM if filtering is not
> done first.
>
> > - Caches are configured with a big size (the idea was to prevent
> >   filesystem access / disk i/o as much as possible):
>
> There is only disk I/O if the kernel can't keep the index (or parts) in
> its page cache.
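[For reference, the filter-query exclusion described earlier might look like the request below. This is only a sketch: the "findable" field name comes from the thread, but the boolean value, the q parameter, and the host/port are illustrative assumptions.]

```
http://localhost:8983/solr/select?q=some+search+terms&fq=findable:true&rows=10
```

Since the fq clause is cached in the filterCache independently of the main query, a single cached "findable:true" bitset can be reused across all searches, which would also explain a filterCache hitratio close to 1.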
Yes, I'll keep an eye on disk I/O.

> > - filterCache (solr.LRUCache): size=200000, initialSize=30000,
> >   autowarmCount=1000, actual size =~ 60.000, hitratio =~ 0.99
> > - documentCache (solr.LRUCache): size=200000, initialSize=100000,
> >   autowarmCount=0, actual size =~ 160.000 - 190.000, hitratio =~ 0.74
> > - queryResultCache (solr.LRUCache): size=200000, initialSize=30000,
> >   autowarmCount=10000, actual size =~ 10.000 - 60.000, hitratio =~ 0.71
>
> You should decrease the initialSize values. But your hitratios seem very
> nice.

Does the initialSize have a real impact? According to http://wiki.apache.org/solr/SolrCaching#initialSize it's the initial size of the HashMap backing the cache.

What would you say are reasonable values for size/initialSize/autowarmCount?

Cheers,
Martin
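[The cache settings discussed in this thread correspond to a solrconfig.xml excerpt roughly like the one below. The values shown are the ones reported above, not recommendations; treat it as a sketch of where size/initialSize/autowarmCount live, not as tuned configuration.]

```xml
<!-- Sketch of the cache configuration reported in the thread.
     size: maximum number of entries before eviction.
     initialSize: initial capacity of the backing HashMap.
     autowarmCount: entries copied from the old cache on commit/newSearcher. -->
<filterCache      class="solr.LRUCache" size="200000" initialSize="30000"  autowarmCount="1000"/>
<queryResultCache class="solr.LRUCache" size="200000" initialSize="30000"  autowarmCount="10000"/>
<documentCache    class="solr.LRUCache" size="200000" initialSize="100000" autowarmCount="0"/>
```

Note that the documentCache is keyed by internal Lucene document ids, which change between index versions, so it cannot usefully be autowarmed; autowarmCount=0 there is the normal setting.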