On 3/22/2013 9:24 AM, Raghav Karol wrote:
We run this index sharded across 8 Solr cores on a single host, an
m2.4xlarge EC2 instance. We do not use ZooKeeper (because of
operational issues on our live indexes) and manage the sharding
ourselves.

For this index we run with -Xmx30G and observe in jconsole that
Solr uses approximately 25G.
Autocommit kills Solr: it drives heap usage to the maximum. The
reason appears to be that all cores commit in parallel.
Disabling autoCommit and running a loop like

     while true; do
       for i in $(seq 0 7); do
         curl -s "http://localhost:8085/solr/core${i}/update?commit=true&wt=json"
       done
     done

produces:

{"responseHeader":{"status":0,"QTime":8297}}
{"responseHeader":{"status":0,"QTime":8358}}
{"responseHeader":{"status":0,"QTime":9552}}
{"responseHeader":{"status":0,"QTime":8368}}
{"responseHeader":{"status":0,"QTime":9296}}
{"responseHeader":{"status":0,"QTime":8527}}
{"responseHeader":{"status":0,"QTime":9458}}
{"responseHeader":{"status":0,"QTime":8929}}

8 seconds to process a commit even with no changes to the index!?!

If this index is actively processing queries, then what you are experiencing here is probably cache warming - Solr looks at the entries in each of its caches and uses those entries to run queries against the new index to pre-populate the new caches. The number of entries that are used for warming queries will be controlled by the autoWarmCount value on the cache definition.
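For reference, autowarmCount is set per cache definition in solrconfig.xml. A sketch of the relevant section — the sizes and counts below are illustrative values, not a recommendation for this index:

```xml
<!-- solrconfig.xml (query section): illustrative values only.
     autowarmCount controls how many entries from the old cache
     are replayed as queries against the new searcher on commit;
     0 disables warming for that cache. -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="16"/>
```

Lowering autowarmCount trades slower first queries after a commit for faster, cheaper commits.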

Why does Solr need such a large heap for this index (it dies with
10G and 20G heaps, and is constant at 28G in jconsole)?
Why does running commits in parallel, via autoCommit or the curl
loop above, exhaust the memory?
Are we using dynamic fields incorrectly?

When you run a commit, Solr fires up a new index searcher object, complete with caches, which will then be autowarmed from the old caches as described above. Until the new object is fully warmed, the old searcher will exist and will continue to serve queries. If you issue another commit while a new searcher is already warming, then *another* searcher is likely to get fired up as well, depending on the value of maxWarmingSearchers in your solrconfig.xml file.
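The cap on overlapping warming searchers is also set in solrconfig.xml. A minimal sketch — 2 is the value shipped in the example config; lowering it to 1 limits overlap at the cost of rejecting commits that arrive while a searcher is still warming:

```xml
<!-- solrconfig.xml (query section): at most one new searcher
     may be warming per core at any time -->
<maxWarmingSearchers>1</maxWarmingSearchers>
```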

The amount of memory required by a searcher can be very high, due in part to caches, especially the FieldCache, which is used internally by Lucene and is not configurable like the others. If you have 8 cores and you run commits on them in parallel that take several seconds, then for several seconds you will have at least sixteen searchers running. If your maxWarmingSearchers value is higher than 1, you might end up with even more searchers running at the same time. This is likely where your memory is going.

By lowering the autoWarmCount values on your caches, you can reduce the amount of time it takes to do a commit. You should also keep track of whether anything has actually changed on each core and don't issue a commit when nothing has changed. Also, it would be a good idea to stagger the commits so that all your cores are not committing at the same time.
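The skip-unchanged-cores and stagger-the-commits ideas above can be sketched in shell, extending the original curl loop. The marker-file convention and the 30-second pause are assumptions for illustration: some external indexing process is presumed to touch core${i}.dirty after sending updates to core ${i}.

```shell
#!/bin/sh
# Commit cores one at a time, skipping any core with no pending
# changes, and pause between commits so the cores do not all
# warm new searchers simultaneously.
for i in $(seq 0 7); do
  marker="/tmp/core${i}.dirty"
  if [ -f "$marker" ]; then
    curl -s "http://localhost:8085/solr/core${i}/update?commit=true&wt=json"
    rm -f "$marker"
    sleep 30   # give core ${i} time to finish warming before the next commit
  fi
done
```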

Thanks,
Shawn
