[ https://issues.apache.org/jira/browse/GEODE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848824#comment-15848824 ]
ASF subversion and git services commented on GEODE-1672:
---------------------------------------------------------

Commit e606f3e6ec0828f5fc30e20a9dbdf3aa8c3c8620 in geode's branch
refs/heads/develop from [~agingade]
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=e606f3e ]

GEODE-1672: Disabled recovering values for LRU region during startup.

When recovering persistent files, the system stores the values in temporary
maps (for regions) using a background thread. Because these maps are not
actual regions, they are not considered/included for LRU eviction, which can
cause the system to run out of memory. The problem is fixed by skipping value
recovery for LRU regions. A new system property, "disk.recoverLruValues", is
added to support reading values for LRU regions.

> When amount of overflowed persisted data exceeds heap size startup may run
> out of memory
> ----------------------------------------------------------------------------------------
>
>                 Key: GEODE-1672
>                 URL: https://issues.apache.org/jira/browse/GEODE-1672
>             Project: Geode
>          Issue Type: Bug
>          Components: persistence
>            Reporter: Darrel Schneider
>
> Basically, when the amount of data overflowed approaches the heap size, such
> that the total amount of data is very close to or actually surpasses your
> total tenured heap, it is possible that you will not be able to restart.
> The algorithm during recovery of oplogs/buckets is such that we don't
> "evict" in the normal sense as data fills the heap during the early stages
> of recovery, prior to creating the regions. When the data is first created
> in the heap, it is not yet officially in the region.
> At any rate, during this early phase of recovery, or during the subsequent
> phase where eviction is working as usual, it is possible that the total
> data, or an early imbalance of buckets prior to the opportunity to
> rebalance, causes us to surpass the critical threshold, which will kill us
> before successful startup.
> To reproduce, you could have one region with tons of data that evicts and
> overflows with persistence. Call it R1. Then another region with persistence
> that does not evict. Call it R2. (A sketch of this setup follows at the end
> of this message.)
> List R1 first in the cache.xml file. Start running the system and add data
> over time until you have overflowed tons of data approaching the heap size
> in the evicting region, and also have enough data in the R2 region.
> Once you have filled these regions with enough data, overflowed enough to
> disk, and persisted the other region, shut down and then attempt to restart.
> If you put enough data in, you will hit the critical threshold before being
> able to complete startup.
> You can work around this issue by configuring Geode to not recover values,
> by setting this system property: -Dgemfire.disk.recoverValues=false
> Values will not be faulted into memory until a read operation is done on
> that value's key.
> If you have regions that do not use overflow and some that do, then another
> workaround is to create the regions that do not use overflow first.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
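A minimal sketch of the reproduction setup described above, assuming the
org.apache.geode Java API. The region shortcuts, class name, entry counts, and
value sizes below are illustrative assumptions; the report itself only says
that R1 is persistent with eviction/overflow and R2 is persistent without
eviction, and that the workaround is the -Dgemfire.disk.recoverValues=false
JVM flag at restart.

    // Illustrative only: shortcuts, names, and sizes are assumptions, not part of the report.
    // The workaround from the description is the JVM flag: -Dgemfire.disk.recoverValues=false
    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;
    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.RegionShortcut;

    public class Geode1672Repro {
      public static void main(String[] args) {
        Cache cache = new CacheFactory().create();

        // R1: persistent region whose entries are evicted and overflowed to disk
        Region<Integer, byte[]> r1 = cache
            .<Integer, byte[]>createRegionFactory(RegionShortcut.PARTITION_PERSISTENT_OVERFLOW)
            .create("R1");

        // R2: persistent region that does not evict
        Region<Integer, byte[]> r2 = cache
            .<Integer, byte[]>createRegionFactory(RegionShortcut.PARTITION_PERSISTENT)
            .create("R2");

        // Load R1 until the overflowed data approaches the heap size, and put a
        // significant amount into R2 as well; then shut down and restart to
        // reproduce the out-of-memory condition during recovery.
        for (int i = 0; i < 1_000_000; i++) {
          r1.put(i, new byte[1024]);
          r2.put(i, new byte[1024]);
        }

        cache.close();
      }
    }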