Option 1 is not a bad idea. Another thought might be to not start
asynchronous value recovery until all of the regions are created. I think
right now we launch a task to read all of the oplogs and recover values as
soon as the disk store is created. Maybe we could defer that until after
the last region in that disk store is actually created. Option 2 seems
pretty complicated for a small window of time.
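
A rough sketch of the "defer until the last region is created" idea, in plain Java. Everything here is hypothetical and not existing Geode code: the class name, the assumption that the number of regions in the disk store is known up front, and the use of a caller-supplied Executor to run the oplog recovery task.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of Geode): holds back the value recovery
 * task until every expected region in a disk store has been created.
 */
final class DeferredValueRecovery {
  private final AtomicInteger regionsPending;
  private final Executor executor;
  private final Runnable recoveryTask;

  DeferredValueRecovery(int expectedRegions, Executor executor, Runnable recoveryTask) {
    if (expectedRegions <= 0) {
      throw new IllegalArgumentException("expectedRegions must be positive");
    }
    this.regionsPending = new AtomicInteger(expectedRegions);
    this.executor = executor;
    this.recoveryTask = recoveryTask;
  }

  /**
   * Call once per region, after the region has been created.
   * Returns true if this call was the one that submitted the recovery task.
   */
  boolean regionCreated() {
    // decrementAndGet is atomic, so exactly one caller observes zero.
    if (regionsPending.decrementAndGet() == 0) {
      executor.execute(recoveryTask);
      return true;
    }
    return false;
  }
}
```

The counter only reaches zero once, so the task is submitted a single time even if regions are created from several threads.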

-Dan

On Tue, Jan 17, 2017 at 2:41 PM, Anilkumar Gingade <aging...@pivotal.io>
wrote:

> Hi Geode Devs,
>
> We are working on ticket GEODE-1672, related to out of memory during
> recovery with overflow regions (heap LRU configured).
>
> https://issues.apache.org/jira/browse/GEODE-1672
>
> When recovering the persistent files, Geode stores the values in temp
> maps (for regions) using a background thread. Because these maps are not
> actual regions, they are not considered for LRU eviction, which causes
> the system to run out of memory.
>
> We are thinking about the following approaches to address this issue. Let us
> know if you have any comments or suggestions about the solutions.
>
> 1. Skip recovering the regions marked with LRU eviction.
> - This keeps the code changes to a minimum.
> - Accessing the most recently used values for the first time will be
> expensive. But this is true even if the values are recovered, as Geode
> doesn't guarantee that the recently/most used values will be in memory
> after recovery.
> - This may impact use cases where regions are configured with LRU eviction
> even though there is no memory pressure (system configured to handle
> unexpected events).
>
> 2. Include temp maps (these are AbstractRegionMap) for eviction during
> recovery.
> - May involve a lot of code changes. The size estimation code in bucket
> regions needs to be moved to AbstractRegionMap.
> - The recovery thread needs to be throttled based on the eviction rate,
> which could impact the recovery of regions without eviction. We could
> consider overriding the default eviction rate during recovery...
> - The regions will be in a similar state (number of entries) when the
> system is recovered.
>
> 3. Stop recovery when the system hits critical-heap-memory.
> - This requires setting/recommending critical-heap-percentage, and throwing
> LowMemoryException during recovery if the system is low on memory.
> - This may impact the first read on a region whose values are not
> recovered.
>
> Thanks,
> -Anil.
>
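
For reference, the check behind option 3 in the quoted message could look roughly like the sketch below. It is only an illustration using the standard JDK MemoryMXBean: the class name RecoveryHeapGuard and its methods are made up for this example, it does not use Geode's resource manager, and it returns a boolean where real code would throw LowMemoryException or stop the recovery thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/**
 * Hypothetical guard (not part of Geode): reports whether heap usage has
 * reached a critical percentage, so a recovery loop can stop loading values.
 */
final class RecoveryHeapGuard {
  private final double criticalPercent;

  RecoveryHeapGuard(double criticalPercent) {
    if (criticalPercent <= 0 || criticalPercent > 100) {
      throw new IllegalArgumentException("criticalPercent must be in (0, 100]");
    }
    this.criticalPercent = criticalPercent;
  }

  /** Pure check, kept separate from the JVM lookup so it is easy to test. */
  boolean isCritical(long usedBytes, long maxBytes) {
    if (maxBytes <= 0) {
      return false; // MemoryUsage.getMax() is -1 when the maximum is undefined
    }
    return usedBytes * 100.0 / maxBytes >= criticalPercent;
  }

  /** Reads the current heap usage from the JVM and applies the check. */
  boolean isHeapCritical() {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    return isCritical(heap.getUsed(), heap.getMax());
  }
}
```

A recovery loop would call isHeapCritical() before loading each value and leave the remaining values on disk once it returns true, which matches the noted cost of a slower first read for those entries.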
