Github user dschneider-pivotal commented on a diff in the pull request:

    https://github.com/apache/geode/pull/559#discussion_r120222821

    --- Diff: geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb ---
    @@ -276,8 +276,83 @@ find the reason.
     
     Description:
     
    -The process discovered that it was not in the distributed system and cannot determine why it was removed. The membership coordinator removed the member after it failed to respond to an internal are you alive message.
    +The process discovered that it was not in the distributed system and cannot determine why it was
    +removed. The membership coordinator removed the member after it failed to respond to an internal
    +are-you-alive message.
     
     Response:
     
     The operator should examine the locator processes and logs.
    +
    +## <a id="restart-failure-persistent-lru" class="no-quick-link"></a> Restart Fails Due To Out-of-Memory Error
    +
    +This section describes a restart failure that can occur when the stopped system is one that was configured with persistent regions. Specifically:
    +
    +- Some of the regions of the recovering system, when running, were configured as PERSISTENT regions, which means that they save their data to disk.
    +- At least one of the persistent regions was configured to evict least recently used (LRU) data by overflowing values to disk.
    +
    +### How Data is Recovered From Persistent Regions
    +
    +Data recovery, upon restart, always recovers keys. You can configure whether and how the system
    +recovers the values associated with those keys to populate the system cache.
    +
    +**Value Recovery**
    +
    +- Recovering all values immediately during startup slows the startup time but results in consistent
    +read performance after the startup on a "hot" cache.
    +
    +- Recovering no values means quicker startup but a "cold" cache, so the first retrieval of each value will read from disk.
    +
    +- Retrieving values asynchronously in a background thread allows a relatively quick startup on a "warm" cache
    +that will eventually recover every value.
    +
    +**Retrieve or Ignore LRU values**
    +
    +When a system with persistent LRU regions shuts down, the system does not record which of the values
    +were recently used. On subsequent startup, if values are recovered into an LRU region they may be
    +the least recently used instead of the most recently used. Also, if LRU values are recovered on a
    +heap or an off-heap LRU region, it is possible that the LRU memory limit will be exceeded, resulting
    +in an `OutOfMemoryException` during recovery. For these reasons, LRU value recovery can be treated
    +differently than non-LRU values.
    +
    +## Default Recovery Behavior for Persistent Regions
    +
    +The default behavior is for the system to recover all keys, then asynchronously recover all data
    +values that were resident, leaving LRU values unrecovered. This default strategy is best for
    --- End diff --

    drop "that were resident"