[ 
https://issues.apache.org/jira/browse/HBASE-28923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-28923:
-----------------------------------
    Labels: pull-request-available  (was: )

> Prioritize "orphan" blocks for eviction inside BucketCache.freespace
> --------------------------------------------------------------------
>
>                 Key: HBASE-28923
>                 URL: https://issues.apache.org/jira/browse/HBASE-28923
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>              Labels: pull-request-available
>
> The Persistent Cache feature brought the ability to recover the cache in the 
> event of a restart or crash. Under certain conditions, cache recovery can
> leave _orphan_ blocks lingering in the cache, causing unnecessary extra
> cache usage.
> For example, when a region server crashes or restarts, the regions it was
> hosting are immediately reassigned to the remaining servers. Once the
> crashed/restarted server comes back online, the persistent cache recovers
> the cache state from before the crash/restart, which contains blocks from
> the regions the server was holding prior to the incident. This can lead to
> the _orphan_ blocks situation described above under the following
> conditions:
>  * If the balancer is off, or the cache-aware balancer fails to move back
> the very same regions the server held before the crash;
>  * If compaction completes for the regions on the temporary servers.
> Also, since evictsOnClose is set to false by default, any region move would
> leave "orphan" blocks behind.
> This issue proposes additional logic, inside the BucketCache.freeSpace
> method, for identifying blocks that do not belong to any file of the
> currently online regions.
> This would require modifying both BlockCacheFactory and BucketCache to pass
> along the map of online regions kept by HRegionServer.
> Inside the BucketCache.freeSpace method, when iterating through the
> backingMap and before separating the entries into the different eviction
> priority groups, we can check whether a given entry is a block from a file
> of any of the online regions. If its file is not found in any of the online
> regions, the entry can be removed using the
> BucketCache.evictBucketEntryIfNoRpcReferenced method.
> An additional configurable grace period should be added, so that only
> blocks cached before this grace period are considered potential orphans.
> This is to avoid evicting blocks of files that are currently being written
> by flushes/compactions, when "caching on write/caching on compaction" is
> enabled.
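
The check described above can be illustrated with a minimal, self-contained
Java sketch. It is not the HBase implementation: OrphanBlockSketch,
CachedBlock, isOrphan, findOrphans and the gracePeriodMs parameter are
hypothetical names chosen for illustration, and the set of online store file
names stands in for whatever would be derived from the map of online regions
kept by HRegionServer. In the real change, a block identified this way would
presumably be handed to BucketCache.evictBucketEntryIfNoRpcReferenced rather
than collected into a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; none of these types are real HBase classes. */
final class OrphanBlockSketch {

  /** Hypothetical stand-in for a backingMap entry. */
  static final class CachedBlock {
    final String hfileName;   // file the block belongs to
    final long offset;        // block offset within that file
    final long cachedTimeMs;  // time at which the block entered the cache

    CachedBlock(String hfileName, long offset, long cachedTimeMs) {
      this.hfileName = hfileName;
      this.offset = offset;
      this.cachedTimeMs = cachedTimeMs;
    }

    @Override
    public String toString() {
      return hfileName + "@" + offset;
    }
  }

  /**
   * A block is treated as an orphan when its file belongs to no online region
   * AND it was cached longer ago than the grace period. The second condition
   * protects blocks of files still being written by flushes/compactions.
   */
  static boolean isOrphan(CachedBlock block, Set<String> onlineStoreFiles,
      long nowMs, long gracePeriodMs) {
    boolean fileIsOnline = onlineStoreFiles.contains(block.hfileName);
    boolean pastGracePeriod = nowMs - block.cachedTimeMs > gracePeriodMs;
    return !fileIsOnline && pastGracePeriod;
  }

  /** Collects the orphans; a real cache would evict them at this point instead. */
  static List<CachedBlock> findOrphans(List<CachedBlock> entries,
      Set<String> onlineStoreFiles, long nowMs, long gracePeriodMs) {
    List<CachedBlock> orphans = new ArrayList<>();
    for (CachedBlock block : entries) {
      if (isOrphan(block, onlineStoreFiles, nowMs, gracePeriodMs)) {
        orphans.add(block);
      }
    }
    return orphans;
  }

  public static void main(String[] args) {
    Set<String> online = Set.of("hfile-a");
    List<CachedBlock> entries = List.of(
        new CachedBlock("hfile-a", 0L, 0L),       // file is online: kept
        new CachedBlock("hfile-b", 0L, 0L),       // offline and old: orphan
        new CachedBlock("hfile-c", 0L, 9_500L));  // offline but recent: kept
    // now = 10 000 ms, grace period = 1 000 ms
    System.out.println(findOrphans(entries, online, 10_000L, 1_000L)); // prints [hfile-b@0]
  }
}
```

In this sketch the grace period is compared against the time the block was
cached, so a block of a file that is not yet part of any online region (for
example, one still being written by a compaction) survives until the grace
period has elapsed.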



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
