justinmclean opened a new issue, #10326:
URL: https://github.com/apache/gravitino/issues/10326

   ### What would you like to be improved?
   
   When datasetCacheSize > 0, appendStatisticsImpl commits and then does 
cache.put(tableId, newDataset). Replacing an existing cache value does not 
trigger the current close path, because the cache is wired with 
evictionListener, while replacement is a removal event. The old Dataset can 
remain unclosed, causing leaked native/file handles during repeated statistics 
updates. Default config (datasetCacheSize=0) avoids this path, but cached 
deployments are affected.
   
   ### How should we improve?
   
   Ensure replaced cache entries are closed on any removal, not just eviction. 
Update cache wiring to close Dataset via a removal callback that runs for 
replacement/removal/eviction events. Keep invalidateAll() on shutdown so cached 
datasets are closed at storage close.
   
   Here's a unit test to help:
   ```
     @Test
     public void testDatasetCacheReplacementClosesOldDataset() throws Exception 
{
       String location = 
Files.createTempDirectory("lance_stats_cache_replace").toString();
   
       Map<String, String> properties = Maps.newHashMap();
       properties.put("location", location);
       properties.put("datasetCacheSize", "10");
   
       LancePartitionStatisticStorage storage = new 
LancePartitionStatisticStorage(properties);
       try {
         Cache<Long, Dataset> cache = storage.getDatasetCache();
         Dataset oldDataset = mock(Dataset.class);
         Dataset newDataset = mock(Dataset.class);
   
         cache.put(1L, oldDataset);
         cache.put(1L, newDataset);
   
         verify(oldDataset, times(1)).close();
       } finally {
         FileUtils.deleteDirectory(new File(location));
         storage.close();
       }
     }
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to