justinmclean opened a new issue, #10326:
URL: https://github.com/apache/gravitino/issues/10326
### What would you like to be improved?
When datasetCacheSize > 0, appendStatisticsImpl commits and then does
cache.put(tableId, newDataset). Replacing an existing cache value does not
trigger the current close path, because the cache is wired with
evictionListener, while replacement is a removal event. The old Dataset can
remain unclosed, causing leaked native/file handles during repeated statistics
updates. Default config (datasetCacheSize=0) avoids this path, but cached
deployments are affected.
### How should we improve?
Ensure replaced cache entries are closed on any removal, not just eviction.
Update cache wiring to close Dataset via a removal callback that runs for
replacement/removal/eviction events. Keep invalidateAll() on shutdown so cached
datasets are closed at storage close.
Here's a unit test to help:
```
@Test
public void testDatasetCacheReplacementClosesOldDataset() throws Exception
{
String location =
Files.createTempDirectory("lance_stats_cache_replace").toString();
Map<String, String> properties = Maps.newHashMap();
properties.put("location", location);
properties.put("datasetCacheSize", "10");
LancePartitionStatisticStorage storage = new
LancePartitionStatisticStorage(properties);
try {
Cache<Long, Dataset> cache = storage.getDatasetCache();
Dataset oldDataset = mock(Dataset.class);
Dataset newDataset = mock(Dataset.class);
cache.put(1L, oldDataset);
cache.put(1L, newDataset);
verify(oldDataset, times(1)).close();
} finally {
FileUtils.deleteDirectory(new File(location));
storage.close();
}
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]