justinmclean opened a new issue, #10598:
URL: https://github.com/apache/gravitino/issues/10598

   ### What would you like to be improved?
   
   `LancePartitionStatisticStorage.close()` closes the Arrow allocator before 
invalidating `datasetCache` (LancePartitionStatisticStorage.java, line 352). 
When dataset caching is enabled, cached Dataset instances stay open during 
normal operation and are expected to be released when the cache is invalidated 
on shutdown. Because the allocator is closed first, the cached datasets are 
released only after it is gone, which may cause shutdown-time exceptions or 
leave cached dataset resources unclosed.
   
   ### How should we improve?
   
   Release cached datasets before closing the allocator. A simple fix would be 
to invalidate or explicitly close all cached Dataset instances first, then 
close the allocator, and finally shut down the scheduler.
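
The ordering described above could look roughly like the sketch below. It is not the actual Gravitino code: `CloseOrderSketch` is a hypothetical class, plain `AutoCloseable` objects stand in for the Arrow allocator and the Lance `Dataset`, and a `ConcurrentHashMap` stands in for the real dataset cache. The nested `try`/`finally` is an added assumption so that a failing dataset close does not skip the allocator or the scheduler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

/** Hypothetical stand-in for the storage class; all names are illustrative only. */
public class CloseOrderSketch implements AutoCloseable {
  // Stand-in for the dataset cache (the real one is a cache of Dataset instances).
  private final Map<Long, AutoCloseable> datasetCache = new ConcurrentHashMap<>();
  // Stand-in for the Arrow allocator.
  private final AutoCloseable allocator;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public CloseOrderSketch(AutoCloseable allocator) {
    this.allocator = allocator;
  }

  public void cache(long id, AutoCloseable dataset) {
    datasetCache.put(id, dataset);
  }

  @Override
  public void close() throws Exception {
    try {
      // 1. Release every cached dataset while the allocator is still open.
      for (AutoCloseable dataset : datasetCache.values()) {
        dataset.close();
      }
      datasetCache.clear();
    } finally {
      try {
        // 2. Only then close the allocator.
        allocator.close();
      } finally {
        // 3. Finally shut down the scheduler.
        scheduler.shutdownNow();
      }
    }
  }

  /** Records the order in which the stand-in resources are closed. */
  public static List<String> demo() {
    List<String> events = new ArrayList<>();
    CloseOrderSketch storage = new CloseOrderSketch(() -> events.add("allocator"));
    storage.cache(1L, () -> events.add("dataset"));
    try {
      storage.close();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return events;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [dataset, allocator]
  }
}
```

In the real class, the first step would presumably be the cache invalidation or an explicit close of each cached Dataset, as suggested above.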
   
   Here's a unit test to help:
   ```
   
     @Test
     public void testCloseReleasesCachedDatasetBeforeAllocator() throws Exception {
       String location = Files.createTempDirectory("lance_stats_close_cache").toString();
       Map<String, String> properties = Maps.newHashMap();
       properties.put("location", location);
       properties.put("datasetCacheSize", "10");
   
       LancePartitionStatisticStorage storage = new LancePartitionStatisticStorage(properties);
       try {
         BufferAllocator allocator = mock(BufferAllocator.class);
         FieldUtils.writeField(storage, "allocator", allocator, true);
   
         Cache<Long, Dataset> datasetCache = storage.getDatasetCache();
         Assertions.assertNotNull(datasetCache);
   
         Dataset dataset = mock(Dataset.class);
         datasetCache.put(1L, dataset);
   
         storage.close();
   
         InOrder inOrder = inOrder(dataset, allocator);
         inOrder.verify(dataset).close();
         inOrder.verify(allocator).close();
       } finally {
         FileUtils.deleteDirectory(new File(location));
       }
     }
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
