Copilot commented on code in PR #16848:
URL: https://github.com/apache/pinot/pull/16848#discussion_r2365751458
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/SegmentDeletionManager.java:
##########
@@ -425,37 +362,71 @@ public void
removeAgedDeletedSegments(LeadControllerManager leadControllerManage
if (leadControllerManager.isLeaderForTable(tableName)) {
URI tableNameURI = URIUtils.getUri(deletedDirURI.toString(),
URIUtils.encode(tableName));
// Get files that are aged
- final String[] targetFiles = pinotFS.listFiles(tableNameURI,
false);
- int numFilesDeleted = 0;
- URI targetURI = null;
- for (String targetFile : targetFiles) {
+ final List<FileMetadata> targetFiles =
pinotFS.listFilesWithMetadata(tableNameURI, false);
+
+ if (targetFiles.isEmpty()) {
+ LOGGER.info("Deleting empty deleted segments directory: {} for
table: {}", tableNameURI, tableName);
try {
- targetURI =
- URIUtils.getUri(tableNameURI.toString(),
URIUtils.encode(URIUtils.getLastPart(targetFile)));
- long deletionTimeMs =
getDeletionTimeMsFromFile(targetURI.toString(),
pinotFS.lastModified(targetURI));
- if (System.currentTimeMillis() >= deletionTimeMs) {
- if (!deleteWithTimeout(pinotFS, targetURI, true,
OBJECT_DELETION_TIMEOUT, TimeUnit.SECONDS)) {
- LOGGER.warn("Failed to remove resource: {}", targetURI);
- } else {
- numFilesDeleted++;
- if (numFilesDeleted >=
NUM_AGED_SEGMENTS_TO_DELETE_PER_ATTEMPT) {
- LOGGER.info("Reached threshold of max aged segments to
delete per attempt. Deleted: {} files "
- + "from directory: {}", numFilesDeleted,
tableNameDir);
- break;
- }
- }
+ if (!pinotFS.delete(tableNameURI, false)) {
+ LOGGER.info("Could not delete deleted segments directory: {}
for table: {}", tableNameURI, tableName);
+ } else {
+ LOGGER.info("Successfully deleted deleted segments
directory: {} for table: {}", tableNameURI,
+ tableName);
}
} catch (Exception e) {
- LOGGER.warn("Failed to delete uri: {} for table: {}",
targetURI, tableName, e);
+ LOGGER.error("Exception occurred while deleting deleted
segments directory: {} for table: {}",
+ tableNameURI, tableName, e);
}
+ continue;
}
+ int numFilesDeleted = 0;
+ URI targetURI = null;
+ List<URI> targetURIs = new ArrayList<>();
+ for (FileMetadata targetFile : targetFiles) {
+ // Some file system implementations also return the current
table directory
+ // we do not want to delete the table directory
+ if (targetFile.isDirectory()) {
+ continue;
+ }
- if (numFilesDeleted == targetFiles.length) {
- // Delete directory if it's empty
- if (!deleteWithTimeout(pinotFS, tableNameURI, false,
OBJECT_DELETION_TIMEOUT, TimeUnit.SECONDS)) {
- LOGGER.warn("Failed to remove the directory: {}",
tableNameDir);
+ targetURI =
+ URIUtils.getUri(tableNameURI.toString(),
+
URIUtils.encode(URIUtils.getLastPart(targetFile.getFilePath())));
+ long deletionTimeMs =
getDeletionTimeMsFromFile(targetURI.toString(),
targetFile.getLastModifiedTime());
+ if (System.currentTimeMillis() >= deletionTimeMs) {
+ numFilesDeleted++;
+ targetURIs.add(targetURI);
+ if (numFilesDeleted ==
NUM_AGED_SEGMENTS_TO_DELETE_PER_ATTEMPT) {
+ LOGGER.info(
+ "Reached threshold of max aged segments to delete per
attempt. Deleting: {} segment files "
+ + "from directory: {}", numFilesDeleted,
tableNameDir);
Review Comment:
numFilesDeleted now counts selections, not confirmed deletions (previous
code incremented only on successful delete), so failures in deleteBatch (or
partial deletions) reduce actual progress and may repeatedly skip remaining
candidates. Either: (1) rename variable to reflect selection intent and adjust
log message, or (2) after deleteBatch completion verify success and, if failed,
re-queue or retry remaining segments.
##########
pinot-controller/src/test/java/org/apache/pinot/controller/helix/core/util/SegmentDeletionManagerTest.java:
##########
@@ -619,14 +643,32 @@ public boolean delete(URI uri, boolean forceDelete)
_tableDirs.remove(uri.getPath() + "/");
return true;
}
- // remote the segment
+ // remove the segment
String tableName = uri.getPath().substring(0,
uri.getPath().lastIndexOf("/") + 1);
return _tableDirs.get(tableName).remove(uri.getPath());
}
+ @Override
+ public boolean deleteBatch(List<URI> segmentUris, boolean forceDelete)
+ throws IOException {
+ // the expectation here is that the batch delete call is only limited to
segments.
+ URI segmentURI = segmentUris.get(0);
+ String tableName = segmentURI.getPath().substring(0,
segmentURI.getPath().lastIndexOf("/") + 1);
+ if (_tableDirs.containsKey(tableName)) {
+ // remove all the segments from the table directory
+ segmentUris.forEach(segmentUri ->
_tableDirs.get(tableName).remove(segmentUri.getPath()));
+ }
+ // the table does not exist and we return a true;
+ return true;
+ }
Review Comment:
This mock always returns true (even when the table directory does not exist
or no segments were removed), which can hide error handling paths and prevent
coverage of failure scenarios in the async deletion flow. Consider returning
false when no segments are actually removed or when the table directory is
missing to better exercise failure handling logic.
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/SegmentDeletionManager.java:
##########
@@ -425,37 +362,71 @@ public void
removeAgedDeletedSegments(LeadControllerManager leadControllerManage
if (leadControllerManager.isLeaderForTable(tableName)) {
URI tableNameURI = URIUtils.getUri(deletedDirURI.toString(),
URIUtils.encode(tableName));
// Get files that are aged
- final String[] targetFiles = pinotFS.listFiles(tableNameURI,
false);
- int numFilesDeleted = 0;
- URI targetURI = null;
- for (String targetFile : targetFiles) {
+ final List<FileMetadata> targetFiles =
pinotFS.listFilesWithMetadata(tableNameURI, false);
+
+ if (targetFiles.isEmpty()) {
+ LOGGER.info("Deleting empty deleted segments directory: {} for
table: {}", tableNameURI, tableName);
try {
- targetURI =
- URIUtils.getUri(tableNameURI.toString(),
URIUtils.encode(URIUtils.getLastPart(targetFile)));
- long deletionTimeMs =
getDeletionTimeMsFromFile(targetURI.toString(),
pinotFS.lastModified(targetURI));
- if (System.currentTimeMillis() >= deletionTimeMs) {
- if (!deleteWithTimeout(pinotFS, targetURI, true,
OBJECT_DELETION_TIMEOUT, TimeUnit.SECONDS)) {
- LOGGER.warn("Failed to remove resource: {}", targetURI);
- } else {
- numFilesDeleted++;
- if (numFilesDeleted >=
NUM_AGED_SEGMENTS_TO_DELETE_PER_ATTEMPT) {
- LOGGER.info("Reached threshold of max aged segments to
delete per attempt. Deleted: {} files "
- + "from directory: {}", numFilesDeleted,
tableNameDir);
- break;
- }
- }
+ if (!pinotFS.delete(tableNameURI, false)) {
+ LOGGER.info("Could not delete deleted segments directory: {}
for table: {}", tableNameURI, tableName);
+ } else {
+ LOGGER.info("Successfully deleted deleted segments
directory: {} for table: {}", tableNameURI,
+ tableName);
}
} catch (Exception e) {
- LOGGER.warn("Failed to delete uri: {} for table: {}",
targetURI, tableName, e);
+ LOGGER.error("Exception occurred while deleting deleted
segments directory: {} for table: {}",
+ tableNameURI, tableName, e);
}
+ continue;
}
+ int numFilesDeleted = 0;
+ URI targetURI = null;
+ List<URI> targetURIs = new ArrayList<>();
+ for (FileMetadata targetFile : targetFiles) {
+ // Some file system implementations also return the current
table directory
+ // we do not want to delete the table directory
+ if (targetFile.isDirectory()) {
+ continue;
+ }
- if (numFilesDeleted == targetFiles.length) {
- // Delete directory if it's empty
- if (!deleteWithTimeout(pinotFS, tableNameURI, false,
OBJECT_DELETION_TIMEOUT, TimeUnit.SECONDS)) {
- LOGGER.warn("Failed to remove the directory: {}",
tableNameDir);
+ targetURI =
+ URIUtils.getUri(tableNameURI.toString(),
+
URIUtils.encode(URIUtils.getLastPart(targetFile.getFilePath())));
+ long deletionTimeMs =
getDeletionTimeMsFromFile(targetURI.toString(),
targetFile.getLastModifiedTime());
Review Comment:
The logic now fully depends on targetFile.getLastModifiedTime(), but the
FileMetadata provided by some PinotFS implementations (and the test stub) may
not populate this field (defaulting to 0) leading to premature deletions
(treating segments as immediately aged). Add a fallback to
pinotFS.lastModified(targetURI) when getLastModifiedTime() <= 0 to preserve
previous behavior for filesystems that do not supply metadata timestamps.
```suggestion
long lastModifiedTime = targetFile.getLastModifiedTime();
if (lastModifiedTime <= 0) {
try {
lastModifiedTime = pinotFS.lastModified(targetURI);
} catch (Exception e) {
LOGGER.warn("Could not get last modified time for file:
{}. Skipping deletion.", targetURI, e);
continue;
}
}
long deletionTimeMs =
getDeletionTimeMsFromFile(targetURI.toString(), lastModifiedTime);
```
##########
pinot-controller/src/test/java/org/apache/pinot/controller/helix/core/util/SegmentDeletionManagerTest.java:
##########
@@ -295,18 +297,26 @@ public void testRemoveDeletedSegments()
deletionManager.removeAgedDeletedSegments(leadControllerManager);
// Check that only 1 day retention file is remaining
- Assert.assertEquals(dummyDir1.list().length, 1);
+ TestUtils.waitForCondition((aVoid) -> dummyDir1.list().length == 1, 1000,
100000,
+ "Unable to delete desired segments from dummyDir1");
+
+ // Check that empty directory has not been removed in the first run
+ TestUtils.waitForCondition((aVoid) -> dummyDir2.exists(), 1000, 100000,
+ "dummyDir2 does not exist");
Review Comment:
[nitpick] Waiting for a condition that is already true (existence) does not
guarantee the directory was preserved for the intended duration and can mask a
race where it is removed shortly after; a direct assertTrue(dummyDir2.exists())
after prior waits would be clearer, or assert it remains until the next manager
invocation.
```suggestion
Assert.assertTrue(dummyDir2.exists(), "dummyDir2 does not exist");
```
##########
pinot-controller/src/test/java/org/apache/pinot/controller/helix/core/util/SegmentDeletionManagerTest.java:
##########
@@ -640,9 +682,23 @@ public String[] listFiles(URI fileUri, boolean recursive)
return _tableDirs.get(tableName).toArray(new String[0]);
}
+ @Override
+ public List<FileMetadata> listFilesWithMetadata(URI fileUri, boolean
recursive) {
+ if (_tableDirs.containsKey(fileUri.getPath() + "/")) {
+ return _tableDirs.get(fileUri.getPath() + "/")
+ .stream()
+ .map(segmentFilePath -> new
FileMetadata.Builder().setFilePath(segmentFilePath).build())
+ .collect(Collectors.toList());
Review Comment:
Test stub builds FileMetadata without lastModifiedTime (and directory flag),
causing all returned entries to appear with an unset timestamp (likely 0) which
can incorrectly mark them as immediately deletable; this weakens coverage of
retention logic. Populate lastModifiedTime (and isDirectory where applicable)
to realistically exercise age-based deletion decisions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]