swaminathanmanish commented on code in PR #15142: URL: https://github.com/apache/pinot/pull/15142#discussion_r1973944323
########## pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/retention/RetentionManager.java: ########## @@ -136,11 +164,86 @@ private void manageRetentionForOfflineTable(String offlineTableName, RetentionSt } } + /** + * Identifies segments in deepstore that are ready for deletion based on the retention strategy. + * + * This method finds segments that are beyond the retention period and are ready to be purged. + * It only considers segments that do not have entries in ZooKeeper metadata. + * The lastModified time of the file in deepstore is used to determine whether the segment + * should be retained or purged. + * + * @param tableNameWithType Name of the offline table + * @param retentionStrategy Strategy to determine if a segment should be purged + * @param segmentsToExclude List of segment names that should be excluded from deletion + * @return List of segment names that should be deleted from deepstore + * @throws IOException If there's an error accessing the filesystem + */ + private List<String> getSegmentsToDeleteFromDeepstore(String tableNameWithType, RetentionStrategy retentionStrategy, + List<String> segmentsToExclude) + throws IOException { + + List<String> segmentsToDelete = new ArrayList<>(); + String rawTableName = TableNameBuilder.extractRawTableName(tableNameWithType); + URI tableDataUri = URIUtils.getUri(_pinotHelixResourceManager.getDataDir(), rawTableName); + PinotFS pinotFS = PinotFSFactory.create(tableDataUri.getScheme()); + + List<FileMetadata> deepstoreFiles = pinotFS.listFilesWithMetadata(tableDataUri, false); + + for (FileMetadata fileMetadata : deepstoreFiles) { + if (fileMetadata.isDirectory()) { + continue; + } + + String segmentName = extractSegmentName(fileMetadata.getFilePath()); + if (Strings.isEmpty(segmentName) || segmentsToExclude.contains(segmentName)) { + continue; + } + + // determine whether the segment should be perged or not based on the last modified time of the file + long lastModifiedTime = fileMetadata.getLastModifiedTime(); + + if (retentionStrategy.isPurgeable(segmentName, tableNameWithType, lastModifiedTime)) { Review Comment: Is there a default retention ? If its 0, we will end up removing the segment from Deep store, when minion might have just pushed it. ########## pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/retention/strategy/RetentionStrategy.java: ########## @@ -34,4 +34,14 @@ public interface RetentionStrategy { * @return Whether the segment should be purged */ boolean isPurgeable(String tableNameWithType, SegmentZKMetadata segmentZKMetadata); -} + + /** + * Returns whether the segment should be purged based on end time. + * + * @param segmentName The name of the segment to check + * @param tableNameWithType Table name with type + * @param endTimeMs The end time of the segment in milliseconds + * @return Whether the segment should be purged + */ + boolean isPurgeable(String segmentName, String tableNameWithType, long endTimeMs); Review Comment: Suggest having this method local to SegmentDeletionManager since endTime is supplied from outside. It'll be confusing as to what needs to be passed in (file modification time/segment time etc..). The caller can get the retention time and do its own checks ? ########## pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/retention/RetentionManager.java: ########## @@ -124,8 +138,19 @@ private void manageRetentionForTable(TableConfig tableConfig) { } private void manageRetentionForOfflineTable(String offlineTableName, RetentionStrategy retentionStrategy) { + List<SegmentZKMetadata> segmentZKMetadataList = _pinotHelixResourceManager.getSegmentsZKMetadata(offlineTableName); Review Comment: Can we not just rely on the ideal state as source of truth because that would avoid reading all of segment Zk ? We are going to check retention window as well prior to deletion (if the concern is deleting a new segment that just got added, which can happen even when the segment is not in Zk metadata but just in deep store). ########## pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/retention/RetentionManager.java: ########## @@ -124,8 +136,24 @@ private void manageRetentionForTable(TableConfig tableConfig) { } private void manageRetentionForOfflineTable(String offlineTableName, RetentionStrategy retentionStrategy) { Review Comment: We need this for manageRetentionForRealtimeTable as well right? There can be tasks that work on realtime tables as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org