swaminathanmanish commented on code in PR #15142:
URL: https://github.com/apache/pinot/pull/15142#discussion_r1973913895


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/retention/RetentionManager.java:
##########
@@ -136,11 +164,86 @@ private void manageRetentionForOfflineTable(String offlineTableName, RetentionSt
     }
   }
 
+  /**
+   * Identifies segments in deepstore that are ready for deletion based on the retention strategy.
+   *
+   * This method finds segments that are beyond the retention period and are ready to be purged.
+   * It only considers segments that do not have entries in ZooKeeper metadata.
+   * The lastModified time of the file in deepstore is used to determine whether the segment
+   * should be retained or purged.
+   *
+   * @param tableNameWithType   Name of the offline table
+   * @param retentionStrategy  Strategy to determine if a segment should be purged
+   * @param segmentsToExclude  List of segment names that should be excluded from deletion
+   * @return List of segment names that should be deleted from deepstore
+   * @throws IOException If there's an error accessing the filesystem
+   */
+  private List<String> getSegmentsToDeleteFromDeepstore(String tableNameWithType, RetentionStrategy retentionStrategy,
+      List<String> segmentsToExclude)
+      throws IOException {
+
+    List<String> segmentsToDelete = new ArrayList<>();
+    String rawTableName = TableNameBuilder.extractRawTableName(tableNameWithType);
+    URI tableDataUri = URIUtils.getUri(_pinotHelixResourceManager.getDataDir(), rawTableName);
+    PinotFS pinotFS = PinotFSFactory.create(tableDataUri.getScheme());
+
+    List<FileMetadata> deepstoreFiles = pinotFS.listFilesWithMetadata(tableDataUri, false);
+
+    for (FileMetadata fileMetadata : deepstoreFiles) {
+      if (fileMetadata.isDirectory()) {
+        continue;
+      }
+
+      String segmentName = extractSegmentName(fileMetadata.getFilePath());
+      if (Strings.isEmpty(segmentName) || segmentsToExclude.contains(segmentName)) {
+        continue;
+      }
+
+      // determine whether the segment should be purged or not based on the last modified time of the file
+      long lastModifiedTime = fileMetadata.getLastModifiedTime();
+
+      if (retentionStrategy.isPurgeable(segmentName, tableNameWithType, lastModifiedTime)) {
+        segmentsToDelete.add(segmentName);
+      }
+    }
+
+    return segmentsToDelete;
+  }
+
+  @Nullable
+  private String extractSegmentName(@Nullable String filePath) {
+    if (Strings.isEmpty(filePath)) {
+      return null;
+    }
+    String segmentName = filePath.substring(filePath.lastIndexOf("/") + 1);
+    if (segmentName.endsWith(TarCompressionUtils.TAR_GZ_FILE_EXTENSION)) {
+      segmentName = segmentName.substring(0, segmentName.length() - TarCompressionUtils.TAR_GZ_FILE_EXTENSION.length());
+    }
+    return segmentName;
+  }
+
   private void manageRetentionForRealtimeTable(String realtimeTableName, RetentionStrategy retentionStrategy) {
+    List<SegmentZKMetadata> segmentZKMetadataList = _pinotHelixResourceManager.getSegmentsZKMetadata(realtimeTableName);

Review Comment:
   Can we use partitionSet from the ideal state to get the segment names instead? We only need segment names here, not the ZK metadata.
   
   ```
   /**
      * Returns the segments for the given table from the ideal state.
      *
      * @param tableNameWithType Table name with type suffix
      * @param shouldExcludeReplacedSegments whether to return the list of segments that doesn't contain replaced segments.
      * @param startTimestamp  start timestamp in milliseconds (inclusive)
      * @param endTimestamp  end timestamp in milliseconds (exclusive)
      * @param excludeOverlapping  whether to exclude the segments overlapping with the timestamps
      * @return List of segment names
      */
     public List<String> getSegmentsFor(String tableNameWithType, boolean shouldExcludeReplacedSegments,
         long startTimestamp, long endTimestamp, boolean excludeOverlapping)
   ```
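A minimal sketch of the suggestion above, under stated assumptions: the ideal state is modeled as a plain `Map<String, Map<String, String>>` (segment name → instance → state), a simplified stand-in for the Helix `IdealState` record, whose `getPartitionSet()` returns those segment names. The class and method names are hypothetical. Only the key set is read, so no per-segment ZK metadata objects need to be fetched just to build the exclusion list.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class IdealStateSegmentNamesSketch {

  /**
   * Returns the segment names present in a simplified ideal state.
   * Hypothetical stand-in: the map mirrors segment -> (instance -> state);
   * with Helix, idealState.getPartitionSet() would provide these keys directly.
   */
  static Set<String> segmentNamesFromIdealState(Map<String, Map<String, String>> idealStateAssignment) {
    // Copy the keys into a HashSet so later contains() checks are O(1) on average.
    return new HashSet<>(idealStateAssignment.keySet());
  }

  public static void main(String[] args) {
    // Illustrative segment names and instance states only.
    Map<String, Map<String, String>> assignment = new TreeMap<>();
    assignment.put("myTable__0__12__20250101T0000Z", Map.of("Server_1", "ONLINE"));
    assignment.put("myTable__1__7__20250101T0000Z", Map.of("Server_2", "CONSUMING"));

    Set<String> segmentsToExclude = segmentNamesFromIdealState(assignment);
    System.out.println(segmentsToExclude.contains("myTable__0__12__20250101T0000Z")); // true
    System.out.println(segmentsToExclude.contains("myTable__0__11__20241231T0000Z")); // false
  }
}
```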

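The `extractSegmentName` helper in the diff can be checked in isolation. The sketch below reimplements the same logic as a standalone method so its edge cases are easy to exercise; the `.tar.gz` literal is assumed to match `TarCompressionUtils.TAR_GZ_FILE_EXTENSION`, and the class name is hypothetical. A path with no `/` yields the whole string, and a path ending in `/` yields an empty name, which the caller's `Strings.isEmpty` check then skips.

```java
public class SegmentNameExtractionSketch {

  // Assumed value of TarCompressionUtils.TAR_GZ_FILE_EXTENSION.
  static final String TAR_GZ_FILE_EXTENSION = ".tar.gz";

  /** Same logic as the diff's extractSegmentName: last path component, minus a trailing ".tar.gz". */
  static String extractSegmentName(String filePath) {
    if (filePath == null || filePath.isEmpty()) {
      return null;
    }
    // lastIndexOf returns -1 when there is no '/', so +1 keeps the whole string.
    String segmentName = filePath.substring(filePath.lastIndexOf('/') + 1);
    if (segmentName.endsWith(TAR_GZ_FILE_EXTENSION)) {
      segmentName = segmentName.substring(0, segmentName.length() - TAR_GZ_FILE_EXTENSION.length());
    }
    return segmentName;
  }

  public static void main(String[] args) {
    System.out.println(extractSegmentName("s3://bucket/data/myTable/seg_1.tar.gz")); // seg_1
    System.out.println(extractSegmentName("seg_2"));                                 // seg_2
    System.out.println(extractSegmentName("s3://bucket/data/myTable/").isEmpty());   // true
    System.out.println(extractSegmentName(null));                                    // null
  }
}
```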

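The selection loop in `getSegmentsToDeleteFromDeepstore` reduces to a filter over file entries. This is a minimal sketch under stated assumptions: `FileEntry` is a hypothetical holder standing in for `FileMetadata` (with the segment name already extracted), and `olderThan` is a hypothetical fixed retention window standing in for `RetentionStrategy.isPurgeable`, whose actual rule may differ. Directories, empty names, and excluded segments are skipped; the remaining entries whose last-modified time is more than `retentionMs` before `nowMs` are returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.LongPredicate;

public class DeepstorePurgeSketch {

  /** Hypothetical stand-in for FileMetadata, with the segment name already extracted. */
  static final class FileEntry {
    final String segmentName;
    final boolean isDirectory;
    final long lastModifiedMs;

    FileEntry(String segmentName, boolean isDirectory, long lastModifiedMs) {
      this.segmentName = segmentName;
      this.isDirectory = isDirectory;
      this.lastModifiedMs = lastModifiedMs;
    }
  }

  /** Returns names of non-directory, non-excluded entries that the predicate marks purgeable. */
  static List<String> segmentsToDelete(List<FileEntry> files, Set<String> segmentsToExclude,
      LongPredicate isPurgeable) {
    List<String> segmentsToDelete = new ArrayList<>();
    for (FileEntry file : files) {
      if (file.isDirectory || file.segmentName == null || file.segmentName.isEmpty()
          || segmentsToExclude.contains(file.segmentName)) {
        continue;
      }
      // Decide purely on the file's last-modified time, as the diff does.
      if (isPurgeable.test(file.lastModifiedMs)) {
        segmentsToDelete.add(file.segmentName);
      }
    }
    return segmentsToDelete;
  }

  /** Hypothetical fixed-window rule: purgeable if last modified more than retentionMs before nowMs. */
  static LongPredicate olderThan(long nowMs, long retentionMs) {
    return lastModifiedMs -> nowMs - lastModifiedMs > retentionMs;
  }

  public static void main(String[] args) {
    long day = 24L * 60 * 60 * 1000;
    long now = 100 * day;
    List<FileEntry> files = List.of(
        new FileEntry("seg_old", false, now - 10 * day),
        new FileEntry("seg_new", false, now - day),
        new FileEntry("seg_in_ideal_state", false, now - 10 * day),
        new FileEntry("subdir", true, now - 10 * day));
    // With a 7-day window and seg_in_ideal_state excluded, only seg_old is selected.
    System.out.println(segmentsToDelete(files, Set.of("seg_in_ideal_state"), olderThan(now, 7 * day)));
    // prints [seg_old]
  }
}
```

Taking the exclusions as a `Set` keeps the per-file check constant-time on average; the diff's `List<String> segmentsToExclude` gives the same result but scans the list once per file.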

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

