shwin commented on code in PR #10034:
URL: https://github.com/apache/pinot/pull/10034#discussion_r1080811688


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/SegmentPushUtils.java:
##########
@@ -284,7 +284,22 @@ public static void 
sendSegmentUriAndMetadata(SegmentGenerationJobSpec spec, Pino
       String segmentName = fileName.endsWith(Constants.TAR_GZ_FILE_EXT)
           ? fileName.substring(0, fileName.length() - 
Constants.TAR_GZ_FILE_EXT.length()) : fileName;
       SegmentNameUtils.validatePartialOrFullSegmentName(segmentName);
-      File segmentMetadataFile = generateSegmentMetadataFile(fileSystem, 
URI.create(tarFilePath));
+      File segmentMetadataFile;
+      // Check if there is a segment metadata tar gz file named 
`segmentName.metadata.tar.gz`, already in the remote
+      // directory. This is to avoid generating a new segment metadata tar gz 
file every time we push a segment,
+      // which requires downloading the entire segment tar gz file.
+      URI metadataTarGzFilePath = URI.create(
+          new File(tarFilePath).getParentFile() + File.separator + segmentName 
+ Constants.METADATA_TAR_GZ_FILE_EXT);
+      if (spec.getPushJobSpec().isPreferMetadataTarGz() && 
fileSystem.exists(metadataTarGzFilePath)) {

Review Comment:
   Mentioned this in slack 
(https://apache-pinot.slack.com/archives/C011C9JHN7R/p1674101587879929?thread_ts=1674101384.733239&cid=C011C9JHN7R)
 but I think this line doesn't work with S3 paths:
   
   `File(tarFilePath).getParentFile()` will end up returning `s3:/bucket/path` 
NOT `s3://bucket/path` which ends up messing the subsequent existence code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to