shwin commented on code in PR #10034: URL: https://github.com/apache/pinot/pull/10034#discussion_r1080811688
########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/SegmentPushUtils.java: ########## @@ -284,7 +284,22 @@ public static void sendSegmentUriAndMetadata(SegmentGenerationJobSpec spec, Pino String segmentName = fileName.endsWith(Constants.TAR_GZ_FILE_EXT) ? fileName.substring(0, fileName.length() - Constants.TAR_GZ_FILE_EXT.length()) : fileName; SegmentNameUtils.validatePartialOrFullSegmentName(segmentName); - File segmentMetadataFile = generateSegmentMetadataFile(fileSystem, URI.create(tarFilePath)); + File segmentMetadataFile; + // Check if there is a segment metadata tar gz file named `segmentName.metadata.tar.gz`, already in the remote + // directory. This is to avoid generating a new segment metadata tar gz file every time we push a segment, + // which requires downloading the entire segment tar gz file. + URI metadataTarGzFilePath = URI.create( + new File(tarFilePath).getParentFile() + File.separator + segmentName + Constants.METADATA_TAR_GZ_FILE_EXT); + if (spec.getPushJobSpec().isPreferMetadataTarGz() && fileSystem.exists(metadataTarGzFilePath)) { Review Comment: Mentioned this in slack (https://apache-pinot.slack.com/archives/C011C9JHN7R/p1674101587879929?thread_ts=1674101384.733239&cid=C011C9JHN7R) but I think this line doesn't work with S3 paths: `File(tarFilePath).getParentFile()` will end up returning `s3:/bucket/path` NOT `s3://bucket/path` which ends up messing the subsequent existence code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org