mcvsubbu commented on a change in pull request #5764:
URL: https://github.com/apache/incubator-pinot/pull/5764#discussion_r462422577



##########
File path: 
pinot-server/src/main/java/org/apache/pinot/server/starter/helix/SegmentFetcherAndLoader.java
##########
@@ -184,39 +186,48 @@ private boolean isNewSegmentMetadata(String 
tableNameWithType, OfflineSegmentZKM
 
   private String downloadSegmentToLocal(String uri, PinotCrypter crypter, 
String tableName, String segmentName)
       throws Exception {
-    File tempDir = new File(new 
File(_instanceDataManager.getSegmentFileDirectory(), tableName),
-        "tmp-" + segmentName + "-" + UUID.randomUUID());
-    FileUtils.forceMkdir(tempDir);
-    File tempDownloadFile = new File(tempDir, segmentName + ENCODED_SUFFIX);
-    File tempTarFile = new File(tempDir, segmentName + TAR_GZ_SUFFIX);
-    File tempSegmentDir = new File(tempDir, segmentName);
-    try {
-      SegmentFetcherFactory.fetchSegmentToLocal(uri, tempDownloadFile);
-      if (crypter != null) {
-        crypter.decrypt(tempDownloadFile, tempTarFile);
-      } else {
-        tempTarFile = tempDownloadFile;
-      }
-
-      LOGGER
-          .info("Downloaded tarred segment: {} for table: {} from: {} to: {}, 
file length: {}", segmentName, tableName,
-              uri, tempTarFile, tempTarFile.length());
+    // Even if the tar file has been downloaded successfully, the file itself 
could be corrupted during the transmission.

Review comment:
       Yes, that works. And if there are cases where the segment downloaded by 
server is truncated, we need to fix that instead of adding retries. Obvious 
things to check if it is something more than manual operations:
   *  Are we sending the reload message before moving the segment to its final 
location? It appears no from a cursory look.
   * Is there a possibility that multiple controllers are fielding the same 
segment ? (You checked the logs to see that this was not the case, but) We are 
not handling this case. The source file is likely to get corrupted since each 
controller will use `pinotFS.copyToDestination()` (not exact API) to set the 
segment in the destination. We should enhance PinotFS to support copying to a 
temp destination and then moving in place, or use two apis to move/copy the 
file. Note that this can happen even if the sender of the segment decides to 
give up on a (slow) connection and retry (without closing the old connection, 
or closing it too late), and get a different controller to upload the segment.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to