xiangfu0 commented on a change in pull request #7193:
URL: https://github.com/apache/pinot/pull/7193#discussion_r675884881



##########
File path: pinot-tools/src/main/java/org/apache/pinot/tools/admin/command/SegmentProcessorFrameworkCommand.java
##########
@@ -72,25 +80,42 @@ public String description() {
   @Override
   public boolean execute()
       throws Exception {
+    PluginManager.get().init();
 
     SegmentProcessorFrameworkSpec segmentProcessorFrameworkSpec =
         JsonUtils.fileToObject(new File(_segmentProcessorFrameworkSpec), SegmentProcessorFrameworkSpec.class);
 
     File inputSegmentsDir = new File(segmentProcessorFrameworkSpec.getInputSegmentsDir());
     File outputSegmentsDir = new File(segmentProcessorFrameworkSpec.getOutputSegmentsDir());
-    if (!outputSegmentsDir.exists()) {
-      if (!outputSegmentsDir.mkdirs()) {
-        throw new RuntimeException(
-            "Did not find output directory, and could not create it either: " + segmentProcessorFrameworkSpec
-                .getOutputSegmentsDir());
+    File workingDir = new File(outputSegmentsDir, "tmp-" + UUID.randomUUID());
+    File untarredSegmentsDir = new File(workingDir, "untarred_segments");
+    FileUtils.forceMkdir(untarredSegmentsDir);
+    File[] segmentDirs = inputSegmentsDir.listFiles();
+    Preconditions
+        .checkState(segmentDirs != null && segmentDirs.length > 0, "Failed to find files under input segments dir: %s",
+            inputSegmentsDir.getAbsolutePath());
+    List<RecordReader> recordReaders = new ArrayList<>(segmentDirs.length);
+    for (File segmentDir : segmentDirs) {
+      String fileName = segmentDir.getName();
+
+      // Untar the segments if needed
+      if (!segmentDir.isDirectory()) {
+        if (fileName.endsWith(".tar.gz") || fileName.endsWith(".tgz")) {
+          segmentDir = TarGzCompressionUtils.untar(segmentDir, untarredSegmentsDir).get(0);
+        } else {
+          throw new IllegalStateException("Unsupported segment format: " + segmentDir.getAbsolutePath());

Review comment:
       Not relevant to this PR.
   I feel we may want a util function that checks whether a file is in tar.gz format, since the extension is not always present.
   E.g. the controller directory stores segment tar.gz files without an extension.
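
   A minimal sketch of what such a util could look like, assuming plain JDK classes only. `TarGzDetector`, `isGzip` and `isTarGz` are hypothetical names and not existing Pinot APIs. It checks the gzip magic bytes (0x1f 0x8b) and then looks for the "ustar" marker at offset 257 of the decompressed stream, so it would not recognize old pre-POSIX tar archives that lack that marker.

```java
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipException;

/** Hypothetical helper (not part of Pinot): detects tar.gz files by content, not by file name. */
public final class TarGzDetector {
  // In a POSIX (ustar) tar header the magic string "ustar" sits at byte offset 257.
  private static final int TAR_MAGIC_OFFSET = 257;
  private static final byte[] TAR_MAGIC = "ustar".getBytes(StandardCharsets.US_ASCII);

  private TarGzDetector() {
  }

  /** Returns true if the file starts with the gzip magic bytes 0x1f 0x8b. */
  public static boolean isGzip(File file) throws IOException {
    if (!file.isFile()) {
      return false;
    }
    try (InputStream in = Files.newInputStream(file.toPath())) {
      return in.read() == 0x1f && in.read() == 0x8b;
    }
  }

  /** Returns true if the file is gzip-compressed and its decompressed content starts with a ustar header. */
  public static boolean isTarGz(File file) throws IOException {
    if (!isGzip(file)) {
      return false;
    }
    byte[] header = new byte[TAR_MAGIC_OFFSET + TAR_MAGIC.length];
    try (InputStream in = new GZIPInputStream(new BufferedInputStream(Files.newInputStream(file.toPath())))) {
      // Read until the header buffer is full; a shorter stream cannot hold a tar header.
      int total = 0;
      while (total < header.length) {
        int n = in.read(header, total, header.length - total);
        if (n < 0) {
          return false;
        }
        total += n;
      }
    } catch (ZipException | EOFException e) {
      // Gzip magic was present but the stream is corrupt or truncated.
      return false;
    }
    for (int i = 0; i < TAR_MAGIC.length; i++) {
      if (header[TAR_MAGIC_OFFSET + i] != TAR_MAGIC[i]) {
        return false;
      }
    }
    return true;
  }
}
```

   With something like this, the extension check in the loop above could fall back to `TarGzDetector.isTarGz(segmentDir)` for files without a `.tar.gz` or `.tgz` suffix.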




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


