walterddr commented on code in PR #9245:
URL: https://github.com/apache/pinot/pull/9245#discussion_r949679994


##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-standalone/src/main/java/org/apache/pinot/plugin/ingestion/batch/standalone/SegmentGenerationJobRunner.java:
##########
@@ -172,7 +172,7 @@ public void init(SegmentGenerationJobSpec spec) {
   public void run()
       throws Exception {
     // Get list of files to process.
-    String[] files = _inputDirFS.listFiles(_inputDirURI, true);
+    String[] files = _inputDirFS.listFilesWithPattern(_inputDirURI, 
_spec.getIncludeFileNamePattern());

Review Comment:
   let's also refactor 
   
https://github.com/apache/pinot/pull/9245/files#diff-7bec8ff885254fee67d4ad50c4f06a7894976700df3fd2ef42af6532fedfea26L181-L203
   into this function. 
   
   the name `listFilesWithPattern` suggest it should already finish the glob 
searching



##########
pinot-spi/src/main/java/org/apache/pinot/spi/filesystem/PinotFSFactory.java:
##########
@@ -37,6 +37,7 @@ private PinotFSFactory() {
   }
 
   public static final String LOCAL_PINOT_FS_SCHEME = "file";
+  public static final String RECURSIVE_BLOB_PREFIX = "glob:**";

Review Comment:
   recursive hint from glob pattern should do recursive search whenever there's 
a `**` pattern included, b/c `**` can occur in the middle such as 
`subFolder/foo/**/bar`
   



##########
pinot-spi/src/main/java/org/apache/pinot/spi/filesystem/PinotFS.java:
##########
@@ -144,6 +143,22 @@ boolean exists(URI fileUri)
   long length(URI fileUri)
       throws IOException;
 
+  /**
+   * Lists all the files and directories at the location provided.
+   * Lists recursively if {@code inputFilePattern} starts with "glob:**".
+   * Throws IOException if this abstract pathname is not valid, or if an I/O 
error occurs.
+   * @param fileUri location of file
+   * @return an array of strings that contains file paths
+   * @throws IOException on IO failure. See specific implementation
+   */
+  default String[] listFilesWithPattern(URI fileUri, String inputFilePattern)

Review Comment:
   let's not add the default implementation in `PinotFS` since this is a very 
generic interface.
   
   I would suggest let's add this as a static method in 
`SegmentGenerationJobSpec` 
   ```
     public static String[] listMatchedFiles(PinotFS pinotFs, URI fileUri, 
String includePattern, String excludePattern) {
       // ...
     }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to