chrajeshbabu commented on code in PR #14474:
URL: https://github.com/apache/pinot/pull/14474#discussion_r1851151881


##########
pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java:
##########
@@ -501,12 +504,14 @@ public long length(URI fileUri)
   public String[] listFiles(URI fileUri, boolean recursive)
       throws IOException {
     ImmutableList.Builder<String> builder = ImmutableList.builder();
+    String scheme = fileUri.getScheme();
+    Preconditions.checkArgument(scheme.equals(S3_SCHEME) || 
scheme.equals(S3A_SCHEME));
     visitFiles(fileUri, recursive, s3Object -> {
       if (!s3Object.key().equals(fileUri.getPath()) && 
!s3Object.key().endsWith(DELIMITER)) {
-        builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + 
getNormalizedFileKey(s3Object));
+        builder.add(scheme + SCHEME_SEPARATOR + fileUri.getHost() + DELIMITER 
+ getNormalizedFileKey(s3Object));

Review Comment:
   Yes @mayankshriv. 
   
   During the server/controller initialisation what ever the scheme configured 
to factory class is getting mapped to file system. This allowing to configure 
s3a scheme.
   
   ```
     public static void init(PinotConfiguration fsFactoryConfig) {
       // Get schemes and their respective classes
       PinotConfiguration schemesConfiguration = fsFactoryConfig.subset(CLASS);
       List<String> schemes = schemesConfiguration.getKeys();
       if (!schemes.isEmpty()) {
         LOGGER.info("Did not find any fs classes in the configuration");
       }
   
       for (String scheme : schemes) {
         String fsClassName = schemesConfiguration.getProperty(scheme);
         PinotConfiguration fsConfiguration = fsFactoryConfig.subset(scheme);
         LOGGER.info("Got scheme {}, initializing class {}", scheme, 
fsClassName);
         register(scheme, fsClassName, fsConfiguration);
       }
     } 
   ```
   
   Since the scheme is hard coded to s3 in S3PinotFS it's not allowing the list 
the objects in the s3a bucket. 
   ```
   
         if (!s3Object.key().equals(fileUri.getPath()) && 
!s3Object.key().endsWith(DELIMITER)) {
           builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + 
getNormalizedFileKey(s3Object));
         }
   
   
   ```
   With the patch able to create/list the objects in the s3a bucket. Tested the 
patch in the real cluster as well just by configuring the protocols supported 
as s3a where ever s3 needs to be configured. Here is sample configurations done 
at controller, server and ingestion job level to make it work,
   
   ```
   controller.data.dir=s3a:<buckeneame>/data/pinot/controller/segment
   controller.local.temp.dir=<tmp-path>/pinot/
   controller.enable.split.commit=true
   controller.helix.cluster.name=pinot-s3-quickstart
   
pinot.controller.storage.factory.class.s3a=org.apache.pinot.plugin.filesystem.S3PinotFS
   pinot.controller.storage.factory.s3a.accessKey=<accesskey>
   pinot.controller.storage.factory.s3a.secretKey=<secret-key>
   pinot.controller.segment.fetcher.protocols=file,http,s3a
   
pinot.controller.segment.fetcher.s3a.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
   pinot.controller.storage.factory.s3a.disableAcl=false
   pinot.controller.storage.factory.s3a.endpoint=<endpoint>
   pinot.controller.storage.factory.s3a.region=<region>
   
   ```
   
   ```
   
pinot.server.storage.factory.class.s3a=org.apache.pinot.plugin.filesystem.S3PinotFS
   pinot.server.storage.factory.s3a.accessKey=<access-key>
   pinot.server.storage.factory.s3a.secretKey=<secret-key>
   pinot.server.segment.fetcher.protocols=file,http,s3a
   
pinot.server.segment.fetcher.s3a.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
   pinot.server.storage.factory.s3a.disableAcl=false
   pinot.server.storage.factory.s3a.endpoint=<endpoint>
   pinot.server.storage.factory.s3a.region=<region>
   
   ```
   
   ```
     - scheme: s3a
       className: org.apache.pinot.plugin.filesystem.S3PinotFS
       configs:
         region: '<region>'
         accessKey: <accesskey>
         secretKey: <secretkey>
         disableAcl: false
         endpoint: <endpoint>
   
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to