chrajeshbabu commented on code in PR #14474: URL: https://github.com/apache/pinot/pull/14474#discussion_r1851151881
########## pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java: ########## @@ -501,12 +504,14 @@ public long length(URI fileUri) public String[] listFiles(URI fileUri, boolean recursive) throws IOException { ImmutableList.Builder<String> builder = ImmutableList.builder(); + String scheme = fileUri.getScheme(); + Preconditions.checkArgument(scheme.equals(S3_SCHEME) || scheme.equals(S3A_SCHEME)); visitFiles(fileUri, recursive, s3Object -> { if (!s3Object.key().equals(fileUri.getPath()) && !s3Object.key().endsWith(DELIMITER)) { - builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + getNormalizedFileKey(s3Object)); + builder.add(scheme + SCHEME_SEPARATOR + fileUri.getHost() + DELIMITER + getNormalizedFileKey(s3Object)); Review Comment: Yes @mayankshriv. During the server/controller initialisation what ever the scheme configured to factory class is getting mapped to file system. This allowing to configure s3a scheme. ``` public static void init(PinotConfiguration fsFactoryConfig) { // Get schemes and their respective classes PinotConfiguration schemesConfiguration = fsFactoryConfig.subset(CLASS); List<String> schemes = schemesConfiguration.getKeys(); if (!schemes.isEmpty()) { LOGGER.info("Did not find any fs classes in the configuration"); } for (String scheme : schemes) { String fsClassName = schemesConfiguration.getProperty(scheme); PinotConfiguration fsConfiguration = fsFactoryConfig.subset(scheme); LOGGER.info("Got scheme {}, initializing class {}", scheme, fsClassName); register(scheme, fsClassName, fsConfiguration); } } ``` Since the scheme is hard coded to s3 in S3PinotFS it's not allowing the list the objects in the s3a bucket. ``` if (!s3Object.key().equals(fileUri.getPath()) && !s3Object.key().endsWith(DELIMITER)) { builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + getNormalizedFileKey(s3Object)); } ``` With the patch able to create/list the objects in the s3a bucket. Tested the patch in the real cluster as well just by configuring the protocols supported as s3a where ever s3 needs to be configured. Here is sample configurations done at controller, server and ingestion job level to make it work, ``` controller.data.dir=s3a:<buckeneame>/data/pinot/controller/segment controller.local.temp.dir=<tmp-path>/pinot/ controller.enable.split.commit=true controller.helix.cluster.name=pinot-s3-quickstart pinot.controller.storage.factory.class.s3a=org.apache.pinot.plugin.filesystem.S3PinotFS pinot.controller.storage.factory.s3a.accessKey=<accesskey> pinot.controller.storage.factory.s3a.secretKey=<secret-key> pinot.controller.segment.fetcher.protocols=file,http,s3a pinot.controller.segment.fetcher.s3a.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher pinot.controller.storage.factory.s3a.disableAcl=false pinot.controller.storage.factory.s3a.endpoint=<endpoint> pinot.controller.storage.factory.s3a.region=<region> ``` ``` pinot.server.storage.factory.class.s3a=org.apache.pinot.plugin.filesystem.S3PinotFS pinot.server.storage.factory.s3a.accessKey=<access-key> pinot.server.storage.factory.s3a.secretKey=<secret-key> pinot.server.segment.fetcher.protocols=file,http,s3a pinot.server.segment.fetcher.s3a.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher pinot.server.storage.factory.s3a.disableAcl=false pinot.server.storage.factory.s3a.endpoint=<endpoint> pinot.server.storage.factory.s3a.region=<region> ``` ``` - scheme: s3a className: org.apache.pinot.plugin.filesystem.S3PinotFS configs: region: '<region>' accessKey: <accesskey> secretKey: <secretkey> disableAcl: false endpoint: <endpoint> ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org