lgo opened a new issue #5877:
URL: https://github.com/apache/incubator-pinot/issues/5877


   Here's some of the setup:
   ```
   # pinot controller properties.
   
   # Requires `-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-s3`
   #
   
   pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
   # Any S3 region
   pinot.controller.storage.factory.s3.region=us-west-1
   # Data directory for Pinot.
   controller.data.dir=s3://mybucket/myfolder/pinot
   ```
   
   I am using an ingestion spec like the following:
   ```yaml
   executionFrameworkSpec:
       name: 'standalone'
       segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
       segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
       segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
   jobType: SegmentCreationAndUriPush
   inputDirURI: ...
   outputDirURI: 's3://mybucket/myfolder/pinot'
   overwriteOutput: true
   pinotFSSpecs:
       - scheme: s3
         className: org.apache.pinot.plugin.filesystem.S3PinotFS
         configs:
           region: 'us-west-2'
   pushJobSpec:
       # NB: This part is particularly confusing. It seems to be the "adjusted path"
       # that is provided to the controller. I assume that is because the URI used
       # by the ingestion job may not be the same as the one the controller uses?
       segmentUriPrefix: 's3://'
       segmentUriSuffix: ''
   recordReaderSpec:
       # Dataset-specific config.
   tableSpec:
       # Table-specific config.
   pinotClusterSpecs:
       # Cluster-specific config.
   ```
   
   When running the standalone ingestion job via `bin/pinot-ingestion-job.sh`:
   * Segment generation works fine.
   * The data shows up on S3 as expected, and the `S3PinotFS` log line for `Copy` has the correct path. However, the segment URI logged by `SegmentPushUtils` is missing the bucket name (`s3:///myfolder/...` instead of `s3://mybucket/myfolder/...`), and `SegmentUriPushJobRunner` fails with a 500 from the controller because the path is not found.
   
   ```
   2020/08/17 14:45:38.719 INFO [S3PinotFS] [main] Copy /tmp/pinot-a4064eea-301d-4f24-8861-0575a73e6a0b/output/mytable_OFFLINE_1569293930_1569293987_0.tar.gz from local to s3://mybucket/myfolder/pinot/mytable_OFFLINE_1569293930_1569293987_0.tar.gz
   2020/08/17 14:45:38.794 INFO [IngestionJobLauncher] [main] Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner
   2020/08/17 14:45:38.795 INFO [PinotFSFactory] [main] Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
   2020/08/17 14:45:38.920 INFO [SegmentPushUtils] [main] Start sending table mytable segment URIs: [s3:///myfolder/pinot/mytable_OFFLINE_1569293930_1569293987_0.tar.gz] to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@4e07b95f]
   2020/08/17 14:45:38.920 INFO [SegmentPushUtils] [main] Sending table mytable segment URI: s3:///myfolder/pinot/mytable_OFFLINE_1569293930_1569293987_0.tar.gz to location
   ```
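   To make the difference between the two URIs in the log concrete, here is a small standalone check using only `java.net.URI` (the class name `SegmentUriCheck` is hypothetical and not part of Pinot). It parses the `Copy` destination and the URI sent by `SegmentPushUtils`, and prints the authority (the S3 bucket) and path of each:

   ```java
   import java.net.URI;

   // Hypothetical helper, not part of Pinot: compares the bucket (URI authority)
   // of the two URIs copied verbatim from the log above.
   public class SegmentUriCheck {

       /** Returns the bucket of an s3:// URI, or null if the URI has no authority. */
       static String bucketOf(String uri) {
           return URI.create(uri).getAuthority();
       }

       public static void main(String[] args) {
           String copied = "s3://mybucket/myfolder/pinot/mytable_OFFLINE_1569293930_1569293987_0.tar.gz";
           String pushed = "s3:///myfolder/pinot/mytable_OFFLINE_1569293930_1569293987_0.tar.gz";
           // Both URIs share the same path; only the authority differs.
           System.out.println("copied: bucket=" + bucketOf(copied) + ", path=" + URI.create(copied).getPath());
           System.out.println("pushed: bucket=" + bucketOf(pushed) + ", path=" + URI.create(pushed).getPath());
       }
   }
   ```

   For the pushed URI the bucket comes back as `null`, so anything resolving that URI on the controller side has no bucket to look in, which would be consistent with the "path not found" 500.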
   
   I suspect the problem is in how the segment URI is constructed from the output path before it is passed to `SegmentPushUtils.sendSegmentUris`, but I have not confirmed it.
   
   
   https://github.com/apache/incubator-pinot/blob/2b58bfb520df074f691277f2ae5b01ecb5c686c2/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-standalone/src/main/java/org/apache/pinot/plugin/ingestion/batch/standalone/SegmentUriPushJobRunner.java#L90-L91
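   A minimal sketch of the suspected construction, assuming (unverified) that the linked lines build each URI as `segmentUriPrefix + URI.getRawPath() + segmentUriSuffix`. `SegmentUriSketch` and `buildSegmentUri` are hypothetical names, not Pinot code. Since `getRawPath()` returns only the path component, the bucket in `outputDirURI` would be dropped, and the `'s3://'` prefix from the ingestion spec above would reproduce the `s3:///myfolder/...` URI seen in the log:

   ```java
   import java.net.URI;

   // Hypothetical reconstruction of the suspected behavior; not copied from
   // SegmentUriPushJobRunner. Assumes prefix + raw path + suffix concatenation.
   public class SegmentUriSketch {

       static String buildSegmentUri(String outputFile, String prefix, String suffix) {
           // getRawPath() keeps only the path, so the bucket (authority) is lost here.
           return prefix + URI.create(outputFile).getRawPath() + suffix;
       }

       public static void main(String[] args) {
           String output = "s3://mybucket/myfolder/pinot/mytable_OFFLINE_1569293930_1569293987_0.tar.gz";
           // Prefix used in the ingestion spec: produces s3:///myfolder/pinot/...
           System.out.println(buildSegmentUri(output, "s3://", ""));
           // Hypothetical prefix that includes the bucket: produces s3://mybucket/myfolder/pinot/...
           System.out.println(buildSegmentUri(output, "s3://mybucket", ""));
       }
   }
   ```

   If the runner does behave this way, it would also explain why `segmentUriPrefix` exists at all: it lets the pushed URI be rewritten into a form the controller can resolve, but it then has to carry the bucket itself.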
   
   It is also not clear whether the same issue occurs with the Hadoop/Spark SegmentUri push jobs.

