stym06 opened a new issue, #8460:
URL: https://github.com/apache/pinot/issues/8460

   Hey guys,
   I've been trying to ingest ORC data stored on S3 using the Pinot batch ingestion job, launched with the command below:
   `./pinot-admin.sh LaunchDataIngestionJob -jobSpecFile batch-job-standalone-spec.yaml`
   
   ### Ingestion job spec
   ```
   executionFrameworkSpec:
     name: 'standalone'
     segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
     segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
     segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
     segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'
   jobType: SegmentCreationAndMetadataPush
   inputDirURI: 's3://test-bucket/dev/pinot-input-new/'
   outputDirURI: 's3://test-bucket/dev/pinot/axon_entity.db/segments-v2'
   overwriteOutput: true
   pinotFSSpecs:
     - scheme: s3
       className: org.apache.pinot.plugin.filesystem.S3PinotFS
       configs:
         region: ap-southeast-1
   recordReaderSpec:
     dataFormat: 'orc'
     className: 'org.apache.pinot.plugin.inputformat.orc.ORCRecordReader'
   tableSpec:
     tableName: 'user_base_fact'
     schemaURI: 'http://localhost:9000/tables/user_base_fact/schema'
     tableConfigURI: 'http://localhost:9000/tables/user_base_fact'
   pinotClusterSpecs:
     - controllerURI: 'http://localhost:9000'
   pushJobSpec:
     pushParallelism: 2
     pushAttempts: 2
     pushRetryIntervalMillis: 1000
   ```
   
   The job completes without errors, but every value in the resulting Pinot table is null:
   <img width="1335" alt="Screenshot 2022-04-04 at 3 13 38 PM" src="https://user-images.githubusercontent.com/20970728/161518329-3fa4f1c0-cced-4294-bbcd-0ff2382a3a3a.png">
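
One check that may help narrow this down (an assumption, not confirmed as the cause of this issue): all-null columns can appear when the field names in the input files do not match the column names in the Pinot schema, for example when ORC files carry positional names such as `_col0`. The sketch below compares the two sets of names. The function names, the example schema, and the example ORC column list are all hypothetical; in practice the schema JSON would come from the `schemaURI` in the job spec and the ORC column names from whatever tool is used to inspect the files.

```python
from typing import Dict, List

# Keys under which a Pinot schema JSON document lists its columns.
FIELD_SPEC_KEYS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def pinot_column_names(schema: Dict) -> List[str]:
    """Collect column names from a Pinot schema JSON document, in listing order."""
    return [spec["name"] for key in FIELD_SPEC_KEYS for spec in schema.get(key, [])]


def columns_missing_from_orc(schema: Dict, orc_columns: List[str]) -> List[str]:
    """Return schema columns that have no identically named field in the ORC file."""
    available = set(orc_columns)
    return [name for name in pinot_column_names(schema) if name not in available]


# Hypothetical schema and ORC field names, for illustration only.
example_schema = {
    "schemaName": "user_base_fact",
    "dimensionFieldSpecs": [{"name": "user_id", "dataType": "STRING"}],
    "metricFieldSpecs": [{"name": "order_count", "dataType": "LONG"}],
    "dateTimeFieldSpecs": [
        {
            "name": "event_ts",
            "dataType": "LONG",
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "1:MILLISECONDS",
        }
    ],
}
example_orc_columns = ["_col0", "_col1", "event_ts"]

missing = columns_missing_from_orc(example_schema, example_orc_columns)
print(missing)  # ['user_id', 'order_count']
```

If the list is non-empty, those columns have no matching input field by name, which would be one candidate explanation for null values; an empty list would point the investigation elsewhere.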
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

