KKcorps opened a new pull request, #9288:
URL: https://github.com/apache/pinot/pull/9288

   Users currently have to assemble the whole `spark-submit` command by hand to
   run a Spark job for batch ingestion. With so many plugins available in Pinot,
   this leads to a lot of classpath errors, and you also need to take care of
   various arguments depending on the environment you are running in. This new
   command in `pinot-admin` aims to simplify this for users.
   
   For example, previously you had to run:
   ```
   export PINOT_VERSION=0.11.0-SNAPSHOT
   export PINOT_DISTRIBUTION_DIR=/Users/kharekartik/Documents/Developer/pinot/build/
   spark-submit --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
     --master yarn --deploy-mode client \
     --jars ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar,${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar,${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar,${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3.2/pinot-batch-ingestion-spark-3.2-0.11.0-SNAPSHOT-shaded.jar \
     local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar \
     -jobSpecFile parquet_ingestion_spec_spark3_students.yml
   ```
   
   You can now use
   
   ```
   export SPARK_HOME=/usr/lib/spark/
   bin/pinot-admin.sh LaunchSparkDataIngestionJob -jobSpecFile parquet_ingestion_spec_spark3_students.yml \
     -pluginsToLoad pinot-parquet:pinot-s3 -master yarn
   ```
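
To illustrate what `-pluginsToLoad` saves you from doing by hand, here is a rough sketch of turning a colon-separated plugin list into the comma-separated `--jars` value used in the old command. The function name `resolve_plugin_jars` is hypothetical, and the lookup rule (each plugin lives in a directory named after it somewhere below `plugins/`) is an assumption read off the paths in the `spark-submit` example, not the logic implemented in this PR.

```shell
# Hypothetical helper, not Pinot's implementation.
# $1 = distribution directory
# $2 = colon-separated plugin names, e.g. "pinot-parquet:pinot-s3"
# Prints a comma-separated list of the jars found for those plugins.
resolve_plugin_jars() {
  dist_dir=$1
  jars=""
  # Split the plugin list on ':' and look up each name in turn.
  for name in $(printf '%s' "$2" | tr ':' ' '); do
    # Assumption: a plugin's jars sit in a directory named after the plugin,
    # at any depth below plugins/. Paths containing spaces are not handled.
    for jar in $(find "$dist_dir/plugins" -type f -path "*/$name/*.jar" | sort); do
      # Append with a comma separator, except before the first entry.
      jars="${jars:+$jars,}$jar"
    done
  done
  printf '%s\n' "$jars"
}

# Example call (left as a comment so the sketch has no side effects):
# resolve_plugin_jars "$PINOT_DISTRIBUTION_DIR" "pinot-parquet:pinot-s3"
```

The sketch only searches `plugins/`; the old command also pulled the Spark batch-ingestion jar from `plugins-external/`, which would need a second search root.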
   
   
   You can also pass additional Spark configurations using the `-sparkConf`
   option:
   `-sparkConf spark.executor.cores=3:num-executors=4`
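
The value shown for `-sparkConf` is a colon-separated list of `key=value` pairs. As a small illustration of that format only, the sketch below splits such a string into one pair per line. The helper name `split_spark_conf` is hypothetical, and this is not the parsing code from the PR.

```shell
# Hypothetical helper: split a colon-separated list of key=value pairs
# (the format shown for -sparkConf) into one pair per line.
split_spark_conf() {
  # tr replaces every ':' with a newline, so each pair lands on its own line.
  printf '%s\n' "$1" | tr ':' '\n'
}

split_spark_conf "spark.executor.cores=3:num-executors=4"
# prints:
# spark.executor.cores=3
# num-executors=4
```

With a naive split like this one, a value that itself contains a colon (an `s3://` path, for instance) would be broken apart. The PR description does not say how the real command treats such values.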
   
   Users can also specify jars directly from S3/GCS instead of local disk for 
environments like EMR
   `-pinotBaseDir s3://your-bucket/apache-pinot-0.11.0-SNAPSHOT`
   
   You can choose whether to run Spark 2.x or 3.x with the following option
   (the default is SPARK_3):
   `-sparkVersion SPARK_2`

