KKcorps opened a new pull request, #9288: URL: https://github.com/apache/pinot/pull/9288
The users currently need to create the whole spark-submit command to run a spark job for batch ingestion. With so many plugins available inside pinot leads a lot of classpath errors and you also need to take care of various arguments based on the environment in which you are running. This new command in `pinot-admin` aims to simply this for the users. e.g. Previously if you had to run ``` export PINOT_VERSION=0.11.0-SNAPSHOT export PINOT_DISTRIBUTION_DIR=/Users/kharekartik/Documents/Developer/pinot/build/ spark-submit --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand --master yarn --deploy-mode client --jars ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar,${PINOT_DISTRIBUTION_DIR}/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar,${PINOT_DISTRIBUTION_DIR}/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar,${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark-3.2/pinot-batch-ingestion-spark-3.2-0.11.0-SNAPSHOT-shaded.jar local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar -jobSpecFile parquet_ingestion_spec_spark3_students.yml ``` You can now use ``` export SPARK_HOME=/usr/lib/spark/ bin/pinot-admin.sh LaunchSparkDataIngestionJob -jobSpecFile parquet_ingestion_spec_spark3_students.yml -pluginsToLoad pinot-parquet:pinot-s3 -master yarn ``` You can also mention any additional spark configurations using the `-sparkConf` option `-sparkConf spark.executor.cores=3:num-executors=4` Users can also specify jars directly from S3/GCS instead of local disk for environments like EMR `-pinotBaseDir s3://your-bucket/apache-pinot-0.11.0-SNAPSHOT` You can choose whether to run spark 2.x or 3.x with the following option (default is SPARK_3) `-sparkVersion SPARK_2` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org