Re: [I] Spark ingestion with yarn cluster mode expecting the spec file to be available across all the node manager nodes [pinot]

via GitHub Fri, 03 Jan 2025 00:33:11 -0800


chrajeshbabu commented on issue #14527:
URL: https://github.com/apache/pinot/issues/14527#issuecomment-2568853246


   When spark-submit is used to load the data, the 
LaunchDataIngestionJobCommand can be started on any node, because the underlying 
YARN Application Master APIs are used to start the command, and these can spawn 
containers anywhere in the cluster.
   
   After going through the existing commands, I found that using 
LaunchSparkDataIngestionJobCommand doesn't require the spec file to be present 
on all the nodes. Here is an example, which would be worth documenting wherever 
Spark is used for ingestion.
   
   `
   export SPARK_HOME=/opt/spark3
   /opt/apache-pinot-1.2.0-bin/bin/pinot-admin.sh LaunchSparkDataIngestionJob \
     -jobSpecFile=/opt/apache-pinot-1.2.0-bin/bin/specs/jobspec.yaml \
     -deployMode cluster \
     -master yarn \
     -pinotBaseDir=/opt/apache-pinot-1.2.0-bin/ \
     -pluginsToLoad pinot-hdfs:pinot-batch-ingestion-spark-3 \
     -sparkConf executor-memory=20GB
   `
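
For documentation purposes, the command above could be wrapped in a small script. The following is only a sketch: it reuses the paths and flags from the example above, while the variable names PINOT_HOME, JOB_SPEC and DRY_RUN are illustrative assumptions and not part of Pinot. By default it only prints the assembled command; setting DRY_RUN=0 would actually launch it.

```shell
#!/usr/bin/env bash
# Sketch only: PINOT_HOME, JOB_SPEC and DRY_RUN are hypothetical names;
# the paths and flags are copied from the example command above.
set -euo pipefail

PINOT_HOME="${PINOT_HOME:-/opt/apache-pinot-1.2.0-bin}"
JOB_SPEC="${JOB_SPEC:-$PINOT_HOME/bin/specs/jobspec.yaml}"
export SPARK_HOME="${SPARK_HOME:-/opt/spark3}"

# Assemble the same arguments as the example command into an array.
cmd=(
  "$PINOT_HOME/bin/pinot-admin.sh" LaunchSparkDataIngestionJob
  "-jobSpecFile=$JOB_SPEC"
  -deployMode cluster
  -master yarn
  "-pinotBaseDir=$PINOT_HOME/"
  -pluginsToLoad pinot-hdfs:pinot-batch-ingestion-spark-3
  -sparkConf executor-memory=20GB
)

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: print the command instead of running it.
  printf '%s\n' "${cmd[*]}"
else
  # Per the observation above, the spec file is checked only on the
  # node that launches the job, not on every NodeManager node.
  [ -f "$JOB_SPEC" ] || { echo "job spec not found: $JOB_SPEC" >&2; exit 1; }
  exec "${cmd[@]}"
fi
```

Keeping the arguments in an array avoids quoting problems if a path ever contains spaces, and the dry-run default makes it easy to review the exact command before submitting it to YARN.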


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

