support-cloud opened a new issue, #9824: URL: https://github.com/apache/iceberg/issues/9824
### Query engine

Spark job submitted to YARN from a remote Jupyter notebook pod
Spark version 3.4.1
Iceberg version 1.4.3
Hive version 3.1.0

### Question

Hi, we are trying to read Iceberg Hive tables with Apache Spark from a Jupyter notebook pod running on Kubernetes. Spark is configured on an external YARN cluster, and the job shows as Failed in the YARN application logs.

The Spark code we tried is as follows:

    import os
    import pyspark
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import FloatType,IntegerType,StructType,StructField
    from pyspark.sql import functions as f
    from pyspark.sql import Window

    # Session configuration
    spark = (SparkSession.builder.master("yarn").appName("iceberg_test")
        .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.3")
        .config("spark.jars", "/usr/hdp/3.1.4.0-315/spark3/jars/iceberg-spark-runtime-3.4_2.12-1.4.3.jar, /usr/hdp/3.1.4.0-315/hive/lib/iceberg-hive-runtime-1.4.3.jar, /usr/hdp/3.1.4.0-315/spark3/jars/hive-serde-2.3.9.jar")
        .config("spark.sql.catalog.spark_catalog.type", "hive")
        .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
        .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.local.type", "hadoop")
        .config("spark.sql.catalog.local.warehouse", "$PWD/warehouse")
        .config("iceberg.hive.engine.enabled", "true")
        .enableHiveSupport()
        .getOrCreate()
    )

    test_df = spark.sql("select * from icebergdb.default")
    test_df.show()

The Spark job is submitted with master set to "yarn", so I have placed the relevant Iceberg runtime jars on the YARN cluster, and the code references the jar paths on the remote YARN cluster rather than on the Jupyter pod. Is this approach correct, or am I missing anything that should be taken into consideration?

--
This is an automated message from the Apache Git Service.