hussein-awala commented on PR #9902: URL: https://github.com/apache/iceberg/pull/9902#issuecomment-1987003661
@amogh-jahagirdar I tried to use Spark to write the data (see https://github.com/apache/iceberg/pull/9902/commits/25815479628015143551c2379be5608e2dd09bd7), but I ran into a serialization issue with the metastore used here (Hive thrift):
```
Serialization stack:
	- object not serializable (class: org.apache.hadoop.fs.Path, value: file:/var/folders/1q/drsr0xqn0mzf03hhhf1z67_40000gn/T/hive12824930955088255332/test/.hive-staging_hive_2024-03-09_23-55-13_177_3570944359873831448-1/-ext-10000)
	- field (class: org.apache.hadoop.hive.ql.plan.FileSinkDesc, name: dirName, type: class org.apache.hadoop.fs.Path)
	- object (class org.apache.hadoop.hive.ql.plan.FileSinkDesc, org.apache.hadoop.hive.ql.plan.FileSinkDesc@826a408)
	- field (class: org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1, name: fileSinkConfSer$1, type: class org.apache.hadoop.hive.ql.plan.FileSinkDesc)
	- object (class org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1, org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1@2d533ec6)
	- field (class: org.apache.spark.sql.execution.datasources.WriteJobDescription, name: outputWriterFactory, type: class org.apache.spark.sql.execution.datasources.OutputWriterFactory)
	- object (class org.apache.spark.sql.execution.datasources.WriteJobDescription, org.apache.spark.sql.execution.datasources.WriteJobDescription@260edd7e)
	- element of array (index: 0)
	- array (class [Ljava.lang.Object;, size 4)
	- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
	- object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class org.apache.spark.sql.execution.datasources.WriteFilesExec, functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic org/apache/spark/sql/execution/datasources/WriteFilesExec.$anonfun$doExecuteWrite$1:(Lorg/apache/spark/sql/execution/datasources/WriteJobDescription;Ljava/lang/String;Lorg/apache/spark/internal/io/FileCommitProtocol;Lscala/Option;Lscala/collection/Iterator;)Lscala/collection/Iterator;, instantiatedMethodType=(Lscala/collection/Iterator;)Lscala/collection/Iterator;, numCaptured=4])
	- writeReplace data (class: java.lang.invoke.SerializedLambda)
	- object (class org.apache.spark.sql.execution.datasources.WriteFilesExec$$Lambda$2347/0x000000e001f3c5f8, org.apache.spark.sql.execution.datasources.WriteFilesExec$$Lambda$2347/0x000000e001f3c5f8@34bb023a)
	- element of array (index: 0)
	- array (class [Ljava.lang.Object;, size 1)
	- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
	- object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class org.apache.spark.rdd.RDD, functionalInterfaceMethod=scala/Function3.apply:(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic org/apache/spark/rdd/RDD.$anonfun$mapPartitionsInternal$2$adapted:(Lscala/Function1;Lorg/apache/spark/TaskContext;Ljava/lang/Object;Lscala/collection/Iterator;)Lscala/collection/Iterator;, instantiatedMethodType=(Lorg/apache/spark/TaskContext;Ljava/lang/Object;Lscala/collection/Iterator;)Lscala/collection/Iterator;, numCaptured=1])
	- writeReplace data (class: java.lang.invoke.SerializedLambda)
	- object (class org.apache.spark.rdd.RDD$$Lambda$2349/0x000000e001f3cca8, org.apache.spark.rdd.RDD$$Lambda$2349/0x000000e001f3cca8@68ee800f)
	- field (class: org.apache.spark.rdd.MapPartitionsRDD, name: f, type: interface scala.Function3)
	- object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[3] at create at TestSparkReaderWithBloomFilter.java:210)
	- field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
	- object (class scala.Tuple2, (MapPartitionsRDD[3] at create at TestSparkReaderWithBloomFilter.java:210,org.apache.spark.sql.execution.datasources.FileFormatWriter$$$Lambda$2354/0x000000e001f44d28@6ff879e5))
```
Do you have any idea? I also tried removing the metastore and using `IcebergSparkSessionExtensions` as a SQL extension, but the class is not available in this module. I can look into making it available if you prefer that approach, since it is the one described in the Iceberg documentation for Spark users.
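For context, the extension-based setup I have in mind would look roughly like the sketch below. The catalog name `local`, the Hadoop catalog type, and the warehouse path are placeholders, not something from this PR; the only point is that the session is configured with `IcebergSparkSessionExtensions` and an Iceberg catalog instead of the Hive thrift metastore, assuming the extensions jar ends up on the test classpath. Writing through the Iceberg catalog should also avoid the Hive write path (`HiveFileFormat`/`FileSinkDesc`) that put the non-serializable `Path` into the task closure above.
```java
import org.apache.spark.sql.SparkSession;

public class IcebergSessionSketch {
  public static void main(String[] args) {
    // Sketch only: configure a local session with the Iceberg SQL extensions and a
    // Hadoop catalog named "local" pointing at a throwaway warehouse directory.
    SparkSession spark = SparkSession.builder()
        .master("local[2]")
        .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.local.type", "hadoop")
        .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg-warehouse")
        .getOrCreate();

    // With the extension loaded, Iceberg DDL/DML is handled by the Iceberg catalog
    // rather than going through Spark's Hive writer.
    spark.sql("CREATE TABLE local.db.sample (id BIGINT, data STRING) USING iceberg");
    spark.sql("INSERT INTO local.db.sample VALUES (1L, 'a'), (2L, 'b')");
    spark.stop();
  }
}
```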