abharath9 opened a new issue, #9488:
URL: https://github.com/apache/iceberg/issues/9488
The following error is thrown when trying to insert data into an Iceberg table with not-null column constraints:
**_Cannot write nullable values to non-null column 'id' exception_**
Here is sample code to reproduce the issue.
```python
from pyspark.conf import SparkConf
from pyspark.context import SparkContext
from pyspark.sql import SparkSession, Row
from pyspark.sql.functions import col, concat, when, countDistinct, lit, size
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, ArrayType

conf = (
    SparkConf()
    .setAppName("analyst-hero-spark-write-iceberg")
    .set("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .set("spark.sql.parquet.filterPushdown", "true")
    .set("spark.hadoop.fs.s3a.impl.disable.cache", "false")
    .set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
    .set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .set("spark.hadoop.iceberg.hive.lock-timeout-ms", "180")
    .set("spark.hadoop.iceberg.delete-files-on-table-drop", "true")
    .set("spark.hadoop.iceberg.mr.commit.table.thread.pool.size", "1")
    .set("spark.hadoop.iceberg.mr.commit.file.thread.pool.size", "1")
    .set("spark.hadoop.iceberg.mr.commit.retry.num-retries", "300")
    .set("spark.sql.iceberg.check-nullability", "false")
)

spark = SparkSession.builder.enableHiveSupport().config(conf=conf).getOrCreate()

spark.sql(
    "CREATE TABLE IF NOT EXISTS sandbox.iceberg_table_mandatory_column_test "
    "(id string not null, name string, nationality string) USING iceberg"
)
spark.sql(
    "show create table sandbox.iceberg_table_mandatory_column_test"
).select("createtab_stmt").show(truncate=False)

flatData = spark.createDataFrame([
    Row(id="100", name="bob", nationality="welsh"),
    Row(id="200", name="john", nationality="british"),
])
flatData.show(truncate=False)
```
I tried inserting the data both with a SQL `INSERT INTO` statement and with the DataFrame API `insertInto`; both fail with **_Cannot write nullable values to non-null column 'id' exception_**.
```python
flatData.createOrReplaceTempView("test_data")
spark.sql("insert into sandbox.iceberg_table_mandatory_column_test select * from test_data")

# or

flatData.write.insertInto("sandbox.iceberg_table_mandatory_column_test")
```
Complete stack trace in the attachment.
[error-iceberg-test.txt](https://github.com/apache/iceberg/files/13955530/error-iceberg-test.txt)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]