mkavinashkumar opened a new issue, #9874:
URL: https://github.com/apache/iceberg/issues/9874

   ### Apache Iceberg version
   
   1.4.3 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   I get an error when I try to append data to both v1 and v2 Iceberg tables using the writeTo API in PySpark with DataFrames. Is this an error in my setup or a known limitation?
   **Below is the code snippet:**
   ```
   from pyspark.sql.types import StructType, StructField, IntegerType, StringType
   from pyspark.sql import Row

   # Data
   data = [
       Row(1, "Software Engineer", "Engineering", 25000, "NA"),
       Row(2, "Director", "Sales", 22000, "EMEA")
   ]

   # Parallelize the data and create an RDD
   rdd = spark.sparkContext.parallelize(data)

   # Define the schema
   schema = StructType([
       StructField("id", IntegerType(), True),
       StructField("role", StringType(), True),
       StructField("department", StringType(), True),
       StructField("salary", IntegerType(), True),
       StructField("region", StringType(), True)
   ])

   # Create a DataFrame from the RDD and create the Iceberg table
   df = spark.createDataFrame(rdd, schema)
   df.writeTo("iceberg_test.employee_df_v2_2").tableProperty("format-version", "2").create()

   # Create a second DataFrame and append it to the same table
   df_2 = spark.createDataFrame(rdd, schema)
   df_2.writeTo("iceberg_test.employee_df_v2_2").append()
   ```
   **Stacktrace:**
   ```
   Fail to execute line 25: df.writeTo("iceberg_test.employee_df_v2_2").append()
   Traceback (most recent call last):
     File "/mnt/volume8/yarn/nm/usercache/admin/appcache/application_1704921312430_0788/container_e03_1704921312430_0788_01_000001/tmp/python4571889082826602192/zeppelin_python.py", line 167, in <module>
       exec(code, _zcUserQueryNameSpace)
     File "<stdin>", line 25, in <module>
     File "/mnt/volume8/yarn/nm/usercache/admin/appcache/application_1704921312430_0788/container_e03_1704921312430_0788_01_000001/pyspark.zip/pyspark/sql/readwriter.py", line 2042, in append
       self._jwriter.append()
     File "/mnt/volume8/yarn/nm/usercache/admin/appcache/application_1704921312430_0788/container_e03_1704921312430_0788_01_000001/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1323, in __call__
       answer, self.gateway_client, self.target_id, self.name)
     File "/mnt/volume8/yarn/nm/usercache/admin/appcache/application_1704921312430_0788/container_e03_1704921312430_0788_01_000001/pyspark.zip/pyspark/errors/exceptions/captured.py", line 175, in deco
       raise converted from None
   pyspark.errors.exceptions.captured.AnalysisException: Cannot write into v1 table: `spark_catalog`.`iceberg_test`.`employee_df_v2_2`.
   ```
   **Environment: Spark 3.4.1, Iceberg 1.4.3**
   spark.jars.packages: org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.3,org.apache.iceberg:iceberg-spark-extensions-3.4_2.12:1.4.3
   spark.sql.catalog.spark_catalog: org.apache.iceberg.spark.SparkSessionCatalog
   spark.sql.extensions: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   spark.sql.catalog.spark_catalog.type: hive
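
   For reference, the settings above can be collected into a single dict and applied when building the session (a minimal sketch; the app name is a hypothetical placeholder, and it assumes the listed Iceberg packages are resolvable at startup):

   ```python
   # Sketch: the reported configuration as a plain dict; nothing here
   # touches Spark until it is passed to SparkSession.builder.config().
   iceberg_conf = {
       "spark.jars.packages": (
           "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.3,"
           "org.apache.iceberg:iceberg-spark-extensions-3.4_2.12:1.4.3"
       ),
       "spark.sql.extensions": (
           "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
       ),
       "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkSessionCatalog",
       "spark.sql.catalog.spark_catalog.type": "hive",
   }

   # Building the session would then look like this (requires pyspark):
   # from pyspark.sql import SparkSession
   # builder = SparkSession.builder.appName("iceberg-test")
   # for key, value in iceberg_conf.items():
   #     builder = builder.config(key, value)
   # spark = builder.getOrCreate()
   ```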


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

