sumeetgajjar opened a new issue, #5892:
URL: https://github.com/apache/iceberg/issues/5892

   ### Apache Iceberg version
   
   0.14.1 (latest release)
   
   ### Query engine
   
   _No response_
   
   ### Please describe the bug 🐞
   
   Hi,
   
   Iceberg does not respect the Avro properties `write.avro.compression-codec` and `write.avro.compression-level` set in `TBLPROPERTIES` when writing manifest and manifest list files.
   
   This is because the table properties are not forwarded to Avro WriteBuilder:
   
https://github.com/apache/iceberg/blob/731e5f0aa27b374d7affd8d3512654b2212048dc/core/src/main/java/org/apache/iceberg/ManifestWriter.java#L293-L301
   
   Thus the `Context` defaults to `TableProperties#AVRO_COMPRESSION_DEFAULT`, i.e. `gzip`:
   
https://github.com/apache/iceberg/blob/731e5f0aa27b374d7affd8d3512654b2212048dc/core/src/main/java/org/apache/iceberg/avro/Avro.java#L207-L211
   
   
   ### Steps to reproduce
   ```scala
   scala> sql("CREATE TABLE tpcds_1_tb_iceberg.manifest_compression (a INT) USING iceberg TBLPROPERTIES ('write.avro.compression-codec'='zstd')")
   res0: org.apache.spark.sql.DataFrame = []
   
   scala> spark.range(10).toDF("a").coalesce(1).writeTo("tpcds_1_tb_iceberg.manifest_compression").append()
   
   scala>
   ```
   
   ```bash
   bash-5.1$ avro-tools getmeta iceberg_warehouse/tpcds_1_tb_iceberg/manifest_compression/metadata/snap-3374754284586474934-1-ac1d7acb-bbe0-484c-b4b2-4e4891a100a3.avro | grep -i avro.codec
   22/09/29 16:59:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   avro.codec      deflate
   bash-5.1$
   ```
   
   Even though the table sets the compression codec to `zstd`, the manifest list is still written with `deflate`, the Avro codec that Iceberg's default `gzip` setting maps to.
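
The check above inspects only the manifest list (`snap-*.avro`). Since the report covers manifest files as well, a small loop over the whole metadata directory may make the reproduction more complete. The following is a minimal sketch, assuming `avro-tools` is on the `PATH` as in the session above; `codec_from_getmeta` and `report_codecs` are hypothetical helper names introduced here for illustration.

```bash
#!/usr/bin/env bash
# Sketch: report the Avro codec of every metadata file in an Iceberg table's
# metadata directory. Assumes `avro-tools` is on PATH, as in the session above.

# Extract the avro.codec value from `avro-tools getmeta` output on stdin.
# getmeta prints one "key<whitespace>value" pair per line; other lines
# (for example log warnings) are ignored.
codec_from_getmeta() {
  awk '$1 == "avro.codec" { print $2 }'
}

# Print "<file name><TAB><codec>" for each .avro file in the given directory.
# report_codecs is a hypothetical helper name, not part of any tool.
report_codecs() {
  for f in "$1"/*.avro; do
    [ -e "$f" ] || continue   # directory has no .avro files
    printf '%s\t%s\n' "$(basename "$f")" \
      "$(avro-tools getmeta "$f" 2>/dev/null | codec_from_getmeta)"
  done
}

# Example call (path taken from the session above):
# report_codecs iceberg_warehouse/tpcds_1_tb_iceberg/manifest_compression/metadata
```

If the behavior described in this issue holds, both the `snap-*.avro` manifest list and the manifest files in that directory would be expected to report `deflate` regardless of the table property.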


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

