ms1111 opened a new issue, #10180:
URL: https://github.com/apache/iceberg/issues/10180

   ### Feature Request / Improvement
   
   If the hadoop-common library is not present, trying to write a Parquet file:
   ```java
   DataWriter<Record> dataWriter =
           Parquet.writeData(file)
                   .schema(schema)
                   .createWriterFunc(GenericParquetWriter::buildWriter)
                   .overwrite()
                   .withSpec(partitionSpec)
                   .build();
   ```
   ... will fail with:
   ```
   Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
           at org.apache.iceberg.parquet.Parquet$WriteBuilder.<init>(Parquet.java:164)
           at org.apache.iceberg.parquet.Parquet$WriteBuilder.<init>(Parquet.java:143)
           at org.apache.iceberg.parquet.Parquet.write(Parquet.java:129)
           at org.apache.iceberg.parquet.Parquet$DataWriteBuilder.<init>(Parquet.java:646)
           at org.apache.iceberg.parquet.Parquet$DataWriteBuilder.<init>(Parquet.java:637)
           at org.apache.iceberg.parquet.Parquet.writeData(Parquet.java:623)
   ```
   
   In org.apache.iceberg.parquet.Parquet, the WriteBuilder constructor creates an empty Configuration even when the file is not a HadoopOutputFile:
   ```java
       private WriteBuilder(OutputFile file) {
         this.file = file;
         if (file instanceof HadoopOutputFile) {
           this.conf = new Configuration(((HadoopOutputFile) file).getConf());
         } else {
           this.conf = new Configuration();
         }
       }
   ```
   
   ParquetWriter eventually passes this to ParquetIO.file(), which ignores it if the file is not a HadoopOutputFile.
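
One possible direction, sketched below with stand-in classes rather than the real Iceberg and Hadoop types: only touch the configuration when the file is Hadoop-backed, and leave it unset otherwise. `FakeConfiguration`, `FakeHadoopOutputFile`, `LocalOutputFile` and `hasConf()` are hypothetical names used only for illustration. In the real code, references to Configuration might also need to be isolated in a separate helper class so the JVM does not try to load it; that part is an assumption and is not shown here.

```java
public class LazyConfSketch {
    /** Stand-in for Iceberg's OutputFile, reduced to one method for illustration. */
    interface OutputFile {
        String location();
    }

    /** Stand-in for Hadoop's Configuration; counts how many instances get created. */
    static final class FakeConfiguration {
        static int instances = 0;

        FakeConfiguration() {
            instances++;
        }
    }

    /** Stand-in for HadoopOutputFile: the only file type that carries a configuration. */
    static final class FakeHadoopOutputFile implements OutputFile {
        private final FakeConfiguration conf = new FakeConfiguration();

        @Override
        public String location() {
            return "hdfs://example/data.parquet";
        }

        FakeConfiguration getConf() {
            return conf;
        }
    }

    /** Stand-in for any non-Hadoop output file. */
    static final class LocalOutputFile implements OutputFile {
        @Override
        public String location() {
            return "/tmp/data.parquet";
        }
    }

    /** Simplified builder: no configuration is created for non-Hadoop files. */
    static final class WriteBuilder {
        private final OutputFile file;
        private final FakeConfiguration conf; // null when the file is not Hadoop-backed

        WriteBuilder(OutputFile file) {
            this.file = file;
            if (file instanceof FakeHadoopOutputFile) {
                // The real constructor copies the configuration; this sketch reuses it.
                this.conf = ((FakeHadoopOutputFile) file).getConf();
            } else {
                this.conf = null;
            }
        }

        boolean hasConf() {
            return conf != null;
        }
    }

    public static void main(String[] args) {
        WriteBuilder local = new WriteBuilder(new LocalOutputFile());
        System.out.println("local builder has conf: " + local.hasConf());
        System.out.println("configurations created: " + FakeConfiguration.instances);
    }
}
```

Any code path that reads the configuration would then need to handle the unset case, which seems compatible with ParquetIO.file() already ignoring it for non-Hadoop files.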
   
   hadoop-common is a heavy dependency with many transitive dependencies, so it would be nice to avoid it.
   
   This is similar to the Iceberg Flink issues #3117 and #4183.
   
   ### Query engine
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org