[I] why should we use file.createOrOverwrite to create a datafile or manifest file? [iceberg]

via GitHub Thu, 16 Nov 2023 23:08:00 -0800


chenwyi2 opened a new issue, #9100:
URL: https://github.com/apache/iceberg/issues/9100


   ### Feature Request / Improvement
   
   Recently I found that we use overwrite mode when creating files for Flink or Spark, for example:
   ` switch (format) {
           case AVRO:
             return Avro.write(outputFile)
                 .createWriterFunc(ignore -> new FlinkAvroWriter(flinkSchema))
                 .setAll(props)
                 .schema(schema)
                 .metricsConfig(metricsConfig)
                 .overwrite()
                 .build();
   
           case ORC:
             return ORC.write(outputFile)
                 .createWriterFunc(
                     (iSchema, typDesc) -> FlinkOrcWriter.buildWriter(flinkSchema, iSchema))
                 .setAll(props)
                 .metricsConfig(metricsConfig)
                 .schema(schema)
                 .overwrite()
                 .build();
   
           case PARQUET:
             return Parquet.write(outputFile)
                 .createWriterFunc(msgType -> FlinkParquetWriters.buildWriter(flinkSchema, msgType))
                 .setAll(props)
                 .metricsConfig(metricsConfig)
                 .schema(schema)
                 .overwrite()
                 .build();`
   There is a failure scenario: when an HDFS create request gets stuck or hits a socket timeout, the HDFS client retries the create against another node. The retry succeeds, and we create, write, and commit the file. But the earlier stuck request is still alive and eventually performs the create again; because we use overwrite mode, the committed file is overwritten with an empty one.
   So why should we use overwrite mode?
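
To make the scenario concrete, here is a minimal sketch of the race on a local filesystem using plain `java.nio`. It is only an analogy and makes assumptions: it does not use HDFS or Iceberg, and the class `StaleCreateDemo` and its helpers `create` and `createOrOverwrite` are hypothetical names chosen to mirror the two creation modes (fail if the file exists versus truncate it). The retried request writes and closes the file first, then the stale request arrives and creates the same path again.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class StaleCreateDemo {

  // Hypothetical stand-in for overwrite mode: truncates an existing file.
  static OutputStream createOrOverwrite(Path path) throws IOException {
    return Files.newOutputStream(
        path,
        StandardOpenOption.CREATE,
        StandardOpenOption.TRUNCATE_EXISTING,
        StandardOpenOption.WRITE);
  }

  // Hypothetical stand-in for non-overwrite mode: fails if the file exists.
  static OutputStream create(Path path) throws IOException {
    return Files.newOutputStream(
        path, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
  }

  // Replays the sequence from the report and returns the final file size.
  static long sizeAfterStaleCreate(boolean overwrite) throws IOException {
    Path dir = Files.createTempDirectory("stale-create-demo");
    Path file = dir.resolve("data-file.bin");
    try {
      // 1. The retried request succeeds: the file is written and "committed".
      try (OutputStream out = create(file)) {
        out.write("committed rows".getBytes(StandardCharsets.UTF_8));
      }

      // 2. The stale request finally arrives and creates the same path again.
      try (OutputStream stale = overwrite ? createOrOverwrite(file) : create(file)) {
        // The stale request never writes anything.
      } catch (FileAlreadyExistsException e) {
        // Without overwrite, the stale create is rejected and the data survives.
      }

      return Files.size(file);
    } finally {
      Files.deleteIfExists(file);
      Files.deleteIfExists(dir);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("overwrite:  " + sizeAfterStaleCreate(true) + " bytes");
    System.out.println("create-new: " + sizeAfterStaleCreate(false) + " bytes");
  }
}
```

In this sketch the file ends up empty when the stale create uses overwrite, while with the fail-if-exists mode the stale create is rejected and the written bytes remain. Whether HDFS behaves the same way in the timeout case depends on the client and NameNode, which this local example does not model.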
   
   ### Query engine
   
   Flink


