amogh-jahagirdar commented on issue #8729:
URL: https://github.com/apache/iceberg/issues/8729#issuecomment-1750162049

   >So this setting is not really "file size", it's more like "task size"?
   
   I would not say that. The docs I linked earlier put it concisely: "When writing data to Iceberg with Spark, it's important to note that Spark cannot write a file larger than a Spark task, and a file cannot span an Iceberg partition boundary. This means that although Iceberg will always roll over a file when it grows to [write.target-file-size-bytes](https://iceberg.apache.org/docs/latest/configuration/#write-properties), that will not happen unless the Spark task is large enough."
   
   The property controls rolling over to a new file when the file is about to exceed the target size, so it does respect the target file size. If the Spark task is not large enough, you won't see files reach `write.target-file-size-bytes`. To influence the task size, see the `write.distribution-mode` property in the docs (and if you're using 1.3.1, the default is hash distribution). A minimal sketch of setting both properties via Spark SQL follows (the table name `db.sample` is hypothetical; the values shown are the defaults):
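   ```sql
   -- Sketch only: db.sample is a placeholder table name.
   ALTER TABLE db.sample SET TBLPROPERTIES (
     'write.target-file-size-bytes' = '536870912',  -- 512 MB target (the default)
     'write.distribution-mode' = 'hash'             -- cluster rows by partition before writing
   );
   ```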
   
   >I just tried bumping up the value of this setting by times 10 (5368709120), 
and the file sizes are still around 100MB.
   
   Right, bumping it up won't magically make the files bigger; the file size depends on the task size, which is determined by Spark.
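   
   If a larger task size is what you're after, one lever (assuming Spark 3.x with adaptive query execution enabled, which coalesces shuffle partitions when the distribution mode triggers a shuffle) is the advisory partition size. A hedged sketch; the value is illustrative:
   
   ```sql
   -- With AQE on, coalesced shuffle partitions are sized toward this advisory
   -- target, so raising it gives each write task more data to fill larger files.
   SET spark.sql.adaptive.enabled = true;
   SET spark.sql.adaptive.advisoryPartitionSizeInBytes = 536870912;  -- ~512 MB per task
   ```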
   

