amogh-jahagirdar commented on issue #8729: URL: https://github.com/apache/iceberg/issues/8729#issuecomment-1750162049

> So this setting is not really "file size", it's more like "task size"?

I would not say that. The docs I linked earlier put it concisely: "When writing data to Iceberg with Spark, it's important to note that Spark cannot write a file larger than a Spark task and a file cannot span an Iceberg partition boundary. This means although Iceberg will always roll over a file when it grows to [write.target-file-size-bytes](https://iceberg.apache.org/docs/latest/configuration/#write-properties), but unless the Spark task is large enough that will not happen."

The property controls rolling over to a new file when the current file is about to exceed the target size, so the target file size is respected as an upper bound. If the Spark task is not large enough, the files will simply never reach `write.target-file-size-bytes`. To influence the task size, see the `write.distribution-mode` property in the docs (and if you're using 1.3.1, the default is hash-based distribution); a sketch of setting both properties follows below.

> I just tried bumping up the value of this setting by times 10 (5368709120), and the file sizes are still around 100MB.

Right, bumping it up won't magically make the files bigger; the resulting file size depends on the task size, which is determined by Spark.
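For illustration, here is a minimal PySpark sketch of setting the two properties discussed above. The catalog/table name `my_catalog.db.events` and the property values are hypothetical, not from this issue; the point is that `write.target-file-size-bytes` only caps file size, while `write.distribution-mode` influences how much data each task receives:

```python
from pyspark.sql import SparkSession

# Assumes a Spark session already configured with an Iceberg catalog
# named "my_catalog"; the table name below is illustrative.
spark = SparkSession.builder.appName("iceberg-file-size-demo").getOrCreate()

# Cap: Iceberg rolls over to a new file once a file reaches this size.
# It does NOT force files up to this size if tasks carry less data.
spark.sql("""
    ALTER TABLE my_catalog.db.events SET TBLPROPERTIES (
        'write.target-file-size-bytes' = '536870912'
    )
""")

# Lever: distribution mode shapes how rows are shuffled to write tasks,
# and therefore how large the written files can actually get.
# 'hash' is the default for writes as of Iceberg 1.3.1.
spark.sql("""
    ALTER TABLE my_catalog.db.events SET TBLPROPERTIES (
        'write.distribution-mode' = 'hash'
    )
""")
```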
>So this setting is not really "file size", it's more like "task size"? I would not say that. The docs I linked earlier put it concisely when it says "When writing data to Iceberg with Spark, it’s important to note that Spark cannot write a file larger than a Spark task and a file cannot span an Iceberg partition boundary. This means although Iceberg will always roll over a file when it grows to [write.target-file-size-bytes](https://iceberg.apache.org/docs/latest/configuration/#write-properties), but unless the Spark task is large enough that will not happen." The property controls rolling over to a new file when the file is about to exceed the target size. So it does respect the target file size. If the Spark task is not large enough, then you won't see files hit the "write.target-file-size-bytes". To influence the task size you can see the write.distribution-mode properties in the docs (and if you're using 1.3.1 the default will be the hash based) >I just tried bumping up the value of this setting by times 10 (5368709120), and the file sizes are still around 100MB. Right, bumping it up won't magically make the files bigger, it depends on the task size which is determined by Spark. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org