amogh-jahagirdar commented on issue #8729:
URL: https://github.com/apache/iceberg/issues/8729#issuecomment-1750162049

   >So this setting is not really "file size", it's more like "task size"?
   
   I would not say that. The docs I linked earlier put it concisely: "When writing data to Iceberg with Spark, it's important to note that Spark cannot write a file larger than a Spark task, and a file cannot span an Iceberg partition boundary. This means that although Iceberg will always roll over a file when it grows to [write.target-file-size-bytes](https://iceberg.apache.org/docs/latest/configuration/#write-properties), that will not happen unless the Spark task is large enough."
   
   The property controls rolling over to a new file when the file is about to exceed the target size, so it does respect the target file size. If the Spark task is not large enough, you won't see files reach `write.target-file-size-bytes`. To influence the task size, see the `write.distribution-mode` property in the docs (and if you're using 1.3.1, the default is hash distribution). A minimal sketch of setting both properties via Spark SQL follows (the table name `db.sample` is hypothetical; the values shown are the defaults):
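   ```sql
   -- Sketch only: db.sample is a placeholder table name.
   ALTER TABLE db.sample SET TBLPROPERTIES (
     'write.target-file-size-bytes' = '536870912',  -- 512 MB target (the default)
     'write.distribution-mode' = 'hash'             -- cluster rows by partition before writing
   );
   ```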
   
   >I just tried bumping up the value of this setting by times 10 (5368709120), 
and the file sizes are still around 100MB.
   
   Right, bumping it up won't magically make the files bigger; the file size depends on the task size, which is determined by Spark.
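   
   If a larger task size is what you're after, one lever (assuming Spark 3.x with adaptive query execution enabled, which coalesces shuffle partitions when the distribution mode triggers a shuffle) is the advisory partition size. A hedged sketch; the value is illustrative:
   
   ```sql
   -- With AQE on, coalesced shuffle partitions are sized toward this advisory
   -- target, so raising it gives each write task more data to fill larger files.
   SET spark.sql.adaptive.enabled = true;
   SET spark.sql.adaptive.advisoryPartitionSizeInBytes = 536870912;  -- ~512 MB per task
   ```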
   

