Fokko commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1813465479
> do we need any changes in readers to benefit from this? If not, can we run some existing benchmarks to showcase the read improvement is as we anticipate?

Since we use the decoders from Avro itself, we don't need any changes. The relevant code is here: https://github.com/apache/avro/blob/main/lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java#L398-L424

It will speed up reading tremendously when we don't need to read the `map[int, bytes]` that we use to store statistics: with the block sizes written, the decoder can jump right over the whole map instead of skipping each key-value pair individually (see the sketch below).

> Question. Aren't we using DataFileWriter from Avro in our AvroFileAppender? If so, how is this PR affecting it? Won't we still use direct encoders there?

That's a good question. The goal of this PR is to write the block sizes for the manifests. @rustyconover any thoughts on this?

> Also, nice work on a new encoder in Avro, @Fokko! Do you know when that will be available?

Thanks! I'll check in with the Avro community to see if we can do a release.
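To illustrate the skipping mechanism, here is a minimal, self-contained sketch. It does not use the new encoder from this PR; it relies on Avro's existing `EncoderFactory.blockingBinaryEncoder`, which also prefixes array/map blocks with their byte size, so when a field is projected away the underlying `BinaryDecoder` can skip each block by its byte count instead of decoding every key/value pair. The schemas, field names, and the `BlockSizeSkipDemo` class are made up for the example and are not part of this PR.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class BlockSizeSkipDemo {
  public static void main(String[] args) throws Exception {
    // Writer schema: a path plus an array of (int key, bytes value) pairs,
    // mirroring how an int-keyed stats map is laid out in Avro.
    Schema writerSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"data_file\",\"fields\":["
            + "{\"name\":\"path\",\"type\":\"string\"},"
            + "{\"name\":\"lower_bounds\",\"type\":{\"type\":\"array\",\"items\":{"
            + "\"type\":\"record\",\"name\":\"kv\",\"fields\":["
            + "{\"name\":\"key\",\"type\":\"int\"},"
            + "{\"name\":\"value\",\"type\":\"bytes\"}]}}}]}");

    // Reader schema: only the path; the bounds array is projected away and
    // has to be skipped by the decoder.
    Schema readerSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"data_file\",\"fields\":["
            + "{\"name\":\"path\",\"type\":\"string\"}]}");

    Schema kvSchema = writerSchema.getField("lower_bounds").schema().getElementType();
    GenericRecord kv = new GenericData.Record(kvSchema);
    kv.put("key", 1);
    kv.put("value", ByteBuffer.wrap(new byte[] {0x01, 0x02, 0x03}));

    GenericRecord record = new GenericData.Record(writerSchema);
    record.put("path", "s3://bucket/file.parquet");
    record.put("lower_bounds", Arrays.asList(kv));

    // The blocking binary encoder prefixes each array/map block with its size
    // in bytes, which is what allows the decoder to jump over a whole block.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().blockingBinaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(writerSchema).write(record, encoder);
    encoder.flush();

    // Reading with the projected schema: the resolving decoder skips the
    // bounds array block by byte count instead of decoding each kv record.
    GenericDatumReader<GenericRecord> reader =
        new GenericDatumReader<>(writerSchema, readerSchema);
    GenericRecord read = reader.read(null,
        DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
    System.out.println(read.get("path"));
  }
}
```

Running this prints only the path; the bounds array is never materialized, which is the same effect we expect for manifest reads once the block sizes are written.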