Fokko commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1813465479
> do we need any changes in readers to benefit from this? If not, can we run some existing benchmarks to showcase the read improvement is as we anticipate?

Since we use the decoders from Avro itself, we don't need any changes. The relevant code is here: https://github.com/apache/avro/blob/main/lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java#L398-L424

It will speed up reading tremendously when we don't need to read the `map[int, bytes]` that we use to store statistics: with the block sizes written, the decoder can jump right over the whole map instead of skipping each key-value pair individually (see the sketch below).

> Question. Aren't we using DataFileWriter from Avro in our AvroFileAppender? If so, how is this PR affecting it? Won't we still use direct encoders there?

That's a good question. The goal of this PR is to write the block sizes for the manifests. @rustyconover any thoughts on this?

> Also, nice work on a new encoder in Avro, @Fokko! Do you know when that will be available?

Thanks! I'll check in with the Avro community to see if we can do a release.
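To illustrate the skipping mechanism, here is a minimal, self-contained sketch. It does not use the new encoder from this PR; it relies on Avro's existing `EncoderFactory.blockingBinaryEncoder`, which also prefixes array/map blocks with their byte size, so when a field is projected away the underlying `BinaryDecoder` can skip each block by its byte count instead of decoding every key/value pair. The schemas, field names, and the `BlockSizeSkipDemo` class are made up for the example and are not part of this PR.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class BlockSizeSkipDemo {
  public static void main(String[] args) throws Exception {
    // Writer schema: a path plus an array of (int key, bytes value) pairs,
    // mirroring how an int-keyed stats map is laid out in Avro.
    Schema writerSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"data_file\",\"fields\":["
            + "{\"name\":\"path\",\"type\":\"string\"},"
            + "{\"name\":\"lower_bounds\",\"type\":{\"type\":\"array\",\"items\":{"
            + "\"type\":\"record\",\"name\":\"kv\",\"fields\":["
            + "{\"name\":\"key\",\"type\":\"int\"},"
            + "{\"name\":\"value\",\"type\":\"bytes\"}]}}}]}");

    // Reader schema: only the path; the bounds array is projected away and
    // has to be skipped by the decoder.
    Schema readerSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"data_file\",\"fields\":["
            + "{\"name\":\"path\",\"type\":\"string\"}]}");

    Schema kvSchema = writerSchema.getField("lower_bounds").schema().getElementType();
    GenericRecord kv = new GenericData.Record(kvSchema);
    kv.put("key", 1);
    kv.put("value", ByteBuffer.wrap(new byte[] {0x01, 0x02, 0x03}));

    GenericRecord record = new GenericData.Record(writerSchema);
    record.put("path", "s3://bucket/file.parquet");
    record.put("lower_bounds", Arrays.asList(kv));

    // The blocking binary encoder prefixes each array/map block with its size
    // in bytes, which is what allows the decoder to jump over a whole block.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().blockingBinaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(writerSchema).write(record, encoder);
    encoder.flush();

    // Reading with the projected schema: the resolving decoder skips the
    // bounds array block by byte count instead of decoding each kv record.
    GenericDatumReader<GenericRecord> reader =
        new GenericDatumReader<>(writerSchema, readerSchema);
    GenericRecord read = reader.read(null,
        DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
    System.out.println(read.get("path"));
  }
}
```

Running this prints only the path; the bounds array is never materialized, which is the same effect we expect for manifest reads once the block sizes are written.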