szehon-ho commented on code in PR #7833:
URL: https://github.com/apache/iceberg/pull/7833#discussion_r1254943672
##########
data/src/main/java/org/apache/iceberg/data/DeleteFilter.java:
##########
@@ -275,17 +273,12 @@ private CloseableIterable<Record> openDeletes(DeleteFile
deleteFile, Schema dele
InputFile input = getInputFile(deleteFile.path().toString());
switch (deleteFile.format()) {
case AVRO:
- return Avro.read(input)
- .project(deleteSchema)
- .reuseContainers()
- .createReaderFunc(DataReader::create)
- .build();
+ return
Avro.read(input).project(deleteSchema).createReaderFunc(DataReader::create).build();
case PARQUET:
Parquet.ReadBuilder builder =
Parquet.read(input)
.project(deleteSchema)
- .reuseContainers()
Review Comment:
This is a great find.
I think it would be a reasonable alternative to fix GenericRecord.copy as
well, for ByteBuffers. (looks like its assuming primitives, which ByteBuffer
is not..).
Re: enabling reuseContainer(). Point taken that enabling, may be the
similar cost as the fixed copy(). But I am not familiar with all the cases
though, how about perf of nested columns (or would that also break due to
GenericRecord copy?) My observation being, we generally dont put
reuseContainers that much in the code.
I would like to see the opinion of @rdblue @RussellSpitzer as well, as I
think Ryan wrote the original GenericRecord.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]