Re: [I] Compatibility issues with `org.apache.iceberg:iceberg-spark-runtime-3.5_2.13:1.5.0` [iceberg-rust]

via GitHub Fri, 03 May 2024 17:30:57 -0700


zeodtr commented on issue #338:
URL: https://github.com/apache/iceberg-rust/issues/338#issuecomment-2093913951


   @a-agmon Since the problem is in the schema, IMO it is more appropriate to 
check the schema itself before reading the record. And since the error could be 
a different one (for example, a file read error), it cannot be assumed to be a 
schema mismatch error.
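
   To illustrate the point about checking the schema up front, here is a minimal 
sketch using only the standard library. It is not the iceberg-rust or 
apache_avro API: `Field`, `field`, `ReadError`, and `check_schema` are 
hypothetical names, and a schema is simplified to a flat list of name/type 
pairs. The idea is only that a mismatch is detected before any record is 
decoded, and that it stays distinguishable from a file read error.

```rust
use std::fmt;
use std::io;

/// Hypothetical, simplified stand-in for one field of an Avro record schema.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub type_name: String,
}

/// Convenience constructor (hypothetical helper).
pub fn field(name: &str, type_name: &str) -> Field {
    Field {
        name: name.to_string(),
        type_name: type_name.to_string(),
    }
}

/// Separate variants, so a caller never has to guess why a read failed.
#[derive(Debug)]
pub enum ReadError {
    SchemaMismatch(String),
    Io(io::Error),
}

impl fmt::Display for ReadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReadError::SchemaMismatch(msg) => write!(f, "schema mismatch: {}", msg),
            ReadError::Io(err) => write!(f, "read error: {}", err),
        }
    }
}

impl From<io::Error> for ReadError {
    fn from(err: io::Error) -> Self {
        ReadError::Io(err)
    }
}

/// Compare the schema found in the file with the expected one,
/// before any record is decoded.
pub fn check_schema(file_fields: &[Field], expected: &[Field]) -> Result<(), ReadError> {
    if file_fields.len() != expected.len() {
        return Err(ReadError::SchemaMismatch(format!(
            "expected {} fields, file has {}",
            expected.len(),
            file_fields.len()
        )));
    }
    for (found, want) in file_fields.iter().zip(expected) {
        if found != want {
            return Err(ReadError::SchemaMismatch(format!(
                "found `{}: {}`, expected `{}: {}`",
                found.name, found.type_name, want.name, want.type_name
            )));
        }
    }
    Ok(())
}
```

   A caller would run `check_schema` once per file and only then start 
iterating over records, so any later error can be reported as what it is 
instead of being treated as a schema problem.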
   And since Avro's schema resolution mechanism is practically useless in 
Iceberg, using `Reader::new()` would also help performance.
   BTW, the reading process itself is pretty slow (IMO). In my case, reading 
and processing an 8MB avro file took about 1.2sec on my server-class machine. I 
suspect that the apache_avro crate copies strings far too many times: in the 
above case, it did about 20 million string copies. I hope that this problem 
will be resolved when reading by `field-id` is implemented.
   @Fokko Reading the fields by `field-id` is the way to go, but IMO it would 
be a big task that replaces a major part of apache_avro. Since that can take 
time, applying my workaround (described above) could be a practical (temporary) 
solution.
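
   For reference, the core of reading by `field-id` can be sketched as a 
lookup from id to position in the file's schema, so that field names no longer 
matter. This is only an illustration under assumptions: `FieldDesc`, 
`index_by_field_id`, and `project_by_field_id` are hypothetical names, not part 
of iceberg-rust or apache_avro, and the actual decoding of values is left out.

```rust
use std::collections::HashMap;

/// Hypothetical description of a field in the file's schema.
/// Iceberg attaches a `field-id` attribute to each Avro field.
#[derive(Debug, Clone)]
pub struct FieldDesc {
    pub field_id: i32,
    pub name: String,
}

/// Build a lookup from field-id to the field's position in the file's schema.
pub fn index_by_field_id(fields: &[FieldDesc]) -> HashMap<i32, usize> {
    fields
        .iter()
        .enumerate()
        .map(|(pos, f)| (f.field_id, pos))
        .collect()
}

/// For each field-id the reader wants, return its position in the file,
/// or `None` when the file does not contain that field.
pub fn project_by_field_id(wanted_ids: &[i32], file_fields: &[FieldDesc]) -> Vec<Option<usize>> {
    let index = index_by_field_id(file_fields);
    wanted_ids.iter().map(|id| index.get(id).copied()).collect()
}
```

   With such a projection computed once per file, a decoder could pick each 
value by position and skip the fields it does not need.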


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


