zeodtr commented on issue #338: URL: https://github.com/apache/iceberg-rust/issues/338#issuecomment-2093913951
@a-agmon Since the problem is in the schema, IMO it is more appropriate to check the schema itself before reading the records. The error could also be something else (for example, a file read error), so it cannot be assumed to be a schema mismatch. And since Avro's schema resolution mechanism is practically useless in Iceberg, using `Reader::new()` would also help performance.

BTW, the reading process itself is pretty slow (IMO). In my case, reading and processing an 8MB Avro file took about 1.2 sec on my server-class machine. I suspect that the apache_avro crate copies strings too many times: in the above case, it did about 20 million string copies. I hope this problem will be resolved when reading by `field-id` is implemented.

@Fokko Reading the fields by `field-id` is the way to go, but IMO it would be a big task that replaces a major part of apache_avro. Since that can take time, applying my workaround (described above) could be a practical (temporary) solution.
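To illustrate the "check the schema first" idea, here is a minimal, std-only Rust sketch. It is not the workaround referred to in this thread, and it does not use the apache_avro API: `FieldDesc`, `SchemaCheckError`, `field`, and `check_schema` are hypothetical names, and the flat name/type list is an assumed simplification of a real record schema. The point is only that a mismatch found up front is reported as its own error kind, instead of being inferred from whatever error the record reader happens to return.

```rust
/// Hypothetical, simplified description of one field in a record schema.
#[derive(Debug, Clone, PartialEq)]
struct FieldDesc {
    name: String,
    type_name: String,
}

/// Errors that can only come from comparing schemas, never from file I/O.
#[derive(Debug, PartialEq)]
enum SchemaCheckError {
    FieldCountMismatch { expected: usize, found: usize },
    FieldMismatch { index: usize, expected: FieldDesc, found: FieldDesc },
}

/// Convenience constructor for the sketch.
fn field(name: &str, type_name: &str) -> FieldDesc {
    FieldDesc {
        name: name.to_string(),
        type_name: type_name.to_string(),
    }
}

/// Compare the schema the reader expects with the schema stored in the file
/// (the writer schema), position by position, before any record is decoded.
fn check_schema(expected: &[FieldDesc], writer: &[FieldDesc]) -> Result<(), SchemaCheckError> {
    if expected.len() != writer.len() {
        return Err(SchemaCheckError::FieldCountMismatch {
            expected: expected.len(),
            found: writer.len(),
        });
    }
    for (index, (e, w)) in expected.iter().zip(writer).enumerate() {
        if e != w {
            return Err(SchemaCheckError::FieldMismatch {
                index,
                expected: e.clone(),
                found: w.clone(),
            });
        }
    }
    Ok(())
}
```

In this sketch the comparison is strictly positional, so two fields in swapped order are reported as a mismatch at index 0. A caller would run the check once per file and only start decoding records when it returns `Ok(())`, which keeps schema mismatches separate from read errors.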