aokolnychyi commented on PR #9721: URL: https://github.com/apache/iceberg/pull/9721#issuecomment-1957583785
I am not sure I agree with the current proposal, given that it exposes many internal and evolving classes. It may also be too late to plug in another reader at that point, because Iceberg decides much earlier whether vectorized reads can happen. This means that even if the external library supports vectorized reads for nested data, we can't benefit from it because of the existing logic in `SparkBatch`.

Have we considered allowing users to inject a custom `PartitionReaderFactory`? We would pass `Table` and a delegate partition reader factory to it. That way, external libraries would have more control over the logic but could still delegate to the built-in reader factory as needed.

I am not suggesting we switch right away, but rather that we think about this option. Will it even work? Can external libs ship a custom partition reader factory, assuming they have access to `Table` and the built-in `PartitionReaderFactory` to delegate unsupported operations to?
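
To make the delegation idea concrete, here is a minimal, self-contained sketch of what such a factory might look like. Everything in it is an assumption for illustration: `Partition`, `ReaderFactory`, `BuiltInReaderFactory`, and `ExternalReaderFactory` are hypothetical, simplified stand-ins, not the real Spark `PartitionReaderFactory` or Iceberg classes, and readers are represented by plain string labels so the example runs without Spark on the classpath.

```java
// Hypothetical stand-in for an input partition; only records whether
// the projected schema contains nested columns.
interface Partition {
    boolean hasNestedColumns();
}

// Hypothetical, simplified stand-in for a partition reader factory.
// A String label is returned instead of a real reader object.
interface ReaderFactory {
    boolean supportsColumnarReads(Partition partition);

    String createReader(Partition partition);
}

// Stand-in for the built-in factory. Assumption for this sketch:
// it vectorizes flat data only and falls back to row reads otherwise.
final class BuiltInReaderFactory implements ReaderFactory {
    @Override
    public boolean supportsColumnarReads(Partition partition) {
        return !partition.hasNestedColumns();
    }

    @Override
    public String createReader(Partition partition) {
        return supportsColumnarReads(partition) ? "built-in-vectorized" : "built-in-row";
    }
}

// Hypothetical external factory: it receives a table handle and the
// built-in factory, handles nested data itself, and delegates the rest.
final class ExternalReaderFactory implements ReaderFactory {
    private final String tableName; // stand-in for a Table handle
    private final ReaderFactory delegate;

    ExternalReaderFactory(String tableName, ReaderFactory delegate) {
        this.tableName = tableName;
        this.delegate = delegate;
    }

    @Override
    public boolean supportsColumnarReads(Partition partition) {
        // The external library makes the vectorization decision itself
        // and only consults the delegate for cases it does not cover.
        return partition.hasNestedColumns() || delegate.supportsColumnarReads(partition);
    }

    @Override
    public String createReader(Partition partition) {
        if (partition.hasNestedColumns()) {
            return "external-vectorized";
        }
        return delegate.createReader(partition);
    }
}

public class DelegatingFactorySketch {
    public static void main(String[] args) {
        ReaderFactory factory =
            new ExternalReaderFactory("db.tbl", new BuiltInReaderFactory());

        // Nested data is handled by the external library.
        System.out.println(factory.createReader(() -> true));
        // Flat data is delegated to the built-in factory.
        System.out.println(factory.createReader(() -> false));
    }
}
```

The point of the sketch is only the shape of the contract: the decision about vectorized reads lives in the injected factory rather than in earlier planning logic. Whether Iceberg can actually defer that decision this way is exactly the open question raised above.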