Re: [PR] Dynamically support Spark native engine in Iceberg [iceberg]

via GitHub Wed, 21 Feb 2024 10:12:32 -0800


aokolnychyi commented on PR #9721:
URL: https://github.com/apache/iceberg/pull/9721#issuecomment-1957583785


   I am not sure I agree with the current proposal, given that it exposes a lot 
of internal and evolving classes. Also, it may be too late to plug in another 
reader at that point, as Iceberg decides much earlier whether vectorized reads 
can happen. This means that even if the external library supports vectorized 
reads for nested data, we can't benefit from it because of the existing logic 
in `SparkBatch`.
   
   Have we considered allowing external libraries to inject a custom 
`PartitionReaderFactory`? We would pass `Table` and a delegate partition 
reader factory to it. That way, external libraries would have more control 
over the logic but could delegate to the built-in reader factory as needed.
   
   I am not suggesting that we switch right away, but rather that we think 
about this option. Will it even work? Can external libs ship a custom 
partition reader factory, assuming they have access to `Table` and the 
built-in `PartitionReaderFactory` to delegate unsupported operations to?
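   
   To make the idea concrete, here is a rough sketch of the delegation 
pattern. It deliberately uses small stand-in types instead of the real Spark 
and Iceberg interfaces: `Table`, `Partition`, `Reader`, `ReaderFactory`, 
`ReaderFactoryProvider`, and the `nativeSupported` flag are all hypothetical 
names made up for illustration, and nothing here is an existing Iceberg 
extension point.

```java
public class DelegatingFactorySketch {

  // Stand-in for Iceberg's Table; only a name is modeled here.
  public static final class Table {
    final String name;
    public Table(String name) { this.name = name; }
  }

  // Stand-in for a Spark input partition. The flag is hypothetical and
  // represents "the external library can handle this partition".
  public static final class Partition {
    final boolean nativeSupported;
    public Partition(boolean nativeSupported) { this.nativeSupported = nativeSupported; }
  }

  // Stand-in for a partition reader; it only reports which path was chosen.
  public interface Reader {
    String describe();
  }

  // Stand-in for PartitionReaderFactory, reduced to a single method.
  public interface ReaderFactory {
    Reader createReader(Partition partition);
  }

  // Hypothetical injection hook: Iceberg would hand over the table and the
  // built-in factory, and the external library would return its own factory.
  public interface ReaderFactoryProvider {
    ReaderFactory create(Table table, ReaderFactory builtIn);
  }

  // Example of what an external library might ship: handle what it supports,
  // delegate everything else to the built-in factory.
  public static final class NativeFirstFactory implements ReaderFactory {
    private final Table table;
    private final ReaderFactory builtIn;

    public NativeFirstFactory(Table table, ReaderFactory builtIn) {
      this.table = table;
      this.builtIn = builtIn;
    }

    @Override
    public Reader createReader(Partition partition) {
      if (partition.nativeSupported) {
        return () -> "native reader for " + table.name;
      }
      // Unsupported case: fall back to the built-in reader factory.
      return builtIn.createReader(partition);
    }
  }

  public static void main(String[] args) {
    Table table = new Table("db.events");
    ReaderFactory builtIn = partition -> () -> "built-in reader";
    ReaderFactoryProvider provider = NativeFirstFactory::new;

    ReaderFactory factory = provider.create(table, builtIn);
    System.out.println(factory.createReader(new Partition(true)).describe());
    System.out.println(factory.createReader(new Partition(false)).describe());
  }
}
```

   In this shape the external library owns the decision about which reader to 
use per partition, while the built-in factory stays the fallback. Whether the 
real `PartitionReaderFactory` and `SparkBatch` can be wired this way is exactly 
the open question above.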


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


