GitHub user ostinru added a comment to the discussion: [Proposal] Iceberg subsystem for datalake_fdw — design proposal
> Schema import / view-like tables
>
> Because we're going with the Table AM approach, every Iceberg table must have a corresponding relation in the catalog, so a CREATE TABLE is unavoidable — you will still need to create a table. That said, making the column set dynamic (tracking Iceberg schema evolution at read time) is entirely feasible and not particularly hard, and we plan to support it.

When I was researching how to make PXF's `CREATE FOREIGN TABLE` easier to use, I ended up with the idea of `IMPORT FOREIGN SCHEMA` as a first step, plus a background worker that refreshes the state: https://github.com/apache/cloudberry-pxf/issues/69

However, I am not sure that a FOREIGN TABLE is the best solution from a UX point of view. A multi-catalog approach sounds like a better solution.

> Caching
>
> We do believe caching is effective. The first pull from remote storage is unavoidably slow, but once blocks are cached on local disk, reads are essentially indistinguishable from local files. On the cache side, prefetching and parallel download are both worth considering.

@leborchuk, I read (somewhere) about the following approach to caching: cache the footers of Parquet/ORC files. This should eliminate an extra round trip and can help the storage engine skip files without fetching them.

GitHub link: https://github.com/apache/cloudberry/discussions/1683#discussioncomment-16697456
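To make the footer-caching idea above a bit more concrete, here is a minimal sketch in Python. It is not taken from datalake_fdw or PXF. The names `read_parquet_footer`, `FooterCache`, and `opener` are hypothetical, and the cache is a plain in-memory dict keyed by file path, which assumes data files are immutable once written (as Iceberg data files are). The only format detail it relies on is the Parquet trailer layout: the footer metadata is followed by a 4-byte little-endian footer length and the magic bytes `PAR1`.

```python
import io
import struct
from typing import BinaryIO, Callable, Dict

PARQUET_MAGIC = b"PAR1"
TRAILER_SIZE = 8  # 4-byte little-endian footer length + 4-byte magic


def read_parquet_footer(f: BinaryIO) -> bytes:
    """Return the raw (still Thrift-encoded) footer of a Parquet file."""
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    if file_size < len(PARQUET_MAGIC) + TRAILER_SIZE:
        raise ValueError("file is too small to be a Parquet file")

    # The last 8 bytes hold the footer length and the closing magic.
    f.seek(file_size - TRAILER_SIZE)
    footer_len, magic = struct.unpack("<I4s", f.read(TRAILER_SIZE))
    if magic != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic at end of file")
    if footer_len > file_size - TRAILER_SIZE - len(PARQUET_MAGIC):
        raise ValueError("footer length exceeds file size")

    f.seek(file_size - TRAILER_SIZE - footer_len)
    return f.read(footer_len)


class FooterCache:
    """Hypothetical in-memory footer cache keyed by file path."""

    def __init__(self, opener: Callable[[str], BinaryIO]) -> None:
        self._opener = opener          # assumption: opens a remote file for reading
        self._footers: Dict[str, bytes] = {}
        self.remote_reads = 0          # counts how often storage was actually hit

    def get(self, path: str) -> bytes:
        footer = self._footers.get(path)
        if footer is None:
            with self._opener(path) as f:
                footer = read_parquet_footer(f)
            self.remote_reads += 1
            self._footers[path] = footer
        return footer


if __name__ == "__main__":
    # Fake "remote" file: magic, some data, footer bytes, footer length, magic.
    fake_footer = b"fake-thrift-footer"
    blob = (PARQUET_MAGIC + b"row-groups" + fake_footer
            + struct.pack("<I", len(fake_footer)) + PARQUET_MAGIC)

    cache = FooterCache(lambda path: io.BytesIO(blob))
    cache.get("s3://bucket/a.parquet")
    cache.get("s3://bucket/a.parquet")  # served from the cache
    print(cache.remote_reads)
```

In this sketch the second `get` call never touches storage. Decoding the Thrift metadata to obtain row-group statistics (which is what would let the engine skip whole files) is deliberately left out, as is any eviction policy or on-disk persistence.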
