Re: [D] [Proposal] Iceberg subsystem for datalake_fdw — design proposal [cloudberry]

via GitHub Thu, 23 Apr 2026 23:27:30 -0700


GitHub user ostinru added a comment to the discussion: [Proposal] Iceberg 
subsystem for datalake_fdw — design proposal


> Schema import / view-like tables
> Because we're going with the Table AM approach, every Iceberg table must have
> a corresponding relation in the catalog, so a CREATE TABLE is unavoidable — you
> will still need to create a table. That said, making the column set dynamic
> (tracking Iceberg schema evolution at read time) is entirely feasible and not
> particularly hard, and we plan to support it.

When I was researching how to make PXF's `CREATE FOREIGN TABLE` easier to use, I
ended up with the idea of `IMPORT FOREIGN SCHEMA` as a first step, plus a
background worker that refreshes the state:
https://github.com/apache/cloudberry-pxf/issues/69
However, I am not sure that a FOREIGN TABLE is the best solution from a UX point
of view. A multi-catalog approach sounds like a better solution.

> Caching
> We do believe caching is effective. The first pull from remote storage is
> unavoidably slow, but once blocks are cached on local disk, reads are
> essentially indistinguishable from local files. On the cache side, prefetching
> and parallel download are both worth considering.

@leborchuk, I read (somewhere) about the following approach to caching: cache
the footers of Parquet/ORC files. This should eliminate an extra round trip and
can help the storage engine skip files without fetching them.
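
A minimal sketch of the footer-caching idea for Parquet only, to make the
saved round trip concrete. A Parquet file ends with a 4-byte little-endian
footer length followed by the magic bytes `PAR1`, so reading the footer
normally takes two range requests (tail, then footer body). The `FooterCache`
class, the `read_range` callback and the `(path, file_size)` cache key are
hypothetical names chosen for illustration, not part of datalake_fdw or PXF.
ORC uses a different tail layout and is not covered here.

```python
import struct
from typing import Callable, Dict, Tuple

PARQUET_MAGIC = b"PAR1"
TAIL_SIZE = 8  # 4-byte footer length + 4-byte magic

# Hypothetical callback: read_range(path, offset, length) -> bytes.
# In a real system this would be a ranged GET against object storage.
RangeReader = Callable[[str, int, int], bytes]


class FooterCache:
    """Keeps raw Parquet footer bytes in memory, keyed by (path, file_size)."""

    def __init__(self, read_range: RangeReader) -> None:
        self._read_range = read_range
        self._cache: Dict[Tuple[str, int], bytes] = {}

    def get_footer(self, path: str, file_size: int) -> bytes:
        key = (path, file_size)
        cached = self._cache.get(key)
        if cached is not None:
            return cached  # cache hit: no remote requests at all

        # Smallest valid file: leading magic + tail.
        if file_size < len(PARQUET_MAGIC) + TAIL_SIZE:
            raise ValueError(f"{path}: too small to be a Parquet file")

        # Request 1: the last 8 bytes give the footer length and the magic.
        tail = self._read_range(path, file_size - TAIL_SIZE, TAIL_SIZE)
        if tail[4:] != PARQUET_MAGIC:
            raise ValueError(f"{path}: missing Parquet magic bytes")
        (footer_len,) = struct.unpack("<I", tail[:4])

        # Request 2: the serialized file metadata itself.
        footer_start = file_size - TAIL_SIZE - footer_len
        if footer_start < len(PARQUET_MAGIC):
            raise ValueError(f"{path}: footer length exceeds file size")
        footer = self._read_range(path, footer_start, footer_len)

        self._cache[key] = footer
        return footer
```

The cached bytes still have to be decoded to get row-group statistics, and
that is the part that would let the engine skip a file without fetching its
data pages. Since Iceberg data files are immutable once written, keying by
path and size is probably enough, but that is an assumption of this sketch.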

GitHub link: 
https://github.com/apache/cloudberry/discussions/1683#discussioncomment-16697456
