GitHub user MisterRaindrop edited a comment on the discussion: [Proposal] 
Iceberg subsystem for datalake_fdw — design proposal

Thanks for the detailed feedback, @leborchuk — happy to dive into this topic 
with you.

  Personally, I see this as one of the inevitable directions for the next 
generation of data
  infrastructure. Open table formats — Iceberg, Lance, Hudi — are already 
emerging as a
  foundational layer, and storage–compute separation is, in my view, the 
architectural endpoint
  almost every serious analytics system is converging toward. We're also seeing 
these formats
  increasingly adopted as the data substrate for embodied-AI and multimodal 
workloads, which only
   reinforces the case for Cloudberry to be a first-class citizen here.

  Responding to your points one by one:

  1. RPC interface

  Yes — datalake_agent will expose a Protobuf + gRPC interface, treated as a 
stable, versioned
  contract so that the QD and the agent can evolve independently.
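  To make the "stable, versioned contract" idea concrete, here is a minimal sketch of what such an interface could look like. Every service, message, and field name below is an illustrative assumption, not the actual datalake_agent API:

```protobuf
// Hypothetical sketch only — names are illustrative, not the real contract.
syntax = "proto3";

// Version baked into the package name, so the QD and the agent
// can be upgraded independently without breaking each other.
package datalake.agent.v1;

service DatalakeAgent {
  // Resolve the current Iceberg snapshot and plan the file scan.
  rpc PlanScan(PlanScanRequest) returns (PlanScanResponse);
}

message PlanScanRequest {
  string table_identifier = 1;  // e.g. "db.events"
  int64 snapshot_id = 2;        // 0 = read the latest snapshot
}

message PlanScanResponse {
  repeated string data_file_paths = 1;  // data files for the QD to scan
}
```

  Adding fields keeps the wire format backward compatible; an incompatible change would go into a new `v2` package instead of mutating `v1`.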

  2. Our primary motivation

  Our main motivation aligns with your scenario (1): cross-cluster data 
sharing, together with
  the storage–compute separation that the Iceberg architecture naturally 
enables. We're very
  optimistic about this direction overall.

  3. On scenario (2) — archive

  A genuine question back: if the end state is data sitting on object storage 
with Iceberg
  metadata, why not write directly to object storage from day one, rather than 
landing it in GP
  first and archiving later? That would collapse the archive case into the same 
 sharing.

  4. Schema import / view-like tables

  Because we're going with the Table AM approach, every Iceberg table must
  have a corresponding relation in the catalog, so a CREATE TABLE is
  unavoidable. That said, making the column set dynamic (tracking Iceberg
  schema evolution at read time) is entirely feasible and not particularly
  hard, and we plan to support it.
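
  As a toy illustration of the dynamic-column idea (the function name and the dict shapes are hypothetical, not the planned implementation):

```python
# Hypothetical sketch: at read time, compare the columns recorded in the
# catalog (from CREATE TABLE) against the current Iceberg schema and
# report the drift instead of erroring out.

def resolve_read_schema(catalog_cols, iceberg_schema):
    """catalog_cols: {name: type} as created via CREATE TABLE;
    iceberg_schema: {name: type} from the latest Iceberg metadata."""
    added = {c: t for c, t in iceberg_schema.items() if c not in catalog_cols}
    dropped = [c for c in catalog_cols if c not in iceberg_schema]
    # Read with the Iceberg schema, surfacing what changed.
    return {"read_columns": iceberg_schema, "added": added, "dropped": dropped}

plan = resolve_read_schema(
    {"id": "bigint", "name": "text"},
    {"id": "bigint", "name": "text", "ts": "timestamp"},  # Iceberg added "ts"
)
print(plan["added"])    # {'ts': 'timestamp'}
print(plan["dropped"])  # []
```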

  5. Caching

  We do believe caching is effective. The first pull from remote storage is
  unavoidably slow, but once blocks are cached on local disk, reads are
  essentially indistinguishable from local files. On the cache side,
  prefetching and parallel download are both worth considering.

  The common reasons caching appears to underperform are, in our view:
  - cache capacity too small → low hit rate
  - network bottleneck during background fetch
  - cache block size too large → wasted bandwidth and read amplification
  - insufficient fetch concurrency → the cache can't keep up with the consumer

  In principle, with proper sizing and tuning, a well-configured cache can 
reach near-local
  performance.
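
  A minimal sketch of the behavior described above — the first read of a block pays the remote-fetch cost, later reads hit local cache, and too-small capacity causes evictions and re-fetches. The class, sizes, and fetch function are illustrative assumptions:

```python
# Toy LRU block cache: block_id -> bytes, bounded by capacity_blocks.
from collections import OrderedDict

class BlockCache:
    def __init__(self, capacity_blocks, fetch_remote):
        self.capacity = capacity_blocks
        self.fetch_remote = fetch_remote      # e.g. a ranged GET from S3
        self.blocks = OrderedDict()           # insertion order = LRU order
        self.hits = self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)  # mark as recently used
            return self.blocks[block_id]
        self.misses += 1
        data = self.fetch_remote(block_id)     # the slow remote path
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:   # evict least-recently-used
            self.blocks.popitem(last=False)
        return data

cache = BlockCache(capacity_blocks=2, fetch_remote=lambda b: b"data-%d" % b)
for b in (1, 2, 1, 1):          # repeated reads of block 1 hit the cache
    cache.read(b)
print(cache.hits, cache.misses)  # → 2 2
```

  With capacity 1 the same access pattern would evict block 1 and re-fetch it, which is exactly the "capacity too small → low hit rate" failure mode above.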

  6. Polaris

  Polaris is not a blocker. Cloudberry will manage (mirror) all Iceberg 
metadata internally;
  Polaris is only consulted at read time to fetch the latest Iceberg metadata 
pointer. Even if
  Polaris goes down, we can still read the Iceberg data.
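
  The read-time behavior can be sketched as follows — try Polaris for the latest metadata pointer, fall back to the internally mirrored pointer if it is unreachable. Function names and paths are illustrative assumptions:

```python
# Hypothetical sketch of catalog fallback at read time.

def current_metadata_location(polaris_fetch, mirrored_location):
    """polaris_fetch: callable returning the latest metadata.json path,
    raising on failure; mirrored_location: the last pointer Cloudberry
    mirrored into its own catalog."""
    try:
        return polaris_fetch(), "polaris"
    except Exception:
        # Polaris is down: read from the mirrored pointer instead,
        # possibly serving a slightly stale snapshot.
        return mirrored_location, "mirror"

def polaris_down():
    raise ConnectionError("catalog unreachable")

loc, source = current_metadata_location(
    polaris_down, "s3://warehouse/t/metadata/v41.metadata.json")
print(source)  # → mirror
```

  The trade-off is that reads during a Polaris outage may see the last mirrored snapshot rather than the very latest commit.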

  7. One caveat on performance expectations

  We do want it fast, but the realistic baseline for comparison is GP
  itself, not a columnar engine. Cloudberry is a PG-based row engine — we 
return data row-by-row
  rather than in batches like a columnar engine — so it will naturally be 
somewhat slower than
  columnar systems. Our plan is to complete the functional surface first, and 
then optimize this
  axis as a dedicated follow-up.

  ---
  Zooming out: I think getting Iceberg right inside Cloudberry isn't just a 
feature — it's
  positioning the project for where the ecosystem is actually going (lakehouse 
+ open formats +
  AI-native workloads). Looking forward to keeping this conversation going, and 
very open to
  collaborating on scenario (1) with you in a production-like setting.

GitHub link: 
https://github.com/apache/cloudberry/discussions/1683#discussioncomment-16694368
