GitHub user yjhjstz edited a comment on the discussion: [Proposal] Iceberg
subsystem for datalake_fdw — design proposal
The Java agent concern goes beyond "one extra hop". Fragment planning
(`/fragments`) is called on **every SELECT**. At scale — millions of files,
concurrent workloads — the gRPC + JVM deserialization cost shows up in query
latency directly. More fundamentally, the JVM process model conflicts with
PostgreSQL's fork/signal/crash-recovery model: memory limits are split across
two runtimes (resource groups can't unify them), GC pauses can cause gRPC
timeouts on the QD, and the agent restart window leaves all Iceberg tables
temporarily unwritable. These are permanent architectural constraints, not
things we can optimize away later.

| Criterion | Java agent | iceberg-cpp |
|---|---|---|
| Short-term delivery | Fast (iceberg-java ready to use) | Slower (gaps to fill) |
| Long-term operational cost | High | Low |
| Query performance ceiling | Bounded by gRPC + JVM | No extra overhead |
| Architectural consistency | Poor fit for a C/C++ database | Native |
| Format compatibility risk | Very low | Medium (must track spec carefully) |
| Reversibility | Nearly impossible once shipped | Continuously evolvable |

The right answer is **[Apache iceberg-cpp](https://github.com/apache/iceberg-cpp)**.
The gaps (CAS commit, catalog backends, snapshot/manifest writing) are
well-specified engineering work and a one-time investment, whereas the Java
agent's architectural debt is paid forever.
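
To make the "CAS commit" gap concrete: an Iceberg commit ultimately comes down to atomically swapping the table's current metadata pointer in the catalog, and retrying on top of the new base if another writer got there first. The following is only a minimal sketch of that optimistic loop, not iceberg-cpp's API. `InMemoryCatalog`, `CompareAndSwap`, `CommitWithRetry`, and the `build_metadata` callback are all hypothetical names introduced for illustration, and a real catalog backend would supply the atomic swap through its own mechanism rather than a mutex-protected map.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical in-memory catalog: maps a table name to the location of its
// current metadata file. Stands in for a real catalog backend.
class InMemoryCatalog {
 public:
  // Returns the current metadata location, or nullopt if the table is unknown.
  std::optional<std::string> Load(const std::string& table) const {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = pointers_.find(table);
    if (it == pointers_.end()) return std::nullopt;
    return it->second;
  }

  // Replaces the pointer with `updated` only if it still equals `expected`.
  // Returns false when another writer has committed in the meantime.
  bool CompareAndSwap(const std::string& table,
                      const std::optional<std::string>& expected,
                      const std::string& updated) {
    std::lock_guard<std::mutex> lock(mu_);
    std::optional<std::string> current;
    auto it = pointers_.find(table);
    if (it != pointers_.end()) current = it->second;
    if (current != expected) return false;
    pointers_[table] = updated;
    return true;
  }

 private:
  mutable std::mutex mu_;
  std::unordered_map<std::string, std::string> pointers_;
};

// Optimistic commit loop. `build_metadata` is a placeholder for "write new
// manifests and a new metadata file on top of `base`, return its location".
// On a lost race the loop reloads the base and rebuilds before retrying.
bool CommitWithRetry(
    InMemoryCatalog& catalog, const std::string& table,
    const std::function<std::string(const std::optional<std::string>&)>&
        build_metadata,
    int max_attempts) {
  assert(max_attempts > 0);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    std::optional<std::string> base = catalog.Load(table);
    std::string next = build_metadata(base);
    if (catalog.CompareAndSwap(table, base, next)) return true;
  }
  return false;  // Gave up after repeated conflicts.
}
```

The point of the sketch is that the retry logic itself is small. The real engineering effort sits in the two pieces it abstracts away: producing correct snapshot and manifest files, and mapping `CompareAndSwap` onto each catalog backend's atomic primitive.
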
StarRocks and Doris — currently the strongest Iceberg MPP readers — are pure
C++, no Java metadata sidecar. The TPC-H numbers shared above already show
Cloudberry behind. Adding a Java agent makes catching up harder, not easier.
Cloudberry is an Apache incubating project. Co-investing in Apache iceberg-cpp
is a better community story and a better technical foundation than wrapping
iceberg-java behind gRPC. I'd strongly advocate for **not shipping a Java agent
as part of Cloudberry core**, and instead contributing the missing pieces
upstream to Apache iceberg-cpp together.
GitHub link:
https://github.com/apache/cloudberry/discussions/1683#discussioncomment-16856371