GitHub user MisterRaindrop edited a discussion: [Proposal] Iceberg subsystem
for datalake_fdw — design proposal
### Proposers
@MisterRaindrop
### Proposal Status
Under Discussion
### Abstract
## 1. Abstract
Cloudberry does not yet have a complete set of plug-in tools for accessing various external data sources.
This proposal designs a data-lake access path to those sources and evolves Cloudberry toward a data lake–enabled architecture.
`datalake_fdw` extends Cloudberry with two complementary ways of accessing
data-lake storage:
1. **FDW foreign-table read / append**: direct read / append of Parquet / ORC / Avro / Text / CSV files on S3 / HDFS / OSS.
2. **Native Iceberg tables** (added by this design): `CREATE ICEBERG TABLE`
inside CB to create and manage Apache Iceberg tables with full **SELECT /
INSERT / UPDATE / DELETE / VACUUM**, Schema Evolution, and snapshot-based Read
Committed isolation.
This document focuses on the second part — the design, the key decisions, and
the open questions — and is meant for community review.
### Motivation
## 2. Motivation & Goals
### 2.1 Why we need this
As an MPP data warehouse, Cloudberry has long lacked a **transactional read /
write** entry point for data-lake formats, Iceberg in particular:
- The existing PXF-based FDW foreign tables are limited in capability;
- There is no Catalog concept, so CB cannot share metadata with the wider
Iceberg ecosystem (Spark / Trino / Flink);
- There is no snapshot isolation or ACID, which makes lakehouse scenarios
diverge from CB's native-table semantics.
The Iceberg subsystem aims to introduce Iceberg tables as **first-class "lake
tables"** in CB without breaking PostgreSQL / Cloudberry transactional
semantics:
- Same SQL entry point as native tables (`CREATE ICEBERG TABLE ...`, `INSERT`,
`UPDATE`, `DELETE`, `VACUUM`);
- Metadata format fully compatible with the Iceberg community — snapshots
written by CB must be directly readable by Spark / Trino;
- Write path is MPP-parallel; segments talk to object storage directly;
- Transactional semantics aligned with PG: a single transaction is either fully
visible or fully rolled back, and `SAVEPOINT` is supported.
### 2.2 Goals
The first release of this design aims to deliver:
- **Catalog support**: Polaris / Hive Metastore / Builtin (CB-internal);
- **Storage support**: S3 (including MinIO / OSS) and HDFS (including HA +
Kerberos);
- Read Committed isolation with concurrent commits;
- Reuse of the Iceberg community Java implementation to keep metadata-semantics
maintenance cost low;
- MPP-parallel file-level execution; QEs write directly to object storage.
### 2.3 Non-goals (outside the first release)
- No explicit Serializable isolation;
- No partition-spec evolution; bucket / truncate / hour transforms not
supported;
- No Branch / Tag / Time Travel queries;
- Does not replace the FDW raw-file path — both paths coexist.
### Implementation
## 3. Overall Architecture
The proposed design has four layers, split into a **metadata path** and a
**data path**:
```
┌─────────────────────────────────────────────────────────────┐
│  SQL: CREATE / SELECT / INSERT / UPDATE / DELETE / VACUUM   │
└─────────────────────────────────────────────────────────────┘
                               │
┌─────────────────────────────────────────────────────────────┐
│                      Iceberg Table AM                       │
│      (makes Iceberg tables look like ordinary tables;       │
│          tableam callbacks + transaction Tracker)           │
└─────────────────────────────────────────────────────────────┘
             │                                 │
             │ metadata                        │ data
             ▼                                 ▼
┌──────────────────────────┐      ┌──────────────────────────┐
│       Catalog FDW        │      │        Volume FDW        │
│     Polaris / Hive /     │      │        S3 / HDFS         │
│   Builtin (CB sys tbl)   │      │                          │
└──────────────────────────┘      └──────────────────────────┘
             │                                 │
             │ gRPC                            │
             ▼                                 ▼
┌──────────────────────────┐      ┌──────────────────────────┐
│   datalake_agent (jar)   │      │      Provider (C++)      │
│   Java / iceberg-java    │      │  Parquet reader/writer   │
│   ↑ launched by          │      │    position/eq delete    │
│     datalake_proxy       │      │                          │
│     bgworker             │      │                          │
└──────────────────────────┘      └──────────────────────────┘
             │                                 │
             ▼                                 ▼
    Catalog service / HMS             Object store / HDFS
```
- All **metadata** operations (CREATE TABLE, plan files, commit snapshot,
VACUUM rewrite) go through the agent and are handled by iceberg-java;
- All **data** operations (Parquet / position-delete file read / write) go
through FDW → Provider; segments talk to storage directly, bypassing the agent;
- `datalake_agent` is a Java jar; it is launched and supervised by the PG
bgworker `datalake_proxy` at postmaster startup (see §5.4);
- The RPC channel between PG and the agent is gRPC.
## 4. The Core Abstraction: Catalog × Volume × Table
The design splits an Iceberg table into three independently configurable,
freely composable pieces:
| Abstraction | Responsibility | Supported |
|-------------|----------------|-----------|
| **Catalog** | Iceberg metadata directory: namespace / table listing, `metadata.json` location, schema-evolution history | Polaris REST / Hive Metastore / Builtin |
| **Volume** | Where data files physically live: data files / delete files / manifest / metadata json | S3 (incl. MinIO / OSS) / HDFS (incl. HA + Kerberos) |
| **Table** | The above two + column definitions + partition keys + table options | — |
A Volume can be shared by multiple tables (different paths under the same
bucket); a Catalog can reference multiple Volumes (different tables on
different storage). Polaris is a special case — the storage configuration is
dispatched by the Polaris service, so a user-side Volume is optional.
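To make the composition concrete, the sketch below shows the intended DDL surface. Only the FDW names and the default-catalog / default-volume GUCs come from this proposal; the server / table option names (`uri`, `endpoint`, `bucket`, credential keys) and the `PARTITION BY` syntax are illustrative placeholders, not final syntax.
```sql
-- Catalog: where Iceberg metadata lives
CREATE SERVER hive_catalog
    FOREIGN DATA WRAPPER iceberg_catalog_fdw
    OPTIONS (type 'hive', uri 'thrift://hms.example.com:9083');

-- Volume: where data files live, with its own credentials
CREATE SERVER s3_hot
    FOREIGN DATA WRAPPER iceberg_volume_fdw
    OPTIONS (type 's3', endpoint 'https://s3.example.com', bucket 'lake-hot');

CREATE USER MAPPING FOR CURRENT_USER SERVER s3_hot
    OPTIONS (accesskey '...', secretkey '...');

-- Table: columns + partition keys, bound to the two servers above via the
-- planned default-catalog / default-volume GUCs (§12.1)
SET iceberg_default_catalog = 'hive_catalog';
SET iceberg_default_volume  = 's3_hot';

CREATE ICEBERG TABLE sales (
    id      bigint,
    region  text,
    amount  numeric,
    sold_at date
) PARTITION BY (region);    -- identity partitioning only, per §2.3
```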
### Why Catalog and Volume are separated
In real deployments they are **orthogonal**:
- Some users already run a Hive Metastore and want data files on S3;
- Some users use Polaris as the catalog but keep two buckets — hot and cold —
for different tables;
- Some users have no external catalog at all, only object storage.
Making Catalog and Volume two separate FDWs, each with its own Server /
UserMapping, lets us cover every combination without inventing a new FDW for
each.
### Builtin Catalog
For users with no external Catalog (Polaris / Hive) available, the design
offers a **Builtin** option: the `metadata.json` location is stored directly in
a CB system table. Data files still live on the Volume, and other engines can
open the table through Iceberg's HadoopCatalog / FileIO using that path.
**Why we need it**: it removes the hard dependency on a Catalog service and
lowers the barrier to entry. It also gives a zero-dependency option for the "CB
is the only writer" single-writer scenario.
## 5. Components & Design Decisions
The following subsections give the design choice — and the reasoning behind it — for each key component.
### 5.1 Iceberg Table AM: why not a pure FDW
The most direct approach would be to keep using FDW, but two hard limitations
get in the way:
- PG's FDW has limited support for UPDATE / DELETE, which does not fit
Iceberg's requirement for full DML;
- FDW foreign tables are treated as second-class citizens in many parts of the
planner / analyzer / resource group.
Table AM (`TableAmRoutine`), introduced in PostgreSQL 12 and present in Cloudberry's PG 14-based kernel, is a first-class storage
abstraction: from the SQL side an Iceberg table looks like an ordinary table,
and UPDATE / DELETE / `ctid` semantics, transactional callbacks, and ANALYZE
all come for free from the kernel.
The proposed approach is therefore: register Iceberg tables as a dedicated
Table AM, and **have the AM delegate data I/O to the Volume FDW internally**
(reusing the existing S3 / HDFS read / write code). We get the SQL consistency
of tableam and avoid reimplementing the storage layer.
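For reference, the snippet below shows how a table AM surfaces in stock PostgreSQL SQL; the proposal hides the same mechanism behind `CREATE ICEBERG TABLE`, and the handler / AM names here are illustrative.
```sql
-- Minimal sketch of registering a table AM, as it would appear in the extension
-- script (stock PostgreSQL 12+ syntax; handler and AM names are illustrative)
CREATE FUNCTION iceberg_am_handler(internal)
    RETURNS table_am_handler
    AS 'MODULE_PATHNAME', 'iceberg_am_handler'
    LANGUAGE C;

CREATE ACCESS METHOD iceberg TYPE TABLE HANDLER iceberg_am_handler;

-- Once registered, an ordinary CREATE TABLE can pick the AM explicitly
CREATE TABLE demo (id int, payload text) USING iceberg;
```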
The core code will live in `src/am_iceberg/`. The AM handler itself is very
thin; the main logic is planned to be organized as follows:
- `pg_iceberg_ddl.c` — `OAT_POST_CREATE / OAT_DROP` hook; creates/drops Iceberg
tables via the Catalog on DDL;
- `pg_iceberg_catalog.c` — unified wrapper for all Catalog calls;
- `pg_iceberg_metadata.c` — manages the `iceberg.pg_iceberg_metadata` system
table;
- `pg_iceberg_metadata_tracker.c` — transaction-scoped metadata tracker (see
§5.6);
- `pg_iceberg_rewrite_plan.c` — QD ↔ QE JSON contract for VACUUM compaction.
### 5.2 Catalog FDW: abstracting three backends
`iceberg_catalog_fdw` abstracts metadata operations into a set of
`IcebergCatalogOperation`s (create_table / load_table / drop_table / append /
update / delete / get_fragment / get_statistics / plan_file_groups / commit_*
and so on).
The Server's `type` option decides the backend:
| `type` | Backend |
|--------|---------|
| `polaris` | Polaris REST Catalog |
| `hive` | Hive Metastore (Kerberos supported) |
| unset | Builtin (CB system table) |
Upwards, the AM only sees `pg_iceberg_*_with_catalog()` functions. Downwards,
`agent_cli` talks to the agent over RPC (Builtin is the exception — it
short-circuits to the CB system table).
**Why FDW instead of a plain C function**: it lets us reuse PG's `CREATE SERVER
/ USER MAPPING` for credentials and permissions, and unifies the configuration
entry point across multiple Catalog types.
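A minimal sketch of what that buys us — the `type` option selects the backend, and credentials ride on the user mapping rather than appearing in table DDL (option names are illustrative):
```sql
-- The type option picks the backend; credentials live on USER MAPPING
CREATE SERVER polaris_cat FOREIGN DATA WRAPPER iceberg_catalog_fdw
    OPTIONS (type 'polaris', uri 'https://polaris.example.com/api/catalog');
CREATE USER MAPPING FOR analyst SERVER polaris_cat
    OPTIONS (client_id '...', client_secret '...');

-- Builtin backend: no type option, no external service; metadata.json locations
-- are kept in the CB system table
CREATE SERVER builtin_cat FOREIGN DATA WRAPPER iceberg_catalog_fdw;
```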
### 5.3 Volume FDW: the data-file I/O abstraction
`iceberg_volume_fdw` is planned to handle the actual read / write of **data
files and delete files** (manifest / metadata json are managed by the Catalog
side). It implements the full FDW interface: `GetForeignRelSize /
GetForeignPaths / BeginForeignScan / BeginForeignModify / ...`.
Its responsibilities:
1. Take the fragment list from the Catalog FDW (passed into `fdw_private` at
plan time);
2. On the QE side, filter out the fragments assigned to this segment by
`segindex`;
3. Call the Provider to read / write Parquet;
4. After writing, serialize the file metadata produced by this QE (path /
record_count / size / partition / whether it is a position-delete) into JSON
and return it to the QD.
The Server's `type` option decides storage: `s3` / `s3b` (OSS / MinIO / OBS /
…) / `hdfs`.
### 5.4 datalake_agent: why a separate Java service
This is the single most important design trade-off.
Iceberg's metadata semantics are complex: manifest lists, snapshot logs,
partition-spec evolution, schema field-id mapping, optimistic CAS commit, and
so on. The community's most invested, most mature implementation is
`iceberg-java`.
Reimplementing all of this on the C / C++ side would cost us:
- A large initial implementation;
- Repeated effort on every Iceberg version upgrade;
- Format-compatibility risk (some defaults are hard-coded in the reference
implementation and not fully documented).
Therefore the design delegates all metadata operations to a dedicated
`datalake_agent` (Java Spring Boot, wrapping iceberg-java + hive-jdbc +
hadoop-client). The interface is planned to cover:
- `/iceberg/tables` — create / load / drop;
- `/fragments` — plan files (with predicate pushdown);
- `/modify` — incremental snapshot generation;
- `/commit` — CAS commit;
- `/plan-rewrite` + `/commit-rewrite` — VACUUM.
**Upside**:
- Compatibility: snapshots written by CB are byte-identical to the community
format;
- Easy upgrades: picking up a new Iceberg version is just a jar swap on the
agent;
- Stateless: every request carries the full configuration, making horizontal
scaling easy.
**Cost**: one extra network hop — but **only on the metadata path**; data I/O
still goes straight from C++ to storage, so throughput is unaffected.
#### Process lifecycle: managed by the `datalake_proxy` bgworker
To tie the agent's lifecycle to the CB cluster and spare users from babysitting
a Java process, the design introduces `datalake_proxy`
(`contrib/datalake_proxy/`), a PG background worker (bgworker):
- `datalake_proxy` is registered in `shared_preload_libraries` and starts with
the postmaster;
- In `_PG_init`, a bgworker is registered that `fork`s a child process to run
the agent jar on startup;
- If the child crashes, `datalake_proxy` restarts it;
- The GUC `datalake_proxy.register_datalake_proxy` toggles the feature;
`datalake_proxy.dlagent_memory_limit` (default 2 GB) caps the agent's JVM heap;
- When the postmaster exits, the signal propagates through `datalake_proxy` to
the agent for a clean shutdown.
From the user's perspective this means "CB is up → Iceberg is available" — no extra deployment, no extra supervisor.
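As a sanity check, the supervising bgworker should be observable from SQL like any other background worker; the `backend_type` pattern below is only a guess at the name the worker would register.
```sql
-- Illustrative: background workers appear in pg_stat_activity with a backend_type;
-- the exact string datalake_proxy registers is an assumption here
SELECT pid, backend_type, state
FROM pg_stat_activity
WHERE backend_type ILIKE '%datalake%';
```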
#### RPC protocol: gRPC
JSON / REST has two pain points at scale — large fragment lists cost CPU to
encode / decode, and plan-file results are slow to deserialize when they get
big. The plan is to expose the same interface over protobuf + gRPC:
- Bidirectional streaming interfaces (e.g. `get_fragments` can be
server-streamed) reduce QD memory pressure;
- protobuf saves bandwidth and CPU;
- gRPC's built-in health checks / load balancing pave the way for a
multi-instance agent deployment in the future.
### 5.5 Provider layer: the data plane
`src/provider/iceberg/` is planned to be a C++ implementation covering
Iceberg's data plane:
- Parquet / ORC row readers and writers;
- Position-delete file I/O (schema = `file_path string, pos long`);
- Delete-index construction (data file → deleted-positions bitmap);
- Equality-delete read (read-only for now);
- Translation from Iceberg `FileScanTask` into a row reader.
**Why Provider does not go through the agent**: data I/O is the system's
throughput bottleneck. Only by having each segment read / write storage
independently and in parallel can we sustain MPP-scale writes. Meanwhile,
mature C++ libraries already exist for Parquet (arrow-cpp / orc) — reusing them
is far more efficient than routing through an agent.
### 5.6 Metadata Tracker: the heart of transactional semantics
**The problem**: Iceberg uses optimistic CAS (via the metadata.json version
chain) for concurrency, while PG uses MVCC. How do we fit Iceberg's snapshot
semantics inside a PG transaction?
**The design**: a transaction-scoped `Metadata Tracker`. Its shape is inspired
by Rust iceberg-rs's `MetadataLocationTracker` and pg_lake's
`IcebergSnapshotBuilder`.
Under this design, modifications to an Iceberg table within a transaction flow
as follows:
```
BEGIN
│
├─ First access to t: read current metadata_location from Catalog
│            as initial_base
│
├─ DML-1 ──→ QE writes data files
│            QD calls agent /modify to produce an "intermediate"
│            metadata.json (NOT committed to Catalog)
│            tracker records: current_metadata, accumulated data_files
│
├─ DML-2 ──→ Read latest metadata from Catalog (rebase check)
│            If it changed (someone else committed), re-plan against it
│            Produce a new "intermediate" metadata
│
├─ SELECT ─→ Uses tracker.current (Read-Your-Own-Writes)
│            For already-modified tables, triggers one more rebase
│            (to see concurrent commits)
│
├─ SAVEPOINT / ROLLBACK TO ─→ stack-style restore of accumulated files
│
COMMIT
│
└─ tracker_commit_all(): per modified table, CAS to Catalog
   On conflict, rebase and retry; up to 10 retries then PG-level abort
```
**Three rebase trigger points**:
| Scenario | When | Purpose |
|----------|------|---------|
| per-statement | End of each DML | Read-Your-Own-Writes + early concurrent-conflict detection |
| at-scan | SELECT on an already-modified table | Let SELECT see concurrent committed data |
| at-commit | PRE_COMMIT | Final merge, reduces CAS failure probability |
The resulting semantics:
- **Read Committed**: every statement sees committed concurrent transactions;
- **Read-Your-Own-Writes**: a SELECT within the transaction sees its own prior
INSERTs;
- **ACID**: the CAS to Catalog happens only at COMMIT. On rollback,
intermediate metadata.json files and the data files already written become
orphans and are reclaimed by the background cleanup queue;
- **SAVEPOINT**: the tracker maintains an internal `level_history` stack,
recording the metadata and file counts before each nested-transaction
modification.
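The intended behavior, written out as a SQL session (illustrative; `t` is any Iceberg table with an `id` column):
```sql
BEGIN;
INSERT INTO t VALUES (1);      -- QE writes a data file; the tracker records it
SELECT count(*) FROM t;        -- Read-Your-Own-Writes: already counts the new row
SAVEPOINT s1;
DELETE FROM t WHERE id = 1;    -- position-delete file, recorded under s1
ROLLBACK TO SAVEPOINT s1;      -- tracker pops back to the pre-s1 file set
COMMIT;                        -- one CAS per modified table; on conflict,
                               -- rebase and retry (up to 10 times)
```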
### 5.7 Deletion Queue: why asynchronous cleanup
DROPping an Iceberg table, replacing old files during VACUUM, orphans left
behind by a rolled-back transaction — all of these need deletions against
object storage.
**Why not delete synchronously**: a single Iceberg table can reference tens of
thousands to millions of files. Synchronous deletion inside the transaction
would make DDL block for a long time, and a mid-way failure would leave the
system in a "metadata gone, files stranded" inconsistent state.
**The design**: an `iceberg.pg_iceberg_deletion_queue` system table plus a
background task.
- DROP: just enqueue the metadata_location (`DELETION_TYPE_METADATA`);
- VACUUM: enqueue the paths of old data files that were replaced
(`DELETION_TYPE_FILE`);
- The background task polls the queue, expands the referenced files from
metadata, and deletes them in batches;
- Failed entries get `retry_count++` and are retried later, giving idempotency.
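Since the queue is an ordinary system table, pending cleanup work can be inspected with plain SQL (columns per the planned schema in §12.2):
```sql
-- Pending cleanup work, oldest first
SELECT path, deletion_type, retry_count, orphaned_at
FROM iceberg.pg_iceberg_deletion_queue
ORDER BY orphaned_at
LIMIT 10;
```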
## 6. End-to-End Flows
Execution paths for each key SQL under this design.
### CREATE ICEBERG TABLE
1. PG core performs the CREATE, inserting into `pg_class / pg_attribute /
pg_lake_table`;
2. An `OAT_POST_CREATE` hook on the QD calls the agent's `/iceberg/tables` to
produce the initial metadata.json;
3. The returned metadata_location is written into `iceberg.pg_iceberg_metadata`.
### SELECT
1. The planner calls AM's `scan_get_am_private` and obtains the
metadata_location "that this scan should see" (an already-modified table
triggers one rebase);
2. The QD calls the agent's `/fragments` (with pushdown predicates) and
receives `List<FileScanTask>`;
3. The fragment list is passed through ForeignScan plan; QEs pick up their
share by `segindex`;
4. Each QE calls the Provider to read Parquet, applying the delete index to
skip marked-deleted rows.
### INSERT / UPDATE / DELETE
1. QE calls Volume FDW + Provider to write data files (and, for UPDATE /
DELETE, position-delete files);
2. QE returns file-metadata JSON to the QD;
3. QD calls `tracker.apply_updates_with_rebase`:
- Read latest metadata from Catalog; decide whether rebase is needed;
- Accumulate into the tracker's `data_files / delete_files`;
- Call the agent's `/modify` to generate a new intermediate metadata.json.
4. At COMMIT, `tracker_commit_all` performs the CAS for every modified table.
### VACUUM
1. QD calls the agent's `/plan-rewrite` and receives a rewrite plan (groups
built from min-input-files + target-file-size);
2. QEs each process one group: read old files + write one larger file;
3. QD collects results and calls the agent's `/commit-rewrite` to commit a
RewriteFiles snapshot;
4. The paths of the replaced old files are enqueued into the deletion queue.
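An illustrative session, assuming the compaction GUCs from §12.1 are settable per session and `sales` is an Iceberg table:
```sql
-- Tune compaction for this session, then compact the table
SET datalake.iceberg_vacuum_compact_min_input_files = 20;
SET datalake.iceberg_vacuum_rewrite_target_file_size_mb = 256;
VACUUM sales;   -- plan-rewrite → QEs rewrite groups → commit-rewrite → enqueue old files
```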
### DROP
1. The `OAT_DROP` hook enqueues the metadata_location into the deletion queue;
2. The row in `pg_iceberg_metadata` is removed;
3. The background cleanup task expands all files referenced by the metadata and
deletes them in batches.
## 7. MPP Execution Model
The responsibilities are divided as follows under MPP.
### 7.1 QD vs QE responsibilities
| Responsibility | QD | QE |
|----------------|:--:|:--:|
| Call the agent (create / plan / commit) | ✓ | |
| Metadata Tracker | ✓ | |
| Fragment dispatch | ✓ | |
| Data-file read / write | | ✓ |
| Position-delete read / write | | ✓ |
| Writes to the deletion queue | ✓ | |
**Principle**: only the QD talks to the agent. Letting N QEs hit the agent in
parallel would both make the agent a bottleneck and introduce concurrent writes
to Iceberg snapshot state, which brings its own complexity. The parallel part
is the data I/O.
### 7.2 Fragment dispatch
The QD places `List<FileScanTask>` into the plan tree; it is serialized and
dispatched to QEs. Each QE picks its fragments round-robin by `segindex %
segcount`.
The GUC `datalake.external_table_limit_segment_num` can cap the number of
segments that participate in a scan — useful when joining with small tables to
reduce dispatch overhead.
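Illustrative usage (`sales` is an Iceberg table and `dim_region` a hypothetical small dimension table):
```sql
-- Limit the Iceberg scan to 4 segments for a join against a small table
SET datalake.external_table_limit_segment_num = 4;
SELECT s.region, count(*)
FROM sales s
JOIN dim_region d ON s.region = d.region
GROUP BY s.region;
RESET datalake.external_table_limit_segment_num;
```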
### 7.3 Global file-id consistency
`UPDATE / DELETE` plans may include a **Redistribute Motion** that ships a row
from QE-i to QE-j. QE-j, when it later dereferences the ctid, must still be
able to resolve it back to its original file.
Under this design, ctids are encoded as `<file_id, row_pos>`. To let any QE
resolve a ctid from any origin, `BeginForeignModify` pre-populates a **global**
file-id map using the **full** fragment list (not just the subset assigned to
the current QE).
## 8. Pushdown & Optimization
WHERE clauses are translated through `deparse.c` into the agent's FilterNode
tree; the agent then converts that into an Iceberg `Expression`, applying
**partition pruning + manifest min/max filtering** at `planFiles` time.
Operators planned for pushdown: `=, !=, >, <, >=, <=, IS [NOT] NULL, LIKE, IN,
AND, OR`.
The Provider C++ layer then applies **row-group filtering + residual predicates
+ column projection**.
A fragment cache (GUC `datalake.enable_iceberg_fragment_cache`, default `on`)
caches `metadata_location + filter` → plan result within a single backend,
avoiding repeated trips to the agent.
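An illustrative query whose predicates all fall within the pushdown-eligible operator set, plus the debugging switch:
```sql
-- All three predicates use pushdown-eligible operators, so partition pruning and
-- manifest min/max filtering can apply at planFiles time (columns are illustrative)
SELECT id, amount
FROM sales
WHERE sold_at >= DATE '2024-01-01'
  AND region IN ('north', 'south')
  AND amount IS NOT NULL;

-- Turn pushdown off per session when debugging plan-file results
SET datalake.disable_filter_pushdown = on;
```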
## 9. Concurrency with External Engines
Community Iceberg engines (Spark / Trino / …) may write the same table
concurrently. Under this design:
- When an external engine commits, it changes the Catalog's metadata_location;
- The next CB DML's rebase will notice `global != last_base` and replan
(accumulated files are reapplied on top of the new global);
- If replay hits an incompatible evolution (e.g. column-type conflict) → the
agent raises an error → PG aborts the transaction and asks the user to retry.
## 10. Extensibility
**New Catalog type** (Nessie / Glue / in-house):
- Add the corresponding Iceberg `Catalog` construction on the agent side;
- Add a new `type` branch on the PG side.
Because all Iceberg semantics live in the agent, the PG-side change is minimal.
**New storage backend**:
- Add a FileSystem implementation inside libgopher;
- Have the Volume FDW recognize the new `type` and handle its connection
parameters.
**New DML shapes** (MERGE / UPSERT): mostly planner work; the underlying "write
data file + write position-delete" primitives can be reused.
## 11. Outside the First Release (follow-up work)
Items the first release will not cover and that will be discussed in later
iterations:
- Only identity partitioning is planned; bucket / truncate / hour transforms
are not supported;
- No partition-spec evolution;
- No Branch / Tag / Time Travel queries;
- Equality deletes are read-only;
- Concurrency only at Read Committed;
- The agent is single-instance by design; production deployments that need redundancy are expected to run multiple agent instances behind a reverse proxy themselves;
- ANALYZE relies on record_count / bytes returned by the agent and is not
deeply integrated with PG's column statistics;
- When an entire data file is deleted, the first release still writes a
position-delete file and relies on a later VACUUM for cleanup — there is room
for optimization here.
## 12. Appendix
### 12.1 Key GUCs (planned)
| GUC | Default | Description |
|-----|---------|-------------|
| `iceberg_default_catalog` | `''` | default Catalog |
| `iceberg_default_volume` | `''` | default Volume |
| `datalake_agent_server_url` | — | agent endpoint |
| `datalake.enable_iceberg_fragment_cache` | `on` | enable fragment cache |
| `datalake.iceberg_vacuum_compact_min_input_files` | `10` | min input files to trigger VACUUM compaction |
| `datalake.iceberg_vacuum_rewrite_target_file_size_mb` | `512` | VACUUM target file size (MB) |
| `datalake.iceberg_postion_deletes_threshold` | `100000` | position-delete threshold |
| `datalake.external_table_limit_segment_num` | `0` | cap on segments participating in a scan (0 = no cap) |
| `datalake.disable_filter_pushdown` | `off` | disable predicate pushdown (for debugging) |
| `datalake.iceberg_autovacuum` | `off` | enable autovacuum (requires restart) |
| `datalake.iceberg_autovacuum_naptime` | `600` | autovacuum interval (seconds) |
### 12.2 New system tables (planned)
**`iceberg.pg_iceberg_metadata`** — current metadata location for each Iceberg
table
| Column | Type | Description |
|--------|------|-------------|
| `relid` | oid | LakeTable OID (primary key) |
| `metadata_location` | text | current metadata.json path |
| `previous_metadata_location` | text | previous version (used for CAS) |
| `is_internal` | bool | whether this is a Builtin Catalog table |
| `default_spec_id` | int4 | default partition spec |
**`iceberg.pg_iceberg_deletion_queue`** — queue of files to be cleaned up
| Column | Type | Description |
|--------|------|-------------|
| `path` | text | path to delete (primary key) |
| `table_name` | oid | originating table OID |
| `orphaned_at` | timestamptz | time enqueued |
| `retry_count` | int4 | retry count |
| `deletion_type` | int4 | `0 = FILE` / `1 = METADATA` |
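A DDL sketch of the two tables (the authoritative definitions will live in the extension script; constraints and defaults below are illustrative):
```sql
CREATE TABLE iceberg.pg_iceberg_metadata (
    relid                      oid  PRIMARY KEY,  -- LakeTable OID
    metadata_location          text NOT NULL,     -- current metadata.json path
    previous_metadata_location text,              -- previous version, used for CAS
    is_internal                bool NOT NULL,     -- Builtin Catalog table?
    default_spec_id            int4 NOT NULL      -- default partition spec
);

CREATE TABLE iceberg.pg_iceberg_deletion_queue (
    path          text PRIMARY KEY,               -- file or metadata.json to delete
    table_name    oid  NOT NULL,                  -- originating table OID
    orphaned_at   timestamptz NOT NULL DEFAULT now(),
    retry_count   int4 NOT NULL DEFAULT 0,
    deletion_type int4 NOT NULL                   -- 0 = FILE, 1 = METADATA
);
```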
### 12.3 Planned code layout
```
contrib/datalake_fdw/
├── src/am_iceberg/            Iceberg Table AM + Metadata Tracker + DDL hook
├── src/iceberg_catalog_fdw/   Catalog FDW (Polaris / Hive / Builtin)
├── src/iceberg_volume_fdw/    Volume FDW (S3 / HDFS)
├── src/provider/iceberg/      Provider C++ (Parquet I/O, delete handling)
├── src/components/agent_cli/  agent gRPC client
└── docs/                      this document
contrib/datalake_proxy/        PG bgworker that launches and supervises the agent jar
contrib/datalake_agent/        Java Spring Boot, wraps iceberg-java
```
---
**Suggested review focus**:
1. Whether the four-layer split (AM / Catalog FDW / Volume FDW / Agent) is
sound;
2. The trade-off of a dedicated Java service for metadata vs. a pure C
implementation;
3. Whether the `datalake_proxy` bgworker process model is the right way to host
the Java agent;
4. The evolution path and compatibility story of the RPC protocol (REST first,
gRPC later);
5. Correctness of the Metadata Tracker's rebase + CAS strategy under Read
Committed and SAVEPOINT;
6. The MPP division of responsibilities: "agent is only talked to by the QD;
data I/O is parallelized on QEs";
7. The necessity of the Builtin Catalog as a metadata fallback;
8. Whether splitting Catalog and Volume into two FDWs is over-abstraction;
9. The extension path for partition evolution / Branch / equality deletes.
### Rollout/Adoption Plan
_No response_
### Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
GitHub link: https://github.com/apache/cloudberry/discussions/1683