Hi folks,
I'm planning to upgrade the Iceberg version in the upcoming 0.9.0 release.
There are several key points we need to discuss before finalizing the
upgrade plan:
💡 Key Discussion Points:
1. *Should we upgrade to the latest Iceberg version (1.9.x)?*
   Iceberg 1.9.x brings new features and improvements, but also introduces
   breaking changes (see below).
2. *Can we drop support for Hadoop 2?*
   Iceberg 1.9.x officially removes support for Hadoop 2. Dropping it could
   simplify our upgrade path, but may impact legacy users.
3. *Is it acceptable to let the mixed format modules (especially Spark) use
   different Iceberg versions?*
   For example, core modules could move to Iceberg 1.9.x while the Spark/Flink
   mixed format modules stay on older versions (e.g., 1.6.x or 1.8.x) for
   compatibility (one way to make two versions coexist is sketched below).
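If we do allow mixed versions, the usual way to keep two Iceberg versions from
colliding on one classpath is relocation (shading). Below is a minimal Gradle
Kotlin DSL sketch of the idea only — the module layout, shaded package name,
and the 1.8.1 pin are all assumptions for illustration, and a Maven build
would get the same effect from maven-shade-plugin:

```kotlin
// build.gradle.kts of a hypothetical mixed format module that carries its own
// (older) Iceberg while the core modules move to a newer one.
plugins {
    java
    id("com.github.johnrengelman.shadow") version "8.1.1"
}

repositories {
    mavenCentral()
}

dependencies {
    // Iceberg version pinned for this module only (example version).
    implementation("org.apache.iceberg:iceberg-core:1.8.1")
}

tasks.shadowJar {
    // Relocate Iceberg classes so this module's copy cannot clash with the
    // 1.9.x classes used elsewhere (shaded package name is hypothetical).
    relocate("org.apache.iceberg", "org.apache.amoro.shaded.iceberg")
}
```

The cost is a fatter jar and the loss of a single shared Iceberg API surface,
which feeds into the complexity trade-off under Option 2 below.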
------------------------------
🧭 Proposed Options:

*Option 1: Full Upgrade*
- Upgrade to Iceberg *1.9.x* across the board.
- Drop support for *Hadoop 2* and *Spark ≤ 3.3*.
- Standardize on *Spark 3.4+*.
- Keep Flink on Iceberg *1.4.3*.
- ✅ Pros: Clean and future-proof.
- ❌ Cons: Breaks compatibility for older environments.
*Option 2: Hybrid Compatibility*
- Upgrade core to *1.9.x*.
- For *Hadoop 2* environments, fall back to Iceberg *1.8.x*.
- For the *Spark mixed format*, either:
  - Drop support for Spark ≤ 3.3, *or*
  - Use Iceberg 1.8.x specifically in the Spark mixed format module (a build
    sketch follows this list).
- ✅ Pros: Balances new features with backward compatibility.
- ❌ Cons: More complex build and dependency management.
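To make the per-module pinning concrete, here is a rough Gradle Kotlin DSL
sketch. The module path, version numbers, and runtime artifact are assumptions;
in a Maven build the equivalent would be a module-level iceberg.version
property plus dependencyManagement:

```kotlin
// Hypothetical amoro-mixed-spark/build.gradle.kts: pin an older Iceberg line
// here while the rest of the build defaults to 1.9.x.
plugins {
    java
}

repositories {
    mavenCentral()
}

// Older Iceberg pinned only for the Spark mixed format (example version).
val sparkIcebergVersion = "1.8.1"

dependencies {
    implementation("org.apache.iceberg:iceberg-core:$sparkIcebergVersion")
    // Spark runtime artifact matching the pinned Iceberg line.
    implementation("org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:$sparkIcebergVersion")
}

// Keep transitive dependencies from silently dragging this module up to 1.9.x.
configurations.all {
    resolutionStrategy.force("org.apache.iceberg:iceberg-core:$sparkIcebergVersion")
}
```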
*Option 3: Conservative Upgrade*
- Upgrade to Iceberg *1.8.x* as the maximum version.
- In the Flink mixed format (e.g., Flink 1.17), keep using Iceberg *1.6.x*.
- ✅ Pros: Minimal compatibility risk.
- ❌ Cons: Misses improvements in newer Iceberg versions.
*Also consider downgrading to Iceberg 1.6.x when compiling with JDK 8, since
Iceberg releases after the 1.6.x line no longer support Java 8.*
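If we keep a JDK 8 build at all, that version switch could live in the build
script rather than in documentation. A hedged Gradle Kotlin DSL sketch,
assuming (per the note above) that 1.6.x is the last Iceberg line that still
builds on Java 8, with example version numbers:

```kotlin
// Hypothetical build.gradle.kts fragment: choose the Iceberg line based on
// the JDK running the build (JavaVersion is part of the Gradle API).
plugins {
    java
}

repositories {
    mavenCentral()
}

// Fall back to the last Java 8-compatible Iceberg release on older JDKs.
val icebergVersion =
    if (JavaVersion.current().isJava11Compatible) "1.9.0" else "1.6.1"

dependencies {
    implementation("org.apache.iceberg:iceberg-core:$icebergVersion")
}
```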