Hi Yuxia,

First of all, thank you for leading this. It's an important change: the system columns add a non-trivial storage cost in Parquet/ORC files even though most consumers never read them, and they pollute schema introspection too.

I've been going through FIP-27 in detail and have a few questions I'd like to clarify before implementation begins. Grouping them by area:

*1. Schema & Legacy Detection*

1a. When datalake is re-enabled on an existing table and the lake table already exists with system columns, where does the schema inspection happen? Do we add a new method to the LakeCatalog interface (e.g., getTableSchema(TablePath)), or is this handled at the Fluss server metadata level, outside the plugin boundary?

1b. Is the legacy/clean mode decision persisted in Fluss table metadata (e.g., as a property like fluss.lake.schema.mode = legacy | clean), or is it re-derived by inspecting the lake table schema each time? If re-derived, what happens if someone manually alters the lake table schema externally?

*2. PARTITION_TIMESTAMP Mode*

2a. The FIP shows a day-granularity example for timestamp-to-partition mapping. Can we document the exact mapping for all supported time-unit values (hour, day, month, quarter, year)? I assume it follows the same DateTimeFormatter patterns as in PartitionUtils, but it would be good to make this explicit.

2b. Should the Flink connector fail fast at job submission time (via ValidationException) if PARTITION_TIMESTAMP is used on a non-auto-partitioned table? Or do we allow it for manually partitioned tables as well?

2c. For PK tables with CDC, how are duplicates at the partition boundary resolved during the union read? Is it the same snapshot-then-changelog pattern that FULL mode uses today? The FIP mentions "downstream idempotency", but CDC duplicate handling is non-trivial, so it would help to be more specific here.

*3. Union Read Boundary*

3a. How is the exact transition point from lake historical reads to Fluss log reads determined per partition? Is it the per-partition tiering watermark stored in Fluss server metadata?

*4. Backward Compatibility*

4a. If a user drops and recreates a table with the same name post-upgrade, the new lake table will not have system columns. Should we warn users about this schema change, especially if they have downstream jobs that depend on __offset or __bucket?

*5. Scope*

5a. The changes apply to both Paimon and Iceberg lake catalogs, correct? Both PaimonLakeCatalog and IcebergLakeCatalog currently append system columns independently.

Thanks for the FIP, happy to help with the implementation once these are clarified.

Best Regards,
Mehul Batra

On Mon, Mar 2, 2026 at 7:26 PM Lorenzo Affetti via dev <[email protected]> wrote:

> Hello! I went through the FIP another time as I did not remember doing it
> already :)
>
> I have additional questions beyond the first 2.
>
> Let me paste those here and add:
>
> 1. Isn't the scope of the FIP misleading?
> This FIP seems to be about removing system columns, but it primarily
> proposes a new read mode named PARTITION_TIMESTAMP.
> Is this because removing those columns prevents users from accessing data
> on the lake?
> If so:
> - How are users supposed to do that now?
> - What would change?
>
> 2. How does this relate to union reads?
> I am quite new to the community and Fluss. Could you explain how the new
> PARTITION_TIMESTAMP mode relates to union reads?
> If the answer is not obvious, perhaps this warrants a section in the FIP.
>
> 3. Why "Only auto partitioned table is supported in this mode"?
> Why only for partitions generated by Fluss, and not for any partition that
> represents a timestamp?
>
> On Wed, Feb 4, 2026 at 4:50 PM Lorenzo Affetti <[email protected]> wrote:
>
> > Hello Yuxia!
> > Thanks for the great FIP!
> > I have some questions:
> >
> > 1. Isn't the scope of the FIP misleading?
> > It seems this FIP is about removing system columns, but it primarily
> > proposes a new read mode named PARTITION_TIMESTAMP.
> >
> > 2. How does this relate to union reads?
> > I am quite new to the community and Fluss. Could you explain how the new
> > PARTITION_TIMESTAMP mode relates to union reads?
> > If the answer is not obvious, perhaps this warrants a section in the FIP.
> >
> > Thank you!
> >
> > On Tue, Jan 20, 2026 at 8:20 AM yuxia <[email protected]> wrote:
> >
> >> Hi, all.
> >>
> >> Currently, every Fluss lake table is automatically provisioned with three
> >> mandatory system columns, __bucket, __offset, and __timestamp (intended for
> >> bucket- and offset-based subscription as well as additional information
> >> checks).
> >> While originally designed to allow clients to pinpoint specific data
> >> offsets of specific buckets, the practical evolution of the ecosystem has
> >> rendered this default behavior suboptimal for downstream consumers, since
> >> downstream warehouse or BI tools do not expect these internal metadata
> >> fields.
> >>
> >> So, I'd like to propose FIP-27: Remove Mandatory System Columns From
> >> Fluss Lake Tables [1] to remove the three mandatory system columns while
> >> still keeping compatibility.
> >>
> >> Welcome your feedback and suggestions on this proposal. Looking forward
> >> to a productive discussion!
> >>
> >> [1]:
> >> https://cwiki.apache.org/confluence/display/FLUSS/FIP-27%3A+Remove+Mandatory+System+Columns+From+Fluss+Lake+Tables
> >>
> >> Best regards,
> >> Yuxia
> >>
> >
> >
> > --
> > Lorenzo Affetti
> > Senior Software Engineer @ Flink Team
> > Ververica <http://www.ververica.com>
> >
>
>
> --
> Lorenzo Affetti
> Senior Software Engineer @ Flink Team
> Ververica <http://www.ververica.com>
>
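
To make question 2a concrete, here is a minimal sketch of the kind of timestamp-to-partition mapping that could be documented per time unit. Everything in it is an assumption for illustration: the class name PartitionNameSketch, the enum PartitionUnit, and the five patterns are hypothetical and not taken from the FIP or from PartitionUtils; the FIP should state the authoritative patterns and the time zone used.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical sketch, not Fluss code: maps a record timestamp to a partition name.
final class PartitionNameSketch {

    // The time-unit values listed in question 2a.
    enum PartitionUnit { HOUR, DAY, MONTH, QUARTER, YEAR }

    // ASSUMED patterns; the real ones should be confirmed against the FIP.
    static String patternFor(PartitionUnit unit) {
        switch (unit) {
            case HOUR:    return "yyyyMMddHH";
            case DAY:     return "yyyyMMdd";
            case MONTH:   return "yyyyMM";
            case QUARTER: return "yyyyQ"; // single 'Q' formats the quarter as a digit 1-4
            case YEAR:    return "yyyy";
            default:      throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }

    // Formats the timestamp in the given zone; which zone applies is itself an open question.
    static String partitionName(Instant timestamp, PartitionUnit unit, ZoneId zone) {
        DateTimeFormatter formatter =
                DateTimeFormatter.ofPattern(patternFor(unit), Locale.ROOT).withZone(zone);
        return formatter.format(timestamp);
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2026-03-02T19:26:00Z");
        for (PartitionUnit unit : PartitionUnit.values()) {
            System.out.println(unit + " -> " + partitionName(ts, unit, ZoneOffset.UTC));
        }
    }
}
```

A table of this shape in the FIP (unit, pattern, example partition name, time zone) would also settle how a timestamp that falls exactly on a partition boundary is assigned.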
