beryllw commented on PR #2779: URL: https://github.com/apache/fluss/pull/2779#issuecomment-4072060741
`INSERT INTO t (partial columns) VALUES ...` is parsed into multiple value sources and executed as a UNION ALL. Since the Flink runtime does not guarantee left-to-right ordering of UNION ALL inputs, and Fluss assigns auto-increment IDs based on server-side arrival order, test results can occasionally show non-deterministic ordering.

1. A partial-column write with `INSERT INTO t (partial columns) VALUES ...;` hits Flink's optimization rule `PreValidateReWriter`.
2. `PreValidateReWriter.rewriteValues()` pads each row in the VALUES clause into a complete row, filling the missing columns with `CAST(NULL AS type)`.
   https://github.com/apache/flink/blob/f624c8b2ae3035089e46223f4926cfdb50b7bed6/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/calcite/PreValidateReWriter.scala#L100-L103
3. Since `CAST(...)` is not a `SqlLiteral`, `SqlToRelConverter.convertRowValues()` degenerates the VALUES clause into a row-by-row UNION ALL.
   https://github.com/apache/flink/blob/f624c8b2ae3035089e46223f4926cfdb50b7bed6/flink-table/flink-table-planner/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1922-L1926
4. The Flink runtime does not guarantee the ordering of UNION ALL inputs, so records from different union branches may arrive in a non-deterministic order.
   https://github.com/apache/flink/blob/f624c8b2ae3035089e46223f4926cfdb50b7bed6/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/StreamMultipleInputProcessor.java#L127-L143

This commit can be used to reproduce the issue: https://github.com/beryllw/fluss/commit/001a1b97a7b9f77a0861ee6a96d69fa8fdfeb206
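To make the flakiness concrete, here is a minimal stand-alone sketch (plain Python, not Fluss/Flink code; `assign_ids` is an illustrative stand-in, not a real API) of why a fixed row-to-ID mapping cannot be asserted once each VALUES row travels through its own UNION ALL branch:

```python
import itertools

def assign_ids(arrival_order):
    """Illustrative model: the server hands out auto-increment IDs
    purely by the order in which rows arrive."""
    return {row: i + 1 for i, row in enumerate(arrival_order)}

# Three rows in the VALUES clause -> three independent union branches,
# so the server may observe any interleaving of them.
rows = ["row_a", "row_b", "row_c"]

# Collect every distinct row->ID mapping reachable under some arrival order.
outcomes = {
    tuple(sorted(assign_ids(p).items()))
    for p in itertools.permutations(rows)
}

# 3 branches yield 3! = 6 possible ID assignments, so a test asserting
# one fixed mapping is inherently non-deterministic.
print(len(outcomes))  # 6
```

This mirrors step 4 above: as long as branch arrival order is unconstrained, any of the six assignments is a legal outcome, which is exactly what the flaky test observes.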
