blackmwk commented on PR #2395:
URL: https://github.com/apache/iceberg-rust/pull/2395#issuecomment-4386864350

   > @CTTY @toutane I traced the full call and checked the spec + Java 
implementation, I'm feel strongly that we should keep the current <= approach.
   > 
   > The spec defines last-updated-ms as a wall-clock observation, not a 
logical clock.
   > 
   > From the Table Metadata Fields 
(https://iceberg.apache.org/spec/#table-metadata-fields) section:
   > 
   > │ last-updated-ms Timestamp in milliseconds from the unix epoch when the 
table was last updated. Each table metadata file should update this field just 
before writing.
   > 
   > The spec makes no monotonicity guarantee. Two updates within the same 
millisecond producing equal values is spec-compliant behavior.
   > 
   > The Java implementation has the same behavior.
   > 
   > In TableMetadata.Builder.build() 
(https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/TableMetadata.java),
 the Java code does:
   > 
   > ```java
   > if (lastUpdatedMillis == null) {
   >     this.lastUpdatedMillis = System.currentTimeMillis();
   > }
   > ```
   > 
   > No +1, no `max()`, no monotonicity enforcement. It is identical to what 
the Rust implementation does with `Utc::now().timestamp_millis()`.
   > 
   > The Java project explicitly chose tolerance over strict ordering.
   > 
   > Issue #1109 
([apache/iceberg#1109](https://github.com/apache/iceberg/issues/1109)) 
documents a real-world case where lastUpdatedMillis ended up lower than a 
previous metadata-log entry due to multi-machine clock skew. The fix (PR #1110 
([apache/iceberg#1110](https://github.com/apache/iceberg/pull/1110))) added a 
1-minute tolerance with the comment:
   > 
   > ```java
   > // commits can happen concurrently from different machines.
   > // A tolerance helps us avoid failure for small clock skew
   > lastUpdatedMillis - last.timestampMillis() >= -ONE_MINUTE
   > ```
   > 
   > Equal timestamps pass this validation trivially (difference = 0, which is 
>= -60000).
   > 
   > last-updated-ms is not used for any operational decision.
   > 
   > No TableRequirement variant checks last-updated-ms. Conflict detection 
uses UUID, schema ID, snapshot ref, spec ID, sort order ID, and field IDs. 
Metadata version ordering is determined by the filename (v1.metadata.json → 
v2.metadata.json), not timestamps.
   > 
   > Why not sleep(2ms) or max(now, previous + 1)?
   > 
   > * sleep adds artificial latency to a field that the spec defines as 
informational. It also doesn't solve the problem for real multi-machine 
deployments where clock skew can exceed any fixed sleep.
   > * max(now, previous + 1) silently mutates the timestamp away from 
wall-clock truth. Future callers of the builder would get a value that doesn't 
represent when the update actually happened, with no indication it was 
adjusted. The field's name and spec definition say "when was this last updated" 
— making it lie
   >   about that to satisfy a test assertion inverts the relationship between 
spec and test.
   > 
   > The original < assertion was over-specifying an invariant the system 
doesn't depend on. The <= correctly reflects the spec's semantics: the 
timestamp records when the update happened, and two updates in the same 
millisecond are valid.
   
   Then how about removing the asseration? Since it's not used for any purpose, 
in theory anything can happen due to clock skew.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to