RussellSpitzer commented on issue #14719:
URL: https://github.com/apache/iceberg/issues/14719#issuecomment-3592704824

   Great to see you contributing here! I think you are spot on though and the
   spec was probably originally detailing an integer implementation and when
   the type got widened, no one changed it. That’s at least my guess. Since
   this is an implementation note I don’t think we need a vote or even ml
   notice to get this in.
   
   On Sun, Nov 30, 2025 at 1:36 AM Dan LaRocque ***@***.***>
   wrote:
   
   > *dalaro* created an issue (apache/iceberg#14719)
   > <https://github.com/apache/iceberg/issues/14719>
   > Apache Iceberg version
   >
   > 1.10.0 (latest release)
   > Query engine
   >
   > None
   > Please describe the bug 🐞
   >
   > This is either a minor spec issue, in the sense that it should not require
   > changes by any implementations, or a misunderstanding on my part.
   >
   > The spec describes
   > 
<https://github.com/apache/iceberg/blob/52c176df681cd6381ccf4d7f58ef1815ef5db19e/format/spec.md#assignment-of-snapshot-ids-and-current-snapshot-id1>
   > how the Java implementation generates snapshot IDs:
   >
   > The reference Java implementation uses a type 4 uuid and XORs the 4 most
   > significant bytes with the 4 least significant bytes then ANDs with the
   > maximum long value to arrive at a pseudo-random snapshot id with a low
   > probability of collision.
   >
   > This looks odd. Just 4 bytes and not 8? Here's SnapshotIdGeneratorUtil
   > 
<https://github.com/apache/iceberg/blob/52c176df681cd6381ccf4d7f58ef1815ef5db19e/core/src/main/java/org/apache/iceberg/SnapshotIdGeneratorUtil.java>
   > :
   >
   >   /**
   >    * Create a new ID for a Snapshot
   >    *
   >    * @return a long snapshot ID
   >    */
   >   public static long generateSnapshotID() {
   >     UUID uuid = UUID.randomUUID();
   >     long mostSignificantBits = uuid.getMostSignificantBits();
   >     long leastSignificantBits = uuid.getLeastSignificantBits();
   >     return (mostSignificantBits ^ leastSignificantBits) & Long.MAX_VALUE;
   >   }
   >
   > If that's not a red herring, then I think the reference implementation
   > XORs the upper and lower 8-byte halves of the UUID, which also makes more
   > intuitive sense (fits a long). In that case, maybe the paragraph should
   > mention the "8 most/least significant bytes" instead of 4.
   >
   > I could prepare a tiny PR against
   > https://github.com/apache/iceberg/blob/master/format/spec.md, but barging
   > in with a spec change (however trivial) in my first-ever interaction with
   > the project seemed like an act of hubris, so I thought it better to get a
   > sanity check first. Maybe I'm just reading this wrong.
   > Willingness to contribute
   >
   >    - I can contribute a fix for this bug independently
   >    - I would be willing to contribute a fix for this bug with guidance
   >    from the Iceberg community
   >    - I cannot contribute a fix for this bug at this time
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/iceberg/issues/14719>, or unsubscribe
   > 
<https://github.com/notifications/unsubscribe-auth/AADE2YM725EUOGKQLPMMSIL37KM73AVCNFSM6AAAAACNSRPEYSVHI2DSMVQWIX3LMV43ASLTON2WKOZTGY3TONBYGYYDGNQ>
   > .
   > You are receiving this because you are subscribed to this thread.Message
   > ID: ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to