dalaro opened a new issue, #14719:
URL: https://github.com/apache/iceberg/issues/14719

   ### Apache Iceberg version
   
   1.10.0 (latest release)
   
   ### Query engine
   
   None
   
   ### Please describe the bug 🐞
   
   This is either a minor spec issue, in the sense that it should not require 
changes by any implementations, or a misunderstanding on my part.
   
   The spec 
[describes](https://github.com/apache/iceberg/blob/52c176df681cd6381ccf4d7f58ef1815ef5db19e/format/spec.md#assignment-of-snapshot-ids-and-current-snapshot-id1)
 how the Java implementation generates snapshot IDs:
   
   > The reference Java implementation uses a type 4 uuid and XORs the 4 most 
significant bytes with the 4 least significant bytes then ANDs with the maximum 
long value to arrive at a pseudo-random snapshot id with a low probability of 
collision.
   
   This looks odd.  Just 4 bytes and not 8?  Here's 
[SnapshotIdGeneratorUtil](https://github.com/apache/iceberg/blob/52c176df681cd6381ccf4d7f58ef1815ef5db19e/core/src/main/java/org/apache/iceberg/SnapshotIdGeneratorUtil.java):
   
   ```
     /**
      * Create a new ID for a Snapshot
      *
      * @return a long snapshot ID
      */
     public static long generateSnapshotID() {
       UUID uuid = UUID.randomUUID();
       long mostSignificantBits = uuid.getMostSignificantBits();
       long leastSignificantBits = uuid.getLeastSignificantBits();
       return (mostSignificantBits ^ leastSignificantBits) & Long.MAX_VALUE;
     }
   ```
   
   If that's not a red herring, then I think the reference implementation XORs 
the upper and lower 8-byte halves of the UUID, which also makes more intuitive 
sense (fits a `long`).  In that case, maybe the paragraph should mention the "8 
most/least significant bytes" instead of 4.
   
   I could prepare a tiny PR against 
https://github.com/apache/iceberg/blob/master/format/spec.md, but barging in 
with a spec change (however trivial) in my first-ever interaction with the 
project seemed like an act of hubris, so I thought it better to get a sanity 
check first.  Maybe I'm just reading this wrong.
   
   ### Willingness to contribute
   
   - [x] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to