jbewing commented on PR #15150:
URL: https://github.com/apache/iceberg/pull/15150#issuecomment-4035625076

   Sorry for the delay here @RussellSpitzer . I've added some additional test 
coverage at your suggestion. I just solved some merge conflicts with master. 
And I've now gotten a chance to read your approach.
   
   The way I see it, your changeset has followed an extremely similar arc to 
mine. Started with trying to derive iceberg order from spark order. Then, moved 
to using the iceberg table ordering at write time (in combo w/ the various 
other misc options out there like explicitly setting a sort order for 
compactions/rewrites). 
   
   They diverge by about ~60 LoC with the primary difference between them 
being: my changeset is a bit more explicit (e.g. uses stronger types & less 
implicit coupling between iceberg sort order & spark sort). And at this point, 
that's the total sum of the differences.
   
   You're a maintainer so you absolutely get the final call here, but as far as 
I see it they both have their advantages. 
   
   I think my approach is probably slightly friendlier to non-long-term readers 
of the codebase, but it does also come w/ the downside of being slightly less 
performant in terms of serializing a slightly larger object to the executors. 
In practice, this has been running in production for us on larger tables 
(hundreds of TB) that get partially rewritten many many times per day and I've 
never noticed the increased overhead on a flamegraph, so I don't think that's a 
crazy concern. 
   
   But yeah otherwise, I'm happy to change to yours if you think the 60 LoC 
difference / minute difference in approach makes a huge difference in 
readability. Having read both: we both converged on the same place for this 
changeset. They are functionally identical and just two different styles of 
expressing the same concept.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to