fqaiser94 commented on code in PR #959: URL: https://github.com/apache/iceberg-rust/pull/959#discussion_r2009143697
########## crates/iceberg/src/writer/file_writer/track_writer.rs: ########## @@ -26,16 +26,21 @@ use crate::Result; /// `TrackWriter` is used to track the written size. pub(crate) struct TrackWriter { inner: Box<dyn FileWrite>, - written_size: Arc<AtomicI64>, + written_size: Arc<AtomicU64>, } impl TrackWriter { - pub fn new(writer: Box<dyn FileWrite>, written_size: Arc<AtomicI64>) -> Self { + pub fn new(writer: Box<dyn FileWrite>, written_size: Arc<AtomicU64>) -> Self { Self { inner: writer, written_size, } } + + /// Number of bytes written so far + pub fn bytes_written(&self) -> u64 { + self.written_size.load(std::sync::atomic::Ordering::SeqCst) + } Review Comment: This is the only part I'm concerned about and have been struggling to find time to investigate. What memory ordering option should we use here? In the rest of the codebase, the `written_size` variable is accessed using `Relaxed` memory ordering ... I'm a little concerned about that TBH from a consistency perspective but for the moment, let's focus on just this PR's use case. For Puffin, it's important we get accurate `written_size ` as that is included in the `BlobMetadata` that we write into Puffin file footer for each `Blob` so that users can efficiently extract blobs later. Now that I'm thinking about this again, I think I might have a bug here lol I can't just `SeqCst` for the load side ... I think I need `SeqCst` for the store side too for this to work correctly ... in which case I can just use `Release` for load and `Acquire` for store probably ... What are people's thoughts on this, any memory ordering experts around? XD Other alternatives: - Write a custom tracking writer for the puffin use case and avoid async/atomic variables entirely (like the reference Java implementation) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org