fqaiser94 commented on code in PR #959:
URL: https://github.com/apache/iceberg-rust/pull/959#discussion_r2009143697


##########
crates/iceberg/src/writer/file_writer/track_writer.rs:
##########
@@ -26,16 +26,21 @@ use crate::Result;
 /// `TrackWriter` is used to track the written size.
 pub(crate) struct TrackWriter {
     inner: Box<dyn FileWrite>,
-    written_size: Arc<AtomicI64>,
+    written_size: Arc<AtomicU64>,
 }
 
 impl TrackWriter {
-    pub fn new(writer: Box<dyn FileWrite>, written_size: Arc<AtomicI64>) -> 
Self {
+    pub fn new(writer: Box<dyn FileWrite>, written_size: Arc<AtomicU64>) -> 
Self {
         Self {
             inner: writer,
             written_size,
         }
     }
+
+    /// Number of bytes written so far
+    pub fn bytes_written(&self) -> u64 {
+        self.written_size.load(std::sync::atomic::Ordering::SeqCst)
+    }

Review Comment:
   This is the only part I'm concerned about and have been struggling to find 
time to investigate. What memory ordering option should we use here? 
   
   In the rest of the codebase, the `written_size`  variable is accessed using 
`Relaxed` memory ordering ... I'm a little concerned about that TBH  from a 
consistency perspective but for the moment, let's focus on just this PR's use 
case. For Puffin, it's important we get accurate `written_size ` as that is 
included in the `BlobMetadata` that we write into Puffin file footer for each 
`Blob` so that users can efficiently extract blobs later.  
   
   Now that I'm thinking about this again, I think I might have a bug here lol 
I can't just `SeqCst` for the load side ... I think I need `SeqCst` for the 
store side too for this to work correctly ... in which case I can just use 
`Release` for load and `Acquire` for store probably ... What are people's 
thoughts on this, any memory ordering experts around? XD
   
   Other alternatives: 
   - Write a custom tracking writer for the puffin use case and avoid 
async/atomic variables entirely (like the reference Java implementation)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to