felixscherz commented on PR #650: URL: https://github.com/apache/iceberg-python/pull/650#issuecomment-2094148537
Hi, I finally had some time to continue working on this. Based on your suggestions, @geruh, I added a `tell` method to the `OutputStream` protocol that returns the number of bytes written to the stream. I then added `__len__` to `AvroOutputFile`, which delegates to either the `OutputFile` or the `OutputStream` to get the number of bytes written, depending on whether the stream is closed. Finally, I extended `ManifestWriter` with a `__len__` method that delegates to the `AvroOutputFile`.

I initially tried to extend `OutputStream` with `__len__`, until I realized that the `OutputStream` implementations offered by both `FileIO` implementations, `fsspec` and `pyarrow`, implement `tell` but not `__len__`. If we wanted to go with `__len__` instead of simply using `tell`, we might have to implement custom `FsspecOutputStream` and `PyarrowOutputStream` classes that implement `__len__`. This might well be the cleaner approach, but it would introduce a bit more abstraction. What do you think?
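To make the `tell`-based idea concrete, here is a rough, self-contained sketch of the pattern, not the code in this PR. `SizedWriter` is a hypothetical stand-in for a writer such as `AvroOutputFile`, and `io.BytesIO` stands in for a real `fsspec` or `pyarrow` stream. As a simplifying assumption, the sketch remembers the final size when the stream is closed, rather than asking an `OutputFile` for it.

```python
import io
from typing import Optional, Protocol


class OutputStream(Protocol):
    """Minimal stand-in for a stream protocol that includes `tell`."""

    def write(self, b: bytes) -> int: ...

    def tell(self) -> int: ...

    def close(self) -> None: ...


class SizedWriter:
    """Hypothetical writer that reports bytes written via `__len__`."""

    def __init__(self, stream: OutputStream) -> None:
        self._stream = stream
        # Filled in on close, because a closed stream can no longer be queried.
        self._final_size: Optional[int] = None

    def write(self, data: bytes) -> None:
        self._stream.write(data)

    def close(self) -> None:
        # Assumption for this sketch: cache the size before closing the stream.
        self._final_size = self._stream.tell()
        self._stream.close()

    def __len__(self) -> int:
        if self._final_size is not None:
            # Stream is closed: use the size recorded at close time.
            return self._final_size
        # Stream is still open: its position equals the bytes written so far,
        # assuming the stream started empty and was only written sequentially.
        return self._stream.tell()
```

With this shape, `len(writer)` works both while writing and after `close()`, and the underlying stream only needs to support `tell`, so no custom stream wrapper classes are required.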