willtemperley opened a new issue, #47824:
URL: https://github.com/apache/arrow/issues/47824

   ### Describe the enhancement requested
   
   This is a follow-up to a discussion started in 
https://github.com/apache/arrow-swift/pull/95 where we were fixing some issues 
in `ArrowWriter`.
   
   @kou asked how the fixes lined up with the spec, which was difficult to 
answer. The 
[encapsulated-message-format](https://arrow.apache.org/docs/format/Columnar.html#encapsulated-message-format) 
section says:
   
   "The metadata_size includes the size of the Message plus padding. The 
metadata_flatbuffer contains a serialized Message Flatbuffer value"
   
   It seems reasonable to assume that `Message` here refers to the FlatBuffers 
`Message`, so `metadata_size` should be exactly the length of that message plus 
padding. However, PyArrow appears to expect the `Message` size plus 8 bytes.
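
To make the two readings concrete, here is a minimal Python sketch that frames a metadata payload as the encapsulated message format describes. It assumes the extra 8 bytes are the 4-byte continuation marker (`0xFFFFFFFF`) plus the 4-byte little-endian size prefix. The helper names and the dummy payload are hypothetical; this uses only the standard library and is not the arrow-swift or PyArrow implementation.

```python
import struct

CONTINUATION = 0xFFFFFFFF  # 32-bit continuation indicator from the IPC format
PREFIX_SIZE = 8            # assumed: continuation (4 bytes) + size prefix (4 bytes)


def pad_to_8(n: int) -> int:
    """Round n up to the next multiple of 8."""
    return (n + 7) & ~7


def frame_metadata(flatbuffer: bytes) -> bytes:
    """Hypothetical helper: prefix + flatbuffer + zero padding (no body)."""
    # Pad so that prefix plus flatbuffer ends on an 8-byte boundary.
    total = pad_to_8(PREFIX_SIZE + len(flatbuffer))
    padded_flatbuffer_size = total - PREFIX_SIZE
    header = struct.pack("<Ii", CONTINUATION, padded_flatbuffer_size)
    return header + flatbuffer.ljust(padded_flatbuffer_size, b"\x00")


framed = frame_metadata(b"\x01" * 208)    # stand-in for a 208-byte Message
spec_reading = len(framed) - PREFIX_SIZE  # "Message plus padding"
observed_reading = len(framed)            # Message plus padding plus prefix
print(spec_reading, observed_reading)     # 208 216
```

Under these assumptions, the two candidate values differ by exactly the prefix size, which matches the 208 versus 216 discrepancy discussed in this issue.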
   
   I ran a small experiment with the `testFileWriter_bool` example in 
arrow-swift, writing the block metadata without counting the 8-byte prefix in 
`metadataLength`:
   
   offset: 120
   metadataLength: 208
   bodyLength: 296
   
   PyArrow won't open the file, throwing an error:
   pyarrow.lib.ArrowInvalid: flatbuffer size 8 invalid. File offset: 128, 
metadata length: 208
   (Note that the `Message` size is confirmed to be 208 bytes).
   
   However, if `metadataLength` includes the 8-byte prefix, i.e.:
   
   offset: 120
   metadataLength: 216
   bodyLength: 296
   
   The file is valid according to PyArrow. 
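
The behaviour above is consistent with a reader that expects the footer's `metadataLength` to cover the prefix as well as the metadata. The following Python sketch illustrates such a check on a fake buffer. It is a guess at the rule, not PyArrow's actual code; `block_length_is_consistent` and the zero-filled buffer are hypothetical.

```python
import struct

PREFIX_SIZE = 8  # assumed: 4-byte continuation marker + 4-byte size prefix


def block_length_is_consistent(buf: bytes, offset: int, metadata_length: int) -> bool:
    """Hypothetical check: does metadataLength equal prefix + prefixed size?"""
    continuation, size = struct.unpack_from("<Ii", buf, offset)
    if continuation != 0xFFFFFFFF:
        return False
    return metadata_length == PREFIX_SIZE + size


# Fake file: 120 bytes of filler, then a framed 208-byte dummy Message.
buf = bytes(120) + struct.pack("<Ii", 0xFFFFFFFF, 208) + bytes(208)

print(block_length_is_consistent(buf, 120, 208))  # False
print(block_length_is_consistent(buf, 120, 216))  # True
```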
   
   ### Component(s)
   
   Documentation, Python, Swift

