alippai opened a new issue, #45422:
URL: https://github.com/apache/arrow/issues/45422
### Describe the usage question you have. Please include as many useful details as possible.
I tried this with pyarrow 19:
```python
import pyarrow.feather as pf
t = ...  # a pyarrow.Table, large enough to span several chunks
pf.write_feather(t, 'test.feather', chunksize=1024*1024)
len(pf.read_table('test.feather').to_batches()[0])   # 65k rows
pf.write_feather(t, 'test2.feather', chunksize=256*1024)
len(pf.read_table('test2.feather').to_batches()[0])  # 65k rows again
```
I expected the two files to differ (different compressed sizes), but they are byte-for-byte identical. As a consequence, the requested batch sizes are lost when reading the data back.
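For reference, a minimal sketch of how I checked (the `sha256` helper is mine, and I am assuming a Feather V2 file can be opened with `pyarrow.ipc.open_file`, since V2 is the Arrow IPC file format):

```python
import hashlib

import pyarrow.ipc as ipc

def sha256(path):
    # hash the whole file to compare the two outputs byte-for-byte
    with open(path, 'rb') as f:
        return hashlib.sha256(f.read()).hexdigest()

print(sha256('test.feather') == sha256('test2.feather'))  # True

for path in ('test.feather', 'test2.feather'):
    reader = ipc.open_file(path)
    sizes = [reader.get_batch(i).num_rows
             for i in range(reader.num_record_batches)]
    print(path, sizes[:3])  # 65k-row batches in both files
```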
Am I correct in assuming that the file should consist of chunksize-long buffers for each column (one set per record batch), and that these buffers are compressed independently with lz4 or zstd?
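In case it helps, this is the behavior I was expecting; a sketch of writing the same table through the IPC writer directly (assuming `IpcWriteOptions(compression=...)` and `Table.to_batches(max_chunksize=...)` are the right knobs; I have not verified this against the format spec):

```python
import pyarrow.ipc as ipc

# Sketch only: split the table into batches of the requested row count and
# let the IPC writer compress each buffer (Feather V2 is the IPC file format).
options = ipc.IpcWriteOptions(compression='zstd')
with ipc.new_file('test_manual.feather', t.schema, options=options) as writer:
    for batch in t.to_batches(max_chunksize=256 * 1024):
        writer.write_batch(batch)
```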
### Component(s)
Python, C++, Format