fiktor opened a new issue, #41388: URL: https://github.com/apache/arrow/issues/41388
### Describe the bug, including details regarding any error messages, version, and platform.

## Example 1

```
import numpy as np
import pyarrow as pa

np_array = np.array([b'\x00Very important data. Keep it safe!', b'Less important data. Keep it safe??'])
table = pa.Table.from_pydict({'sample': np_array})
print(table['sample'][0].as_py())
```

Expected output `b'\x00Very important data. Keep it safe!'`, actual output `b''`.

Note: `type(table['sample'])` in this example is `pyarrow.lib.ChunkedArray`. `table['sample'][1]` is `b'Less important data. Keep it safe??'`, as expected.

The byte arrays in this example look like human-readable strings (except for the leading null byte), but in reality we are trying to encode binary data as a fixed-length byte array and store it in a variable-length byte array parquet column (variable-length for compatibility with HuggingFace).

## Example 2

With the same `np_array`:

```
pa_array = pa.array(np_array, type=pa.binary())
print(pa_array[0])
```

Expected output `b'\x00Very important data. Keep it safe!'`, actual output `b''`.

Note: `type(pa_array)` is `pyarrow.lib.BinaryArray`.

## Possible workaround

Replacing `table = pa.Table.from_pydict({'sample': np_array})` with `table = pa.Table.from_pydict({'sample': pa.array(np_array, type=pa.binary(np_array.itemsize)).cast(target_type=pa.binary())})` seems to produce the intended result.

Note that in our use case we want the resulting dtype to be `pa.binary()` (BYTE_ARRAY in parquet), not `pa.binary(35)` (FIXED_LEN_BYTE_ARRAY in parquet), because the latter does not seem to be supported by the HuggingFace datasets library.

## System info

The above outputs are on Ubuntu 22.04.4 LTS with Python 3.10.11, pyarrow version 12.0.1 (using parquet-cpp-arrow version 12.0.1).

## Related issues

This seems to be similar to, but distinct from, #36308, where the input contained byte arrays of different lengths and a conversion to a `pa.string()` column was desired.
### Component(s)

Python

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org