blaginin opened a new issue, #47911: URL: https://github.com/apache/arrow/issues/47911
### Describe the bug, including details regarding any error messages, version, and platform.

<img width="3072" height="1664" alt="Image" src="https://github.com/user-attachments/assets/ac71a39d-a1f1-4242-9c6b-e795b43989ee" />

1. I construct an Arrow array from a large native Python list of dicts, which forces Arrow to create a chunked array.
2. The array is chunked in such a way that some column(s) are only present in one chunk.
3. As a result, some methods do not work at all (e.g. `table.to_pandas`), while others (e.g. `table.to_reader`) produce incorrect data: they report a column as `None` even for rows where the value is actually set.

```python
import pyarrow as pa

pa.show_versions()

# Rows are large enough to force chunking. Replace 100_000 with 10 and
# everything below will pass.
data = [
    {"key_1": "abcd" * 100_000, "pk": i}
    for i in range(10_000)
]
# Only the last row carries "extra_key".
data.append({"key_1": "abcd", "pk": 10_000, "extra_key": "value"})

t = pa.array(data)
print(t.to_pandas().iloc[-1])

table = pa.Table.from_struct_array(t)

# print(table.to_pandas().iloc[-1])  # ← this fails with:
# ValueError: Shape of passed values is (10002, 3), indices imply (10001, 3)

# Collect every value of "extra_key" seen across all record batches.
r = table.to_reader()
vals = set()
while True:
    try:
        batch = r.read_next_batch()
        vals.update(batch.column("extra_key").to_pylist())
    except StopIteration:
        break

print("\nUnique values in 'extra_key':", vals)
# assert vals == {"value", None}  # this fails because vals is { None }
```

```
Package kind : python-wheel-macos
Arrow C++ library version : 21.0.0
Arrow C++ compiler : AppleClang 15.0.0.15000309
Arrow C++ compiler flags : -fno-aligned-new -Qunused-arguments -fcolor-diagnostics -Wall -Wno-unknown-warning-option -Wno-pass-failed
Arrow C++ git revision : ee4d09ebef61c663c1efbfa4c18e518a03b798be
Arrow C++ git description : apache-arrow-21.0.0-rc6
Arrow C++ build type : release
```

### Component(s)

Python
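For anyone checking whether their own input has the shape described in step 2 (a key that appears in only some of the dicts), a small pre-check along these lines might help. This is only a sketch: `sparse_keys` is a hypothetical helper written for illustration, not a pyarrow API, and it uses short strings so it runs instantly instead of allocating the several gigabytes the reproducer needs.

```python
from collections import Counter


def sparse_keys(rows):
    """Return {key: row_count} for keys that are absent from at least one row.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    # Each dict contributes each of its keys once, so the count for a key
    # is the number of rows that contain it.
    counts = Counter(key for row in rows for key in row)
    return {key: n for key, n in counts.items() if n < len(rows)}


# Same structure as the reproducer, but with tiny rows.
data = [{"key_1": "abcd", "pk": i} for i in range(5)]
data.append({"key_1": "abcd", "pk": 5, "extra_key": "value"})

print(sparse_keys(data))  # {'extra_key': 1}
```

Keys reported here are the ones that can end up present in only one chunk. Declaring them up front by passing an explicit `type=pa.struct([...])` to `pa.array` may be worth trying as a workaround, but that has not been verified against the large input above.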
