blaginin opened a new issue, #47911:
URL: https://github.com/apache/arrow/issues/47911

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   <img width="3072" height="1664" alt="Image" src="https://github.com/user-attachments/assets/ac71a39d-a1f1-4242-9c6b-e795b43989ee" />
   
   
   1. I construct an Arrow array from a large native Python list of dicts, which forces Arrow to create a chunked array.
   2. The array is chunked in such a way that some column(s) are present in only one chunk.
   3. As a result, some methods fail outright (e.g. `table.to_pandas`), while others (`table.to_reader`) produce incorrect data: they report the column as `None` even for rows where the value is actually set.
   
   ```python
   import pyarrow as pa
   
   pa.show_versions()
   
   data = [
       # Rows large enough to force chunking. Replace 100_000 with 10 and
       # everything below will pass.
       {"key_1": "abcd" * 100_000, "pk": i}
       for i in range(10_000)
   ]
   
   data.append({"key_1": "abcd", "pk": 10_000, "extra_key": "value"})
   
   t = pa.array(data)
   print(t.to_pandas().iloc[-1])
   
   table = pa.Table.from_struct_array(t)
   
   # print(table.to_pandas().iloc[-1])  # ← this fails with:
   # ValueError: Shape of passed values is (10002, 3), indices imply (10001, 3)
   
   
   r = table.to_reader()
   
   vals = set()
   
   while True:
       try:
           batch = r.read_next_batch()
           vals.update(batch.column("extra_key").to_pylist())
   
       except StopIteration:
           break
   
   print("\nUnique values in 'extra_key':", vals)
   # assert vals == {"value", None}  # this fails because vals is { None }
   ```
   
   
   ```
   Package kind              : python-wheel-macos
   Arrow C++ library version : 21.0.0  
   Arrow C++ compiler        : AppleClang 15.0.0.15000309
   Arrow C++ compiler flags  :  -fno-aligned-new  -Qunused-arguments -fcolor-diagnostics  -Wall -Wno-unknown-warning-option -Wno-pass-failed
   Arrow C++ git revision    : ee4d09ebef61c663c1efbfa4c18e518a03b798be
   Arrow C++ git description : apache-arrow-21.0.0-rc6
   Arrow C++ build type      : release 
   ```
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

