fenfeng9 opened a new issue, #50113:
URL: https://github.com/apache/arrow/issues/50113
### Describe the bug, including details regarding any error messages, version, and platform.
Sliced union arrays can report incorrect logical null counts in `count`.
#### Expected behavior:
For sliced sparse and dense unions, `count(..., mode="only_valid")` and
`count(..., mode="only_null")` should reflect the logical nullness of the slice.
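To make "logical nullness of the slice" concrete, here is a minimal pure-Python sketch of what each slot of a union slice resolves to. It is a reference model only, not Arrow's implementation: the helper names (`sparse_union_values`, `dense_union_values`, `count_values`) are hypothetical, children are plain Python lists with `None` for nulls, and type ids are assumed to equal child indices.

```python
def sparse_union_values(type_ids, children, offset=0, length=None):
    """Logical values of a sparse union slice.

    Slot i reads children[type_ids[i]][i]; every child has the same
    length as the union itself. (Assumes type id == child index.)
    """
    if length is None:
        length = len(type_ids) - offset
    return [children[type_ids[i]][i] for i in range(offset, offset + length)]


def dense_union_values(type_ids, value_offsets, children, offset=0, length=None):
    """Logical values of a dense union slice.

    Slot i reads children[type_ids[i]][value_offsets[i]].
    (Assumes type id == child index.)
    """
    if length is None:
        length = len(type_ids) - offset
    return [
        children[type_ids[i]][value_offsets[i]]
        for i in range(offset, offset + length)
    ]


def count_values(values, mode="only_valid"):
    """Count logical values, mirroring the mode names used by `count`."""
    nulls = sum(v is None for v in values)
    if mode == "only_valid":
        return len(values) - nulls
    if mode == "only_null":
        return nulls
    if mode == "all":
        return len(values)
    raise ValueError(f"unknown mode: {mode!r}")


# Same data as the reproduction in this report, sliced with (offset=1, length=4).
sparse = sparse_union_values(
    [0, 1, 0, 0, 1],
    [[0.5, 99.0, None, 3.0, 88.0], [False, None, True, False, True]],
    offset=1,
    length=4,
)
dense = dense_union_values(
    [0, 1, 0, 0, 1, 1],
    [0, 0, 1, 2, 1, 2],
    [[0.5, 1.5, None], [True, None, False]],
    offset=1,
    length=4,
)
print(sparse, count_values(sparse, "only_valid"), count_values(sparse, "only_null"))
print(dense, count_values(dense, "only_valid"), count_values(dense, "only_null"))
```

Under this model both slices contain two valid and two null slots, which is what the "expected" values in this report correspond to.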
#### Actual behavior:
For both sliced sparse and dense unions, `count(..., mode="only_valid")` returns the
full slice length and `count(..., mode="only_null")` returns 0, as if every slot were valid.
### Reproduce
```python
import pyarrow as pa
import pyarrow.compute as pc


def main():
    # sparse_arr == [0.5, None, None, 3.0, True].
    sparse_arr = pa.UnionArray.from_sparse(
        pa.array([0, 1, 0, 0, 1], type=pa.int8()),
        [
            pa.array([0.5, 99.0, None, 3.0, 88.0]),
            pa.array([False, None, True, False, True]),
        ],
    )
    # sparse == [None, None, 3.0, True].
    sparse = sparse_arr.slice(1, 4)

    # dense_arr == [0.5, True, 1.5, None, None, False].
    dense_arr = pa.UnionArray.from_dense(
        pa.array([0, 1, 0, 0, 1, 1], type=pa.int8()),
        pa.array([0, 0, 1, 2, 1, 2], type=pa.int32()),
        [
            pa.array([0.5, 1.5, None]),
            pa.array([True, None, False]),
        ],
    )
    # dense == [True, 1.5, None, None].
    dense = dense_arr.slice(1, 4)

    print(f"pyarrow: {pa.__version__}")
    print()

    # Logical sparse slice: [None, None, 3.0, True].
    print("sparse count only_valid expected: 2")
    print(f"sparse count only_valid actual: {pc.count(sparse, mode='only_valid').as_py()}")
    print("sparse count only_null expected: 2")
    print(f"sparse count only_null actual: {pc.count(sparse, mode='only_null').as_py()}")
    print()

    # Logical dense slice: [True, 1.5, None, None].
    print("dense count only_valid expected: 2")
    print(f"dense count only_valid actual: {pc.count(dense, mode='only_valid').as_py()}")
    print("dense count only_null expected: 2")
    print(f"dense count only_null actual: {pc.count(dense, mode='only_null').as_py()}")


if __name__ == "__main__":
    main()
```
### Result
```python
pyarrow: 24.0.0

sparse count only_valid expected: 2
sparse count only_valid actual: 4
sparse count only_null expected: 2
sparse count only_null actual: 0

dense count only_valid expected: 2
dense count only_valid actual: 4
dense count only_null expected: 2
dense count only_null actual: 0
```
### Component(s)
C++, Python