WillAyd opened a new issue, #2504:
URL: https://github.com/apache/arrow-adbc/issues/2504
### What happened?
When trying to use first/last aggregations with TableGroupBy, PyArrow throws
`ArrowNotImplementedError`:
```python
In [48]: import pyarrow as pa
In [49]: tbl = pa.Table.from_pydict({"key": range(3), "val": ["foo", "bar", "baz"]})
In [50]: pa.TableGroupBy(tbl, "key").aggregate([("val", "first")])
ArrowNotImplementedError                  Traceback (most recent call last)
Cell In[50], line 1
----> 1 pa.TableGroupBy(tbl, "key").aggregate([("val", "first")])

File ~/miniforge3/envs/scratchpad/lib/python3.13/site-packages/pyarrow/table.pxi:6560, in pyarrow.lib.TableGroupBy.aggregate()

File ~/miniforge3/envs/scratchpad/lib/python3.13/site-packages/pyarrow/acero.py:410, in _group_by(table, aggregates, keys, use_threads)
    404 def _group_by(table, aggregates, keys, use_threads=True):
    406     decl = Declaration.from_sequence([
    407         Declaration("table_source", TableSourceNodeOptions(table)),
    408         Declaration("aggregate", AggregateNodeOptions(aggregates, keys=keys))
    409     ])
--> 410     return decl.to_table(use_threads=use_threads)

File ~/miniforge3/envs/scratchpad/lib/python3.13/site-packages/pyarrow/_acero.pyx:590, in pyarrow._acero.Declaration.to_table()

File ~/miniforge3/envs/scratchpad/lib/python3.13/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()

File ~/miniforge3/envs/scratchpad/lib/python3.13/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()

ArrowNotImplementedError: Using ordered aggregator in multiple threaded execution is not supported
```
Interestingly, the `first_last` aggregation returns without error:
```python
In [52]: pa.TableGroupBy(tbl, "key").aggregate([("val", "first_last")])
Out[52]:
pyarrow.Table
key: int64
val_first_last: struct<first: string, last: string>
child 0, first: string
child 1, last: string
----
key: [[0,1,2]]
val_first_last: [
-- is_valid: all not null
-- child 0 type: string
["foo","bar","baz"]
-- child 1 type: string
["foo","bar","baz"]]
```
### Stack Trace
```
----> 1 pa.TableGroupBy(tbl, "key").aggregate([("val", "first")])

File ~/miniforge3/envs/scratchpad/lib/python3.13/site-packages/pyarrow/table.pxi:6560, in pyarrow.lib.TableGroupBy.aggregate()

File ~/miniforge3/envs/scratchpad/lib/python3.13/site-packages/pyarrow/acero.py:410, in _group_by(table, aggregates, keys, use_threads)
    404 def _group_by(table, aggregates, keys, use_threads=True):
    406     decl = Declaration.from_sequence([
    407         Declaration("table_source", TableSourceNodeOptions(table)),
    408         Declaration("aggregate", AggregateNodeOptions(aggregates, keys=keys))
    409     ])
--> 410     return decl.to_table(use_threads=use_threads)

File ~/miniforge3/envs/scratchpad/lib/python3.13/site-packages/pyarrow/_acero.pyx:590, in pyarrow._acero.Declaration.to_table()

File ~/miniforge3/envs/scratchpad/lib/python3.13/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()

File ~/miniforge3/envs/scratchpad/lib/python3.13/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()

ArrowNotImplementedError: Using ordered aggregator in multiple threaded execution is not supported
```
### How can we reproduce the bug?
See the code sample above.
### Environment/Setup
PyArrow 19.0.0