heiseish opened a new issue, #44160:
URL: https://github.com/apache/arrow/issues/44160
### Describe the bug, including details regarding any error messages, version, and platform.
## Context
### Description
- When a table is built by concatenating dictionary arrays with mismatched
"schema"/dictionaries, the table transmitted over Flight appears to be malformed.
### Reproducible code
```python
import enum

import pyarrow as pa
import pyarrow.flight as fl


class MyEnum(enum.Enum):
    Foo = 0
    Bar = 1
    Baz = 2


schema = pa.schema({
    'col': pa.dictionary(pa.int8(), pa.string())
})


def build_data() -> pa.Table:
    non_empty = pa.table({
        'col': pa.DictionaryArray.from_arrays(pa.array([0, 2], pa.int8()),
                                              [x.name for x in MyEnum])
    }, schema=schema)
    empty = pa.table({
        'col': pa.DictionaryArray.from_arrays(pa.array([], pa.int8()), [])
    }, schema=schema)
    # If unify_dictionaries() is called here, it works
    return pa.concat_tables([empty, non_empty])  # .unify_dictionaries()


class Server(fl.FlightServerBase):
    def do_get(self, context, ticket):
        table = build_data()
        _ = table['col'].to_pylist()
        print('build table ', table)
        # This doesn't work
        return fl.RecordBatchStream(
            table,
            options=pa.ipc.IpcWriteOptions(unify_dictionaries=True))


if __name__ == '__main__':
    server = Server()
    client = fl.FlightClient(f'grpc://localhost:{server.port}')
    client.wait_for_available()
    table = client.do_get(fl.Ticket(bytes())).read_all()
    try:
        _ = table['col'].to_pylist()
        print('got table ', table)
    except Exception as e:
        print(e)
    server.shutdown()
```
### Expectation
- `to_pylist` succeeds
### Actual
- `to_pylist` fails with `index with value of 0 is out-of-bounds for array of length 0`
### Observation
Table before IPC
```
----
col: [ -- dictionary:
[] -- indices:
[], -- dictionary:
["Foo","Bar","Baz"] -- indices:
[0,2]]
```
Table after IPC
```
col: [ -- dictionary:
[] -- indices:
[], -- dictionary:
[] -- indices:
[0,2]]
```
I'm happy to open a PR if someone can point me to the relevant code. Thanks!
### Component(s)
FlightRPC, Python