rustyconover opened a new issue, #48711:
URL: https://github.com/apache/arrow/issues/48711
### Describe the bug, including details regarding any error messages, version, and platform.
While integrating `pyarrow-stubs` into a project that uses Arrow IPC streaming, I encountered several type annotation gaps that required workarounds (`# type: ignore` comments or `cast()` calls). This issue documents these gaps to help improve stub coverage.
Environment:
- `pyarrow-stubs` version: 17.11
- `pyarrow` version: 19.0.1
- `mypy` version: 1.14.1
- Python version: 3.12
### Issues Found
1. `pa.PythonFile` constructor doesn't accept standard file-like objects
Problem: The `PythonFile` constructor signature is too restrictive. It
doesn't accept `IO[bytes]` or `BufferedIOBase` objects without explicit casting.
Workaround required:
```python
import io
import sys
from typing import cast

import pyarrow as pa

# proc is a subprocess.Popen created with stdin=subprocess.PIPE

# This requires a cast:
stdin_sink = pa.PythonFile(cast(io.IOBase, proc.stdin))

# Similarly for stdout:
pa.PythonFile(cast(io.IOBase, sys.stdout.buffer), mode="w")
```
Expected: `PythonFile.__init__` should accept `IO[bytes]`, `BufferedIOBase`,
or a `typing.BinaryIO` union.
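For illustration, a widened stub could look roughly like this; the parameter names follow the runtime `PythonFile(handle, mode=None)` signature, and this is a sketch rather than the actual `pyarrow-stubs` source:
```python
# Hypothetical stub fragment (sketch only, not the real pyarrow-stubs code)
import io
from typing import IO

from pyarrow.lib import NativeFile

class PythonFile(NativeFile):
    def __init__(self, handle: IO[bytes] | io.IOBase, mode: str | None = None) -> None: ...
```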
---
2. `pa.BufferReader` incompatible with `pa.ipc.read_schema()`
Problem: When passing a `BufferReader` to `ipc.read_schema()`, mypy reports
an argument type error.
Workaround required:
```python
output_schema_bytes: bytes = ...
output_schema = pa.ipc.read_schema(pa.BufferReader(output_schema_bytes))  # type: ignore[arg-type]
```
Expected: `ipc.read_schema()` should accept `BufferReader` (or its parent
`NativeFile`) as a valid input type.
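For example, the stub signature might be widened along these lines (sketch only; the remaining parameters of `read_schema` are omitted here):
```python
# Hypothetical stub fragment (sketch only, not the real pyarrow-stubs code)
from pyarrow.lib import Buffer, NativeFile, Schema

def read_schema(obj: Buffer | NativeFile) -> Schema: ...  # remaining parameters omitted
```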
---
3. `pa.schema()` field list typing is overly restrictive
Problem: Creating a schema from a list of tuples `[("name", pa.string())]`
or `pa.Field` objects causes type errors.
Workaround required:
```python
from typing import Any

import pyarrow as pa


def make_schema(fields: list[Any]) -> pa.Schema:
    """Helper to avoid mypy errors with field lists."""
    return pa.schema(fields)


# Usage:
schema = make_schema([("x", pa.int64()), ("y", pa.string())])
schema = make_schema([pa.field("x", pa.int64())])
```
Expected: `pa.schema()` should accept (a stub sketch follows this list):
- `list[tuple[str, DataType]]`
- `list[Field]`
- `Iterable[tuple[str, DataType] | Field]`
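A corresponding stub sketch, assuming the runtime `schema(fields, metadata=None)` signature (the real stub likely needs to cover additional input shapes as well):
```python
# Hypothetical stub fragment (sketch only, not the real pyarrow-stubs code)
from collections.abc import Iterable, Mapping

from pyarrow.lib import DataType, Field, Schema

def schema(
    fields: Iterable[Field | tuple[str, DataType]],
    metadata: Mapping[bytes | str, bytes | str] | None = None,
) -> Schema: ...
```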
---
4. `pyarrow.compute.filter()` missing `RecordBatch` overload
Problem: `pc.filter()` works with `RecordBatch` at runtime but the stubs
only define overloads for `Array` and `ChunkedArray`.
Workaround required:
```python
import pyarrow as pa
import pyarrow.compute as pc

batch: pa.RecordBatch = ...
mask: pa.BooleanArray = ...
result = pc.filter(batch, mask)  # type: ignore[call-overload]
```
Expected: Add overload for `RecordBatch`:
```python
@overload
def filter(
    values: RecordBatch,
    selection_filter: Array | ChunkedArray,
    /,
    null_selection_behavior: Literal["drop", "emit_null"] = ...,
) -> RecordBatch: ...
```
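With an overload like that in place, the workaround above should type-check without the ignore:
```python
result: pa.RecordBatch = pc.filter(batch, mask)
```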
---
5. `pa.Scalar` generic requires TYPE_CHECKING import pattern
Problem: Using `pa.Scalar[T]` in annotations that are evaluated at runtime raises errors, because `Scalar` is not subscriptable at runtime; the generic parameter exists only in the stubs.
Current pattern required:
```python
from __future__ import annotations  # keep the annotations below unevaluated at runtime

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from pyarrow import Scalar

# Then use as:
positional: tuple[Scalar[Any] | None, ...] = ()
named: dict[str, Scalar[Any]] = {}
```
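An equivalent alternative, if the future import is not wanted, is to quote the annotations so they are never evaluated at runtime:
```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from pyarrow import Scalar

# Quoted annotations are not evaluated at runtime
positional: "tuple[Scalar[Any] | None, ...]" = ()
named: "dict[str, Scalar[Any]]" = {}
```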
This is a minor issue but worth noting for documentation.
### Component(s)
Python