rustyconover opened a new issue, #48711:
URL: https://github.com/apache/arrow/issues/48711
### Describe the bug, including details regarding any error messages, version, and platform.
While integrating `pyarrow-stubs` into a project that uses Arrow IPC streaming, I encountered several type annotation gaps that required workarounds (`# type: ignore` comments or `cast()` calls). This issue documents these gaps to help improve stub coverage.
Environment:
- `pyarrow-stubs` version: 17.11
- `pyarrow` version: 19.0.1
- `mypy` version: 1.14.1
- Python version: 3.12
### Issues Found
1. `pa.PythonFile` constructor doesn't accept standard file-like objects
Problem: The `PythonFile` constructor signature is too restrictive. It
doesn't accept `IO[bytes]` or `BufferedIOBase` objects without explicit casting.
Workaround required:
```python
import io
import sys
from typing import cast

import pyarrow as pa

# proc is a subprocess.Popen created with stdin=subprocess.PIPE

# This requires a cast:
stdin_sink = pa.PythonFile(cast(io.IOBase, proc.stdin))

# Similarly for stdout:
pa.PythonFile(cast(io.IOBase, sys.stdout.buffer), mode="w")
```
Expected: `PythonFile.__init__` should accept `IO[bytes]`, `BufferedIOBase`,
or a `typing.BinaryIO` union.
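For illustration, a widened stub could look roughly like this; the parameter names follow the runtime `PythonFile(handle, mode=None)` signature, and this is a sketch rather than the actual `pyarrow-stubs` source:
```python
# Hypothetical stub fragment (sketch only, not the real pyarrow-stubs code)
import io
from typing import IO

from pyarrow.lib import NativeFile

class PythonFile(NativeFile):
    def __init__(self, handle: IO[bytes] | io.IOBase, mode: str | None = None) -> None: ...
```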
---
2. `pa.BufferReader` incompatible with `pa.ipc.read_schema()`
Problem: When passing a `BufferReader` to `ipc.read_schema()`, mypy reports
an argument type error.
Workaround required:
```python
output_schema_bytes: bytes = ...
output_schema = pa.ipc.read_schema(pa.BufferReader(output_schema_bytes))  # type: ignore[arg-type]
```
Expected: `ipc.read_schema()` should accept `BufferReader` (or its parent
`NativeFile`) as a valid input type.
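For example, the stub signature might be widened along these lines (sketch only; the remaining parameters of `read_schema` are omitted here):
```python
# Hypothetical stub fragment (sketch only, not the real pyarrow-stubs code)
from pyarrow.lib import Buffer, NativeFile, Schema

def read_schema(obj: Buffer | NativeFile) -> Schema: ...  # remaining parameters omitted
```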
---
3. `pa.schema()` field list typing is overly restrictive
Problem: Creating a schema from a list of tuples `[("name", pa.string())]`
or `pa.Field` objects causes type errors.
Workaround required:
```python
from typing import Any

import pyarrow as pa


def make_schema(fields: list[Any]) -> pa.Schema:
    """Helper to avoid mypy errors with field lists."""
    return pa.schema(fields)


# Usage:
schema = make_schema([("x", pa.int64()), ("y", pa.string())])
schema = make_schema([pa.field("x", pa.int64())])
```
Expected: `pa.schema()` should accept (a stub sketch follows this list):
- `list[tuple[str, DataType]]`
- `list[Field]`
- `Iterable[tuple[str, DataType] | Field]`
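A corresponding stub sketch, assuming the runtime `schema(fields, metadata=None)` signature (the real stub likely needs to cover additional input shapes as well):
```python
# Hypothetical stub fragment (sketch only, not the real pyarrow-stubs code)
from collections.abc import Iterable, Mapping

from pyarrow.lib import DataType, Field, Schema

def schema(
    fields: Iterable[Field | tuple[str, DataType]],
    metadata: Mapping[bytes | str, bytes | str] | None = None,
) -> Schema: ...
```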
---
4. `pyarrow.compute.filter()` missing `RecordBatch` overload
Problem: `pc.filter()` works with `RecordBatch` at runtime but the stubs
only define overloads for `Array` and `ChunkedArray`.
Workaround required:
```python
import pyarrow as pa
import pyarrow.compute as pc

batch: pa.RecordBatch = ...
mask: pa.BooleanArray = ...
result = pc.filter(batch, mask)  # type: ignore[call-overload]
```
Expected: Add overload for `RecordBatch`:
```python
@overload
def filter(
    values: RecordBatch,
    selection_filter: Array | ChunkedArray,
    /,
    null_selection_behavior: Literal["drop", "emit_null"] = ...,
) -> RecordBatch: ...
```
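With an overload like that in place, the workaround above should type-check without the ignore:
```python
result: pa.RecordBatch = pc.filter(batch, mask)
```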
---
5. `pa.Scalar` generic requires TYPE_CHECKING import pattern
Problem: Using `pa.Scalar[T]` in annotations that are evaluated at runtime raises errors, because `Scalar` is not subscriptable at runtime; the generic parameter exists only in the stubs.
Current pattern required:
```python
from __future__ import annotations  # keep the annotations below unevaluated at runtime

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from pyarrow import Scalar

# Then use as:
positional: tuple[Scalar[Any] | None, ...] = ()
named: dict[str, Scalar[Any]] = {}
```
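An equivalent alternative, if the future import is not wanted, is to quote the annotations so they are never evaluated at runtime:
```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from pyarrow import Scalar

# Quoted annotations are not evaluated at runtime
positional: "tuple[Scalar[Any] | None, ...]" = ()
named: "dict[str, Scalar[Any]]" = {}
```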
This is a minor issue but worth noting for documentation.
### Component(s)
Python