jiayuasu opened a new pull request, #852:
URL: https://github.com/apache/sedona-db/pull/852
Post-merge follow-up to #846 addressing @paleolimbot's review feedback.
## What changes
**`DataFrame.__getitem__` is now strictly single-column lookup:**
```python
df["x"] # → Expr referencing column x
df[0] # → Expr referencing the first column
df[-1] # → Expr referencing the last column
```
Previously the same method also handled `df[["x","y"]]` (list projection)
and `df[bool_expr]` (filter). Those are dropped. Users go through the explicit
`df.select(...)` and `df.filter(...)` entry points instead.
The motivation mirrors Ibis's deprecation of the same behavior: a single
return type (`Expr`) lets IDEs and type checkers resolve `df["x"].<method>`
cleanly. With the old polymorphic shape, the return type was
`DataFrame | Expr`, which broke autocomplete in the common case.
## Error paths
| Key | Behavior |
|---|---|
| Unknown column name | `KeyError`, message lists available columns |
| Out-of-range int | `IndexError` |
| `bool` | `TypeError` (guarded explicitly; bool is a subclass of int in Python) |
| `list`, `slice`, `Expr`, anything else | `TypeError` with a message pointing at `select` / `filter` |
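
The dispatch order implied by the table can be sketched with a toy stand-in. This is a minimal illustration, not the code in this PR: `MiniFrame` and this `Expr` class are hypothetical, the error messages are invented for the example, and the real `DataFrame` may structure its checks differently. The point it shows is that the `bool` guard has to run before the `int` branch.

```python
class Expr:
    """Hypothetical stand-in for the real expression type."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Expr(col={self.name!r})"


class MiniFrame:
    """Toy frame that only tracks column names (not the sedona-db DataFrame)."""

    def __init__(self, columns):
        self._columns = list(columns)

    def __getitem__(self, key):
        # Reject bool first: isinstance(True, int) is True in Python,
        # so without this guard df[True] would silently mean df[1].
        if isinstance(key, bool):
            raise TypeError("bool keys are not supported; use filter(...)")
        if isinstance(key, str):
            if key not in self._columns:
                raise KeyError(
                    f"unknown column {key!r}; available: {self._columns}"
                )
            return Expr(key)
        if isinstance(key, int):
            # List indexing handles negative positions and raises
            # IndexError when the position is out of range.
            return Expr(self._columns[key])
        # list, slice, Expr, and anything else land here.
        raise TypeError(
            f"unsupported key type {type(key).__name__}; "
            "use select(...) or filter(...)"
        )


df = MiniFrame(["x", "y", "z"])
print(df["x"], df[0], df[-1])
```

Every successful lookup returns the same type, which is the property the single-column contract is meant to guarantee.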
## While here — import-discipline cleanup
`__getitem__`, `select`, and `filter` previously had lazy in-function
imports of `Expr`, `Literal`, `col`, and `_to_expr`. Per the policy in this
module ("lazy imports are reserved for optional dependencies like pyarrow"),
those move to module level as a single combined import, and the method
annotations switch from string forward references to the runtime-imported
names.
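
As a rough illustration of that policy (not code from this PR), the sketch below keeps a required dependency at module level and defers only the optional one. The functions `describe` and `to_arrow_table` are hypothetical, and `json` merely stands in for the always-available names such as `Expr` and `col`.

```python
import json  # required dependency: imported once at module level


def describe(rows):
    """Uses the module-level import; no in-function import needed."""
    return json.dumps(rows)


def to_arrow_table(rows):
    """Build a pyarrow Table, importing pyarrow only when this is called."""
    try:
        import pyarrow as pa  # optional dependency: lazy import is allowed
    except ImportError as exc:
        raise ImportError("pyarrow is required for to_arrow_table()") from exc
    return pa.Table.from_pylist(rows)
```

With this layout, importing the module never fails just because pyarrow is missing; only the call that needs it does.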
## Test plan
- 11 tests in `tests/expr/test_dataframe_getitem.py` cover name / positive /
negative / first / out-of-range / unknown / bool / list / slice / Expr keys
plus the operator-composition path on `df["x"]`.
- Positive cases all assert exact `repr() == ...` equality, per the
test-policy convention established in earlier PRs.
- Existing `test_dataframe_select.py` and `test_dataframe_filter.py` still
pass (35 expr-dataframe tests total).
- `pytest --doctest-modules dataframe.py` passes (17 doctests including the
updated `__getitem__` example).
- `ruff check` and `ruff format` clean.