jiayuasu opened a new pull request, #859:
URL: https://github.com/apache/sedona-db/pull/859
Continues the Phase P2 work of #791 with pandas-style `sort_values` on the
lazy `DataFrame`.
## API
```python
df.sort_values(by="x")
df.sort_values(by="x", ascending=False)
df.sort_values(by=["x", "y"])
df.sort_values(by=["x", "y"], ascending=[True, False]) # mixed
df.sort_values(by=col("x") + col("y")) # Expr key
df.sort_values(by=[col("x"), "y"]) # mixed str / Expr
```
`by` accepts `str | Expr | list[str | Expr]`. `ascending` is `bool |
list[bool]` (broadcast to `by` length if scalar, must match if list).
## Two semantic calls worth flagging
**Nulls last in both directions.** Pandas's default is `na_position='last'`
regardless of direction; DataFusion's SQL-style default is nulls-first on
descending. We override DataFusion's default so pandas users porting code get
the placement they expect. If the SQL behavior is needed, `sd.sql("... ORDER BY
... NULLS FIRST")` is still the escape hatch.
**No `inplace=` kwarg.** Our `DataFrame` wraps an immutable DataFusion
`LogicalPlan`, so true in-place mutation is impossible. Rather than silently
warn-and-ignore (the design-doc default, which leaves callers with an unsorted
frame they ignored the return value of), the kwarg is simply not defined —
Python raises the standard `TypeError: sort_values() got an unexpected keyword
argument 'inplace'`. A test locks that contract. The design doc will be updated
to match in a follow-up.
## Test plan
17 tests in `tests/expr/test_dataframe_sort_values.py`:
- **Positive**: single key asc/desc, by `col(...)` Expr, by computed Expr
(`col("x") + col("y")`), multi-key all-asc, multi-key mixed asc/desc with
deliberately scrambled input so a broken implementation cannot pass,
ascending-scalar-broadcast, nulls-last asc, nulls-last desc.
- **Lazy return**: `isinstance(out, DataFrame)` (not `hasattr`, per Dewey's
policy).
- **Errors**: empty `by`, length mismatch, bad `by` type, bad `by` element
type (matched on a phrase only that error path uses), bad `ascending` type, bad
`ascending` list element, `inplace=` rejected by Python.
All assertions use `pd.testing.assert_frame_equal` for outputs and
`pytest.raises(..., match="discriminating phrase")` for errors. No
substring-on-a-single-character patterns.
Local: 17 unit + 18 doctests + `ruff format` + `ruff check` all green.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]