jiayuasu opened a new pull request, #859:
URL: https://github.com/apache/sedona-db/pull/859

   Continues the Phase P2 work of #791 with pandas-style `sort_values` on the 
lazy `DataFrame`.
   
   ## API
   
   ```python
   df.sort_values(by="x")
   df.sort_values(by="x", ascending=False)
   df.sort_values(by=["x", "y"])
   df.sort_values(by=["x", "y"], ascending=[True, False])   # mixed
   df.sort_values(by=col("x") + col("y"))                    # Expr key
   df.sort_values(by=[col("x"), "y"])                        # mixed str / Expr
   ```
   
   `by` accepts `str | Expr | list[str | Expr]`. `ascending` is `bool | 
list[bool]` (broadcast to `by` length if scalar, must match if list).
   
   ## Two semantic calls worth flagging
   
   **Nulls last in both directions.** Pandas's default is `na_position='last'` 
regardless of direction; DataFusion's SQL-style default is nulls-first on 
descending. We override DataFusion's default so pandas users porting code get 
the placement they expect. If the SQL behavior is needed, `sd.sql("... ORDER BY 
... NULLS FIRST")` is still the escape hatch.
   
   **No `inplace=` kwarg.** Our `DataFrame` wraps an immutable DataFusion 
`LogicalPlan`, so true in-place mutation is impossible. Rather than silently 
warn-and-ignore (the design-doc default, which leaves callers with an unsorted 
frame they ignored the return value of), the kwarg is simply not defined — 
Python raises the standard `TypeError: sort_values() got an unexpected keyword 
argument 'inplace'`. A test locks that contract. The design doc will be updated 
to match in a follow-up.
   
   ## Test plan
   
   17 tests in `tests/expr/test_dataframe_sort_values.py`:
   
   - **Positive**: single key asc/desc, by `col(...)` Expr, by computed Expr 
(`col("x") + col("y")`), multi-key all-asc, multi-key mixed asc/desc with 
deliberately scrambled input so a broken implementation cannot pass, 
ascending-scalar-broadcast, nulls-last asc, nulls-last desc.
   - **Lazy return**: `isinstance(out, DataFrame)` (not `hasattr`, per Dewey's 
policy).
   - **Errors**: empty `by`, length mismatch, bad `by` type, bad `by` element 
type (matched on a phrase only that error path uses), bad `ascending` type, bad 
`ascending` list element, `inplace=` rejected by Python.
   
   All assertions use `pd.testing.assert_frame_equal` for outputs and 
`pytest.raises(..., match="discriminating phrase")` for errors. No 
substring-on-a-single-character patterns.
   
   Local: 17 unit + 18 doctests + `ruff format` + `ruff check` all green.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to