zhuqi-lucas commented on code in PR #21956:
URL: https://github.com/apache/datafusion/pull/21956#discussion_r3246405372
##########
datafusion/datasource-parquet/src/source.rs:
##########
@@ -815,17 +975,53 @@ impl FileSource for ParquetSource {
new
};
- // Check if the reversed ordering satisfies the requested ordering
- if !reversed_eq_properties.ordering_satisfy(order.iter().cloned())? {
- return Ok(SortOrderPushdownResult::Unsupported);
+ // Check if the reversed ordering satisfies the requested ordering.
+ // If yes, this is the "Inexact via row-group reversal" case: source
+ // declares ASC ordering, request is DESC (or vice versa), so iterating
+ // RGs in reverse approximates the requested order.
+ if reversed_eq_properties.ordering_satisfy(order.iter().cloned())? {
+ let sort_order = LexOrdering::new(order.iter().cloned());
+ let mut new_source = self.clone().with_reverse_row_groups(true);
+ new_source.sort_order_for_reorder = sort_order;
+ return Ok(SortOrderPushdownResult::Inexact {
+ inner: Arc::new(new_source) as Arc<dyn FileSource>,
+ });
Review Comment:
Took a closer look while refactoring — partially merged, but kept a narrow
fallback.
- **Plain `Column` leading sort**: merged into the stats-based path as you
suggested. The column-in-schema branch now sets both `sort_order_for_reorder`
and `reverse_row_groups` (the latter taken from the
request's direction, which also fixes a latent source-DESC + request-ASC
mismatch).
- **Function-wrapped sort** (e.g. `date_trunc('day', ts) DESC`,
`ceil(value) DESC`): kept as a reverse-only fallback. `reorder_by_statistics`
needs a plain column name to look up min/max in parquet metadata, so
it can't subsume these. The fallback only sets `reverse_row_groups=true`
and no longer wastes a `sort_order_for_reorder` hint that would just be skipped.
Rationale spelled out in the new doc on `try_pushdown_sort`. Open to
dropping the fallback entirely if you'd rather not pay that complexity for
function-wrapped sorts — let me know.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]