cbb330 opened a new issue, #49362: URL: https://github.com/apache/arrow/issues/49362
### Summary

Part 3 of ORC predicate pushdown (#48986). Depends on #49361.

Extend the initial INT32/INT64 greater-than implementation to cover all comparison operators, logical operators, set operations, and null handling.

### Operator coverage

The guarantee-based approach means most operators work automatically once the guarantee expression is correct. The work here is:

1. Ensuring guarantee expressions handle null semantics correctly for each operator class
2. Adding test coverage for each operator
3. Handling edge cases specific to certain operators (e.g., IN with mixed types)

| Category | Operators | Notes |
|----------|-----------|-------|
| Comparison | `>`, `>=`, `<`, `<=`, `==`, `!=` | All work via `SimplifyWithGuarantee()` with min/max range guarantees |
| Logical | `AND`, `OR`, `NOT` | Compound predicates; Arrow's simplifier handles these given correct per-field guarantees |
| Set | `IN` | Range intersection: if all IN values fall outside [min, max], skip stripe |
| Null | `IS NULL`, `IS NOT NULL` | Use `hasNull()` and `getNumberOfValues() == 0` from ORC stats |

### Future type extensions

This sub-issue covers operators for INT32/INT64.
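
To make the INT32/INT64 operator coverage concrete, here is a rough standalone sketch of the range logic that the min/max and null guarantees are expected to produce. It does not use Arrow's `SimplifyWithGuarantee()` or the ORC reader; `StripeStats`, `Op`, and the `MayMatch*` helpers are hypothetical names introduced only for illustration, and the sketch assumes `num_values` counts non-null values in the same way the issue uses `getNumberOfValues() == 0` to detect an all-null stripe.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-stripe summary for one INT64 column. The fields mirror the
// information named in the issue: min/max, hasNull(), getNumberOfValues().
struct StripeStats {
  int64_t min;
  int64_t max;
  bool has_null;
  uint64_t num_values;  // assumed to count non-null values only
};

enum class Op { kGreater, kGreaterEqual, kLess, kLessEqual, kEqual, kNotEqual };

// Returns true if some row in the stripe could satisfy `column <op> v`.
// Returning true is always safe; false means the stripe can be skipped.
bool MayMatch(const StripeStats& s, Op op, int64_t v) {
  if (s.num_values == 0) return false;  // all-null: a comparison is never true
  switch (op) {
    case Op::kGreater:      return s.max > v;
    case Op::kGreaterEqual: return s.max >= v;
    case Op::kLess:         return s.min < v;
    case Op::kLessEqual:    return s.min <= v;
    case Op::kEqual:        return s.min <= v && v <= s.max;
    case Op::kNotEqual:     return !(s.min == v && s.max == v);
  }
  return true;  // unknown operator: conservative include
}

// IN: skip only if every listed value falls outside [min, max].
bool MayMatchIn(const StripeStats& s, const std::vector<int64_t>& values) {
  if (s.num_values == 0) return false;
  return std::any_of(values.begin(), values.end(),
                     [&](int64_t v) { return s.min <= v && v <= s.max; });
}

bool MayMatchIsNull(const StripeStats& s) { return s.has_null; }
bool MayMatchIsNotNull(const StripeStats& s) { return s.num_values > 0; }

// The compound test predicate `(id > 100 AND id < 200) OR id == 500`.
// AND needs both sides to be possible, OR needs either side to be possible.
bool MayMatchExample(const StripeStats& s) {
  return (MayMatch(s, Op::kGreater, 100) && MayMatch(s, Op::kLess, 200)) ||
         MayMatch(s, Op::kEqual, 500);
}
```

For a stripe with `min = 0`, `max = 99`, `MayMatchExample` returns false, so the stripe could be skipped. `NOT` is deliberately left out: negating a "may match" answer is not sound, so a negated predicate has to be rewritten (for example `NOT (id > 5)` into `id <= 5`) before range checks apply, which is the kind of work the simplifier is relied on to do.
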
Extending to additional types is a follow-up:

| Type | Key concern |
|------|------------|
| DOUBLE, FLOAT | NaN in statistics makes range unusable; ±Inf are valid bounds |
| STRING | ORC may truncate long strings in statistics; collation/encoding assumptions |
| DATE | int32 days since epoch; straightforward |
| TIMESTAMP | Unit conversion (ORC millis + sub-millis nanos → Arrow nanos) |
| DECIMAL | Scale/precision must match between stats and field type |

### Tests

- Each comparison operator individually (>, >=, <, <=, ==, !=)
- AND compound predicate (both conditions must hold)
- OR compound predicate (either condition suffices)
- NOT operator
- IN operator with values inside/outside stripe range
- IS NULL on stripe with/without nulls
- IS NOT NULL on all-null stripe
- Compound: `(id > 100 AND id < 200) OR id == 500`
- Unsupported type in predicate → conservative include (no skip)

### Component(s)

C++
