wenzhenghu opened a new pull request, #61929:
URL: https://github.com/apache/doris/pull/61929

   ### What problem does this PR solve?
   
   Problem Summary:
   When querying a Paimon table (or another external table) with a predicate 
that cannot be pushed down (e.g., `LIKE '%*%'`), a single `FileScanner` 
instance may process Native splits (Parquet/ORC) and JNI splits 
consecutively. In that case, the data returned by the JNI reader skips the 
fallback filtering at the `Scanner` layer, so rows that do not match the 
predicate leak into the final result.
   
   Root Cause:
   1. When `FileScanner` prepares to read a Native split, it calls 
`_process_late_arrival_conjuncts()` to assign `_conjuncts` to 
`_push_down_conjuncts`.
   2. However, this logic mistakenly ends with a call to `_conjuncts.clear()`, 
which wipes out the shared fallback `_conjuncts` at the scanner level.
   3. When the scanner subsequently processes a JNI split (which does not 
trigger the push-down logic), `Scanner::_filter_output_block()` finds 
`_conjuncts` empty, so predicates such as `LIKE` are bypassed entirely.
   
   Solution:
   Remove the `_conjuncts.clear()` call in `_process_late_arrival_conjuncts()`. 
This ensures that `_conjuncts` is always retained as the final fallback filter 
at the `Scanner` level, regardless of how underlying readers execute their own 
push-down predicates.
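
   The failure mode and the fix can be illustrated with a small standalone 
model. This is only a sketch, not the actual Doris code: `ToyFileScanner`, 
`Split`, `SplitKind`, `scan_mixed_splits`, the `clear_after_push_down` flag, 
and the "only populate `_push_down_conjuncts` once" guard are all hypothetical 
stand-ins, and a conjunct is modeled as a plain callable over a string row.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; not the real Doris types or signatures.
using Row = std::string;
using Conjunct = std::function<bool(const Row&)>;

enum class SplitKind { Native, Jni };

struct Split {
    SplitKind kind;
    std::vector<Row> rows;
};

class ToyFileScanner {
public:
    ToyFileScanner(std::vector<Conjunct> conjuncts, bool clear_after_push_down)
            : _conjuncts(std::move(conjuncts)),
              _clear_after_push_down(clear_after_push_down) {}

    // Reads one split and returns the rows that survive filtering.
    std::vector<Row> read(const Split& split) {
        std::vector<Row> block = split.rows;
        if (split.kind == SplitKind::Native) {
            // Only the native path hands conjuncts to the reader.
            _process_late_arrival_conjuncts();
            block = _apply(_push_down_conjuncts, block);
        }
        // Scanner-level fallback filter, applied to every split kind.
        return _apply(_conjuncts, block);
    }

private:
    void _process_late_arrival_conjuncts() {
        if (_push_down_conjuncts.empty()) {  // assumed guard, for the model only
            _push_down_conjuncts = _conjuncts;
            if (_clear_after_push_down) {
                _conjuncts.clear();  // the call this PR removes
            }
        }
    }

    // Keeps only the rows that satisfy every conjunct.
    static std::vector<Row> _apply(const std::vector<Conjunct>& conjuncts,
                                   const std::vector<Row>& rows) {
        std::vector<Row> kept;
        for (const Row& row : rows) {
            bool pass = true;
            for (const Conjunct& conjunct : conjuncts) {
                if (!conjunct(row)) {
                    pass = false;
                    break;
                }
            }
            if (pass) {
                kept.push_back(row);
            }
        }
        return kept;
    }

    std::vector<Conjunct> _conjuncts;
    std::vector<Conjunct> _push_down_conjuncts;
    bool _clear_after_push_down;
};

// Scans one native split followed by one JNI split with a single scanner,
// using a stand-in for `LIKE '%*%'` (row contains a literal '*').
std::vector<Row> scan_mixed_splits(bool clear_after_push_down) {
    Conjunct contains_star = [](const Row& row) {
        return row.find('*') != std::string::npos;
    };
    ToyFileScanner scanner(std::vector<Conjunct>{contains_star},
                           clear_after_push_down);
    std::vector<Split> splits = {
            {SplitKind::Native, {"a*b", "abc"}},
            {SplitKind::Jni, {"x*y", "xyz"}},
    };
    std::vector<Row> out;
    for (const Split& split : splits) {
        std::vector<Row> rows = scanner.read(split);
        out.insert(out.end(), rows.begin(), rows.end());
    }
    return out;
}
```

   In this model, `scan_mixed_splits(true)` lets the non-matching JNI row 
`xyz` through because the fallback list is already empty, while 
`scan_mixed_splits(false)` returns only `a*b` and `x*y`. A check such as 
`assert(scan_mixed_splits(false).size() == 2)` captures the intent of retaining 
`_conjuncts`.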
   
   Added a BE unit test `process_late_arrival_conjuncts_retain` to prevent 
regression.
   
   ### Release note
   
   Fix a correctness issue where predicates that cannot be pushed down (such 
as `LIKE`) might not be applied to rows read from JNI splits when querying 
external tables with mixed native and JNI splits.
   
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [x] Regression test
       - [x] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [x] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [x] No.
       - [ ] Yes. <!-- Add document PR link here. eg: 
https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should 
merge into -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

