WeisonWei opened a new pull request, #52746:
URL: https://github.com/apache/doris/pull/52746

   # fix(external): add missing conjuncts handling in PhysicalFileScan for external table queries
   
   ## What problem does this PR solve?
   
   Issue Number: close #52745
   
   Related PR: N/A
   
   **Problem Summary:**
   
   Critical bug: `PhysicalPlanTranslator.getPlanFragmentForPhysicalFileScan()` never adds conjuncts (filter conditions) to the scan node, so all WHERE clause filters are lost when executing queries on external tables (Hive, Hudi, and Iceberg).
   
   **Root Cause:**
   - Missing `scanNode.addConjuncts()` call in the `getPlanFragmentForPhysicalFileScan()` method
   - Missing `getConjunctsWithoutPartitionPredicate()` method for proper Hive partition predicate separation
   - Affects ALL external table queries with filter conditions
   
   **Impact:**
   - **Query Correctness**: External table queries return unfiltered data instead of applying WHERE conditions
   - **Performance**: Full table scans instead of filtered scans
   - **Resource Usage**: Unnecessary data transfer and processing
   
   ## What is changed and how it works?
   
   **Changes:**
   
   1. **Added missing conjuncts handling:**
   ```java
   // BEFORE (Broken - filters lost):
   private PlanFragment getPlanFragmentForPhysicalFileScan(...) {
       scanNode.setNereidsId(fileScan.getId());
       context.getNereidsIdToPlanNodeIdMap().put(fileScan.getId(), scanNode.getId());
       // ❌ Missing conjuncts handling
   }
   
   // AFTER (Fixed - filters properly transferred):
   private PlanFragment getPlanFragmentForPhysicalFileScan(...) {
       scanNode.setNereidsId(fileScan.getId());
       context.getNereidsIdToPlanNodeIdMap().put(fileScan.getId(), scanNode.getId());
       // ✅ Added conjuncts handling
       scanNode.addConjuncts(
               translateToLegacyConjuncts(getConjunctsWithoutPartitionPredicate(fileScan)));
   }
   ```
   
   2. **Added `getConjunctsWithoutPartitionPredicate()` method:**
      - **For Hive tables**: Separates partition predicates from regular predicates using `PartitionPruneExpressionExtractor.ExpressionEvaluableDetector`
      - **For Hudi/Iceberg/other tables**: Returns all conjuncts as-is
      - **Purpose**: Prevents incorrect partition predicate pushdown for Hive tables
   
   3. **Added required imports:**
      - `PartitionPruneExpressionExtractor` for partition predicate detection
      - `java.util.function.Function` for stream operations
   
   **How it works:**
   - For Hive tables: Identifies partition columns and filters out expressions that only reference partition columns
   - For other table types: Uses all conjuncts directly
   - Ensures proper filter conditions are transferred from PhysicalFileScan to ScanNode
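   
   The separation rule above can be sketched in isolation. This is a minimal, self-contained model and not the code in this PR: `ConjunctSplitSketch`, `Conjunct`, and `withoutPartitionPredicates` are hypothetical names, and a conjunct is reduced to its text plus the set of columns it references, rather than a Nereids `Expression` inspected with `PartitionPruneExpressionExtractor`.
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;
   
   /** Hypothetical, simplified model of the rule; not Doris's actual classes. */
   class ConjunctSplitSketch {
   
       /** A filter conjunct reduced to its SQL text and the columns it references. */
       static final class Conjunct {
           final String text;
           final Set<String> columns;
   
           Conjunct(String text, String... columns) {
               this.text = text;
               this.columns = new HashSet<>(Arrays.asList(columns));
           }
       }
   
       /**
        * Returns the conjuncts that should still be attached to the scan node.
        * For Hive tables, conjuncts that reference only partition columns are dropped;
        * for every other table type, all conjuncts are returned unchanged.
        */
       static List<Conjunct> withoutPartitionPredicates(
               List<Conjunct> conjuncts, Set<String> partitionColumns, boolean isHiveTable) {
           if (!isHiveTable) {
               return new ArrayList<>(conjuncts);
           }
           List<Conjunct> kept = new ArrayList<>();
           for (Conjunct c : conjuncts) {
               // A conjunct with no column references (a constant) is never partition-only.
               boolean partitionOnly =
                       !c.columns.isEmpty() && partitionColumns.containsAll(c.columns);
               if (!partitionOnly) {
                   kept.add(c);
               }
           }
           return kept;
       }
   
       public static void main(String[] args) {
           Set<String> partitionColumns = new HashSet<>(Arrays.asList("partition_col"));
           List<Conjunct> conjuncts = Arrays.asList(
                   new Conjunct("partition_col = '2024-01-01'", "partition_col"),
                   new Conjunct("regular_col > 100", "regular_col"));
   
           // Print what remains on the scan node for a Hive table.
           for (Conjunct c : withoutPartitionPredicates(conjuncts, partitionColumns, true)) {
               System.out.println(c.text);
           }
       }
   }
   ```
   
   In this sketch a conjunct that mixes partition and regular columns is kept, since it does not reference only partition columns.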
   
   ## Release note
   
   Fix a critical bug where WHERE clause filters were lost when querying external tables. `PhysicalPlanTranslator.getPlanFragmentForPhysicalFileScan()` now adds conjuncts to the scan node, with partition predicates separated out for Hive tables.
   
   ## Check List (For Author)
   
   - Test
     - [x] Regression test
     - [x] Unit Test
     - [x] Manual test (add detailed scripts or steps below)
     - [ ] No need to test or manual test. Explain why:
   
   **Unit Tests Added:**
   - `testGetConjunctsWithoutPartitionPredicate()`: Tests partition predicate separation for Hive tables
   - Validates that non-partition predicates are correctly preserved
   - Confirms other table types return all conjuncts unchanged
   
   **Manual Test Steps:**
   ```sql
   -- 1. Create the Hive catalog used by the queries below
   CREATE CATALOG hive_catalog PROPERTIES (
       "type"="hms",
       "hive.metastore.uris" = "thrift://hive-metastore:9083"
   );
   
   -- 2. Test filter conditions work correctly
   SELECT COUNT(*) FROM hive_catalog.db.table WHERE col > 100;
   -- Expected: Should return filtered count, not total count
   
   -- 3. Test partition pruning for Hive tables
   SELECT * FROM hive_catalog.db.partitioned_table 
   WHERE partition_col = '2024-01-01' AND regular_col > 100;
   -- Expected: Should apply both partition and regular filters correctly
   ```
   
   - Behavior changed:
     - [x] Yes. External table queries now correctly apply WHERE clause filters instead of ignoring them.
   
   - Does this need documentation?
     - [ ] No. This is a bug fix that restores expected behavior.
     - [ ] Yes.
   
   ## Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases  
   - [ ] Confirm document
   - [ ] Add branch pick label
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

