WeisonWei opened a new issue, #52745:
URL: https://github.com/apache/doris/issues/52745

   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   2.1.x, 3.0.x
   
   ### What's Wrong?
   
   When querying external tables (Hive, Hudi, Iceberg), the WHERE clause filter conditions are lost during query execution. `PhysicalPlanTranslator.getPlanFragmentForPhysicalFileScan()` does not handle conjuncts (filter conditions), so external table queries return unfiltered data instead of applying the WHERE conditions.
   
   **Root Cause:**
   In `PhysicalPlanTranslator.getPlanFragmentForPhysicalFileScan()`, the method is missing:
   1. A `scanNode.addConjuncts()` call to transfer the filter conditions to the scan node
   2. A `getConjunctsWithoutPartitionPredicate()` method to separate Hive partition predicates from the remaining conjuncts
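The issue does not show a body for `getConjunctsWithoutPartitionPredicate()`. As a rough illustration of the idea the name suggests, the sketch below drops every conjunct that references a partition column and keeps the rest for the scan node. All names here (`ConjunctSplitSketch`, `Conjunct`, `withoutPartitionPredicates`) are hypothetical and simplified; they are not Doris classes or APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of partition predicate separation; not Doris code.
final class ConjunctSplitSketch {

    /** A predicate reduced to the single column it references and its SQL text. */
    static final class Conjunct {
        final String column;
        final String sql;

        Conjunct(String column, String sql) {
            this.column = column;
            this.sql = sql;
        }
    }

    /** Returns the conjuncts that do not reference any partition column, in order. */
    static List<Conjunct> withoutPartitionPredicates(List<Conjunct> conjuncts,
                                                     Set<String> partitionColumns) {
        List<Conjunct> remaining = new ArrayList<>();
        for (Conjunct conjunct : conjuncts) {
            if (!partitionColumns.contains(conjunct.column)) {
                remaining.add(conjunct);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Mirrors the WHERE clause from the reproduction query in this issue.
        List<Conjunct> where = List.of(
                new Conjunct("partition_col", "partition_col = '2024-01-01'"),
                new Conjunct("regular_col", "regular_col > 100"));

        for (Conjunct c : withoutPartitionPredicates(where, Set.of("partition_col"))) {
            System.out.println(c.sql); // prints: regular_col > 100
        }
    }
}
```

In this model the partition predicate is assumed to be consumed by partition pruning, so only `regular_col > 100` would still need to be evaluated by the scan.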
   
   ### What You Expected?
   
   External table queries with WHERE clauses should:
   1. Apply filter conditions correctly
   2. Return only filtered data
   3. Maintain proper partition pruning for Hive tables
   4. Support all external table types (Hive, Hudi, Iceberg)
   
   ### How to Reproduce?
   
   
   1. **Create an external catalog (Hive example):**
   ```sql
   CREATE CATALOG hive_catalog PROPERTIES (
       "type"="hms",
       "hive.metastore.uris" = "thrift://hive-metastore:9083"
   );
   ```
   
   2. **Query with WHERE conditions:**
   ```sql
   -- This query should return filtered data but returns ALL data
   SELECT * FROM hive_catalog.db.table 
   WHERE partition_col = '2024-01-01' 
     AND regular_col > 100;
   ```
   
   3. **Observe the issue:**
      - **Expected**: Only rows matching the WHERE conditions
      - **Actual**: All rows returned (filters completely ignored)
   
   **Verification:**
   ```sql
   -- These queries return the same result count (proving filters are ignored)
   SELECT COUNT(*) FROM hive_catalog.db.table WHERE col > 100;  -- Returns total count
   SELECT COUNT(*) FROM hive_catalog.db.table;                  -- Returns same total count
   ```
   
   
   ### Anything Else?
   
   
   **Impact:**
   - **Query Correctness**: External table queries return wrong results
   - **Performance**: Full table scans instead of filtered scans
   - **Resource Usage**: Unnecessary data transfer and processing
   - **Scope**: Affects ALL external table types with filter conditions
   
   **Technical Analysis:**
   Current broken code in `PhysicalPlanTranslator.getPlanFragmentForPhysicalFileScan()`:
   ```java
   private PlanFragment getPlanFragmentForPhysicalFileScan(...) {
       scanNode.setNereidsId(fileScan.getId());
       context.getNereidsIdToPlanNodeIdMap().put(fileScan.getId(), scanNode.getId());
       // ❌ MISSING: No conjuncts handling - filters are lost!
       scanNode.setPushDownAggNoGrouping(context.getRelationPushAggOp(fileScan.getRelationId()));
       // ... rest of method
   }
   ```
   
   **Required Fix:**
   ```java
   private PlanFragment getPlanFragmentForPhysicalFileScan(...) {
       scanNode.setNereidsId(fileScan.getId());
       context.getNereidsIdToPlanNodeIdMap().put(fileScan.getId(), scanNode.getId());
       // ✅ ADD: Conjuncts handling with partition predicate separation
       scanNode.addConjuncts(translateToLegacyConjuncts(getConjunctsWithoutPartitionPredicate(fileScan)));
       scanNode.setPushDownAggNoGrouping(context.getRelationPushAggOp(fileScan.getRelationId()));
       // ... rest of method
   }
   ```
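To make the reported symptom concrete, here is a toy model of a scan node that only filters rows by the conjuncts attached to it. It is a sketch under the assumption that a scan node evaluates nothing beyond what `addConjuncts()` hands it; `ToyScanNode` is a hypothetical class and does not reflect Doris's real scan node implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model only; ToyScanNode is hypothetical and unrelated to Doris's scan node classes.
final class ToyScanNode {
    private final List<Integer> rows;
    private final List<Predicate<Integer>> conjuncts = new ArrayList<>();

    ToyScanNode(List<Integer> rows) {
        this.rows = rows;
    }

    /** Attaches filter conditions; a row must satisfy all of them to be returned. */
    void addConjuncts(List<Predicate<Integer>> predicates) {
        conjuncts.addAll(predicates);
    }

    /** Returns every row that passes all attached conjuncts (all rows if there are none). */
    List<Integer> scan() {
        List<Integer> out = new ArrayList<>();
        for (Integer row : rows) {
            boolean keep = true;
            for (Predicate<Integer> conjunct : conjuncts) {
                if (!conjunct.test(row)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(50, 150, 200);
        Predicate<Integer> greaterThan100 = value -> value > 100;

        // No conjuncts attached: the behavior described in this issue.
        ToyScanNode withoutConjuncts = new ToyScanNode(data);
        System.out.println(withoutConjuncts.scan()); // prints: [50, 150, 200]

        // Conjuncts attached: the behavior the proposed fix aims for.
        ToyScanNode withConjuncts = new ToyScanNode(data);
        withConjuncts.addConjuncts(List.of(greaterThan100));
        System.out.println(withConjuncts.scan()); // prints: [150, 200]
    }
}
```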
   
   **Affected Components:**
   - PhysicalPlanTranslator
   - External table scanning (Hive, Hudi, Iceberg)
   - Query execution engine
   - Partition pruning (for Hive tables)
   
   **Verified Across Branches:**
   - ❌ master - Missing conjuncts handling
   - ❌ branch-3.0 - Missing conjuncts handling  
   - ❌ branch-2.1 - Missing conjuncts handling
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

