morrySnow commented on PR #61146:
URL: https://github.com/apache/doris/pull/61146#issuecomment-4132263710

   ## Concrete Fix Suggestions for GraphSimplifier Issues
   
   ### Fix #1: `constructEdge()` — use `getOtherJoinConjuncts()` for 
`otherConditions`
   
   ```java
   // Before (line ~533):
   List<Expression> otherConditions = connectionEdges.stream()
           .mapToObj(i -> 
graph.getJoinEdge(i).getJoin().getHashJoinConjuncts())  // BUG
           .flatMap(Collection::stream)
           .collect(Collectors.toList());
   
   // After:
   List<Expression> otherConditions = connectionEdges.stream()
           .mapToObj(i -> 
graph.getJoinEdge(i).getJoin().getOtherJoinConjuncts())  // FIX
           .flatMap(Collection::stream)
           .collect(Collectors.toList());
   ```
   
   ---
   
   ### Fix #2: `updatePQ()` — always re-heapify when benefit changes
   
   The old `hypergraph/GraphSimplifier.java:330` does `remove()+add()` when the 
element is already queued. The new v2 skips this. Fix:
   
   ```java
   // Before:
   private void updatePQ(int edgeIdx) {
       BestSimplification cacheNode = simplifications.get(edgeIdx);
       Preconditions.checkState(!Double.isNaN(cacheNode.bestStep.benefit));
       if (!cacheNode.isInPriorityQueue) {
           if (cacheNode.bestNeighbor != -1) {
               priorityQueue.add(cacheNode);
               cacheNode.isInPriorityQueue = true;
           }
       } else {
           if (cacheNode.bestNeighbor == -1) {
               priorityQueue.remove(cacheNode);
               cacheNode.isInPriorityQueue = false;
           }
       }
   }
   
   // After:
   private void updatePQ(int edgeIdx) {
       BestSimplification cacheNode = simplifications.get(edgeIdx);
       Preconditions.checkState(!Double.isNaN(cacheNode.bestStep.benefit));
       if (!cacheNode.isInPriorityQueue) {
           if (cacheNode.bestNeighbor != -1) {
               priorityQueue.add(cacheNode);
               cacheNode.isInPriorityQueue = true;
           }
       } else {
           // Always remove first, then re-add if still valid — ensures heap 
order
           priorityQueue.remove(cacheNode);
           if (cacheNode.bestNeighbor == -1) {
               cacheNode.isInPriorityQueue = false;
           } else {
               priorityQueue.add(cacheNode);
           }
       }
   }
   ```
   
   ---
   
   ### Fix #3: `threeRightJoin()` — use `edge2` for inner-join cost
   
   ```java
   // Before (line ~629):
   calCost(edge1, bitmap2, bitmap3);
   
   // After:
   calCost(edge2, bitmap2, bitmap3);
   ```
   
   Rationale: `threeRightJoin` computes `bitmap1 ⋈edge1 (bitmap2 ⋈edge2 
bitmap3)`. The inner sub-join `(bitmap2 ⋈ bitmap3)` uses `edge2`, so cost 
should use `edge2` as well. `threeLeftJoin` is already correct and consistent.
   
   ---
   
   ### Fix #8/#9: `calCost()` — operator precedence + null safety
   
   ```java
   // Before (lines ~580-584):
   if (JoinUtils.shouldNestedLoopJoin(join)) {
       cost = cacheCost.get(leftBitmap) + cacheCost.get(rightBitmap)
               + rightStats.getRowCount() + 1 / leftStats.getRowCount();
   } else {
       cost = cacheCost.get(leftBitmap) + cacheCost.get(rightBitmap)
               + (rightStats.getRowCount() + 1 / leftStats.getRowCount()) * 1.2;
   }
   
   // After — fix precedence, guard against zero/null:
   double leftRows = leftStats != null ? Math.max(1.0, leftStats.getRowCount()) 
: 1.0;
   double rightRows = rightStats != null ? Math.max(1.0, 
rightStats.getRowCount()) : 1.0;
   double leftCost = cacheCost.getOrDefault(leftBitmap, leftRows);
   double rightCost = cacheCost.getOrDefault(rightBitmap, rightRows);
   if (JoinUtils.shouldNestedLoopJoin(join)) {
       cost = leftCost + rightCost + rightRows + leftRows;  // or intended 
formula
   } else {
       cost = leftCost + rightCost + (rightRows + leftRows) * 1.2;
   }
   ```
   
   Note: The original `1 / leftStats.getRowCount()` is likely an 
operator-precedence bug — Java evaluates it as `1.0 / rows` (reciprocal), not 
`(rightRows + 1) / leftRows`. Please clarify the intended cost formula. Using 
`getOrDefault` prevents NPE from auto-unboxing null `Double`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to