morrySnow commented on PR #61146:
URL: https://github.com/apache/doris/pull/61146#issuecomment-4132263710
## Concrete Fix Suggestions for GraphSimplifier Issues
### Fix #1: `constructEdge()` — use `getOtherJoinConjuncts()` for
`otherConditions`
```java
// Before (line ~533):
List<Expression> otherConditions = connectionEdges.stream()
.mapToObj(i ->
graph.getJoinEdge(i).getJoin().getHashJoinConjuncts()) // BUG
.flatMap(Collection::stream)
.collect(Collectors.toList());
// After:
List<Expression> otherConditions = connectionEdges.stream()
.mapToObj(i ->
graph.getJoinEdge(i).getJoin().getOtherJoinConjuncts()) // FIX
.flatMap(Collection::stream)
.collect(Collectors.toList());
```
---
### Fix #2: `updatePQ()` — always re-heapify when benefit changes
The old `hypergraph/GraphSimplifier.java:330` does `remove()+add()` when the
element is already queued. The new v2 skips this. Fix:
```java
// Before:
private void updatePQ(int edgeIdx) {
BestSimplification cacheNode = simplifications.get(edgeIdx);
Preconditions.checkState(!Double.isNaN(cacheNode.bestStep.benefit));
if (!cacheNode.isInPriorityQueue) {
if (cacheNode.bestNeighbor != -1) {
priorityQueue.add(cacheNode);
cacheNode.isInPriorityQueue = true;
}
} else {
if (cacheNode.bestNeighbor == -1) {
priorityQueue.remove(cacheNode);
cacheNode.isInPriorityQueue = false;
}
}
}
// After:
private void updatePQ(int edgeIdx) {
BestSimplification cacheNode = simplifications.get(edgeIdx);
Preconditions.checkState(!Double.isNaN(cacheNode.bestStep.benefit));
if (!cacheNode.isInPriorityQueue) {
if (cacheNode.bestNeighbor != -1) {
priorityQueue.add(cacheNode);
cacheNode.isInPriorityQueue = true;
}
} else {
// Always remove first, then re-add if still valid — ensures heap
order
priorityQueue.remove(cacheNode);
if (cacheNode.bestNeighbor == -1) {
cacheNode.isInPriorityQueue = false;
} else {
priorityQueue.add(cacheNode);
}
}
}
```
---
### Fix #3: `threeRightJoin()` — use `edge2` for inner-join cost
```java
// Before (line ~629):
calCost(edge1, bitmap2, bitmap3);
// After:
calCost(edge2, bitmap2, bitmap3);
```
Rationale: `threeRightJoin` computes `bitmap1 ⋈edge1 (bitmap2 ⋈edge2
bitmap3)`. The inner sub-join `(bitmap2 ⋈ bitmap3)` uses `edge2`, so cost
should use `edge2` as well. `threeLeftJoin` is already correct and consistent.
---
### Fix #8/#9: `calCost()` — operator precedence + null safety
```java
// Before (lines ~580-584):
if (JoinUtils.shouldNestedLoopJoin(join)) {
cost = cacheCost.get(leftBitmap) + cacheCost.get(rightBitmap)
+ rightStats.getRowCount() + 1 / leftStats.getRowCount();
} else {
cost = cacheCost.get(leftBitmap) + cacheCost.get(rightBitmap)
+ (rightStats.getRowCount() + 1 / leftStats.getRowCount()) * 1.2;
}
// After — fix precedence, guard against zero/null:
double leftRows = leftStats != null ? Math.max(1.0, leftStats.getRowCount())
: 1.0;
double rightRows = rightStats != null ? Math.max(1.0,
rightStats.getRowCount()) : 1.0;
double leftCost = cacheCost.getOrDefault(leftBitmap, leftRows);
double rightCost = cacheCost.getOrDefault(rightBitmap, rightRows);
if (JoinUtils.shouldNestedLoopJoin(join)) {
cost = leftCost + rightCost + rightRows + leftRows; // or intended
formula
} else {
cost = leftCost + rightCost + (rightRows + leftRows) * 1.2;
}
```
Note: The original `1 / leftStats.getRowCount()` is likely an
operator-precedence bug — Java evaluates it as `1.0 / rows` (reciprocal), not
`(rightRows + 1) / leftRows`. Please clarify the intended cost formula. Using
`getOrDefault` prevents NPE from auto-unboxing null `Double`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]