Rohini Palaniswamy created PIG-3876:
---------------------------------------
Summary: Handle two outputs from split going to same input in
MultiQueryOptimizer
Key: PIG-3876
URL: https://issues.apache.org/jira/browse/PIG-3876
Project: Pig
Issue Type: Sub-task
Reporter: Rohini Palaniswamy
Fix For: tez-branch
MultiQueryOptimizerTez.java currently contains this check:
{code}
// Detect diamond shape, we cannot merge it into split, since Tez
// does not handle double edge between vertexes
boolean sharedSucc = false;
if (getPlan().getSuccessors(successor) != null) {
    for (TezOperator succ_successor : getPlan().getSuccessors(successor)) {
        if (succ_successors.contains(succ_successor)) {
            sharedSucc = true;
            break;
        }
    }
    succ_successors.addAll(getPlan().getSuccessors(successor));
}
{code}
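To make the check easier to follow outside of Pig, here is a minimal standalone sketch of the same idea. It is not the Pig code: the class DiamondCheck, the method hasSharedSuccessor and the use of a plain Map of vertex names in place of the Tez plan are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the optimizer check; vertex names replace TezOperator.
class DiamondCheck {
    // plan maps a vertex name to the names of its direct successors.
    static boolean hasSharedSuccessor(Map<String, List<String>> plan, String splitter) {
        Set<String> seen = new HashSet<>();
        for (String successor : plan.getOrDefault(splitter, List.of())) {
            List<String> next = plan.getOrDefault(successor, List.of());
            for (String n : next) {
                // A vertex already reached through an earlier successor
                // closes a diamond.
                if (seen.contains(n)) {
                    return true;
                }
            }
            seen.addAll(next);
        }
        return false;
    }

    public static void main(String[] args) {
        // V1 splits into V2 and V3, and both feed V4: a diamond.
        Map<String, List<String>> diamond = Map.of(
                "V1", List.of("V2", "V3"),
                "V2", List.of("V4"),
                "V3", List.of("V4"));
        System.out.println(hasSharedSuccessor(diamond, "V1")); // true
    }
}
```

When the method returns true, merging both successors into the splitter would leave two edges from V1 to the shared vertex, which is the case the existing check guards against.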
For a script such as
SPLIT A INTO B if <condition>, C if <condition>;
D = JOIN B by x, C by x;
we would like to produce the plan
V1 - Split (B -> V2, C -> V2)
V2 - Join B and C
Without the check for shared successors, the above plan is created, but B and C
then form two separate edges between V1 and V2, which Tez does not support.
Because the splits are not fully merged into POSplit, the plan we currently get is
V1 - Split (B -> V2, C -> V3 with just POValueOutputTez)
V2 - LocalRearrange -> V4
V3 - LocalRearrange -> V4
V4 - Join B and C
We need to remove the check, merge both branches into the POSplit, and fix this
case so that B and C both write to the same edge. Merging more aggressively in
multi-query improves performance: the desired plan needs two vertices instead
of four.
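As a rough illustration of "both write to the same edge", the sketch below groups split outputs by their target vertex, so each pair of vertices ends up with a single edge that carries several aliases. It only shows the grouping idea: EdgeMerge and groupByTarget are hypothetical names, and how the join vertex would tell B's records from C's on a shared edge is outside the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Pig.
class EdgeMerge {
    // Each output is a pair {alias, targetVertex}.
    // Returns target vertex -> aliases that share the one edge to it.
    static Map<String, List<String>> groupByTarget(List<String[]> outputs) {
        Map<String, List<String>> edges = new LinkedHashMap<>();
        for (String[] out : outputs) {
            edges.computeIfAbsent(out[1], k -> new ArrayList<>()).add(out[0]);
        }
        return edges;
    }

    public static void main(String[] args) {
        List<String[]> outputs = List.of(
                new String[] {"B", "V2"},
                new String[] {"C", "V2"});
        // One edge V1 -> V2 carrying both aliases.
        System.out.println(groupByTarget(outputs)); // {V2=[B, C]}
    }
}
```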