Re: [PR] [multistage] Add configurable hash function support for KeySelector in query planning [pinot]

via GitHub Thu, 03 Jul 2025 15:42:22 -0700


ankitsultana commented on code in PR #16242:
URL: https://github.com/apache/pinot/pull/16242#discussion_r2183861550



##########
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/logical/RelToPlanNodeConverter.java:
##########
@@ -94,9 +95,16 @@ public final class RelToPlanNodeConverter {
   private boolean _windowFunctionFound;
   @Nullable
   private final TransformationTracker.Builder<PlanNode, RelNode> _tracker;
+  private final String _hashFunction;
 
   public RelToPlanNodeConverter(@Nullable 
TransformationTracker.Builder<PlanNode, RelNode> tracker) {

Review Comment:
   isn't this unused?



##########
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/plannode/MailboxSendNode.java:
##########
@@ -52,22 +54,38 @@ private MailboxSendNode(int stageId, DataSchema dataSchema, 
List<PlanNode> input
     _prePartitioned = prePartitioned;
     _collations = collations != null ? collations : List.of();
     _sort = sort;
+    _hashFunction = hashFunction;
   }
 
   public MailboxSendNode(int stageId, DataSchema dataSchema, List<PlanNode> 
inputs,
       @Nullable List<Integer> receiverStages, PinotRelExchangeType 
exchangeType,
       RelDistribution.Type distributionType, @Nullable List<Integer> keys, 
boolean prePartitioned,
       @Nullable List<RelFieldCollation> collations, boolean sort) {
     this(stageId, dataSchema, inputs, toBitSet(receiverStages), exchangeType,
-        distributionType, keys, prePartitioned, collations, sort);
+        distributionType, keys, prePartitioned, collations, sort, 
KeySelector.DEFAULT_HASH_ALGORITHM);
   }
 
   public MailboxSendNode(int stageId, DataSchema dataSchema, List<PlanNode> 
inputs,
       int receiverStage, PinotRelExchangeType exchangeType,
       RelDistribution.Type distributionType, @Nullable List<Integer> keys, 
boolean prePartitioned,
       @Nullable List<RelFieldCollation> collations, boolean sort) {
     this(stageId, dataSchema, inputs, toBitSet(receiverStage), exchangeType, 
distributionType, keys, prePartitioned,
-        collations, sort);
+        collations, sort, KeySelector.DEFAULT_HASH_ALGORITHM);
+  }
+
+  public MailboxSendNode(int stageId, DataSchema dataSchema, List<PlanNode> 
inputs,
+      int receiverStage, PinotRelExchangeType exchangeType,
+      RelDistribution.Type distributionType, @Nullable List<Integer> keys, 
boolean prePartitioned,
+      @Nullable List<RelFieldCollation> collations, boolean sort, String 
hashFunction) {
+    this(stageId, dataSchema, inputs, toBitSet(receiverStage), exchangeType, 
distributionType, keys, prePartitioned,
+        collations, sort, hashFunction);
+  }
+
+  public MailboxSendNode(int stageId, DataSchema dataSchema, List<PlanNode> 
inputs,

Review Comment:
   there are too many ctors here. this one is only used in a test file. Can you 
instead update that file to pass in the default hash algorithm?



##########
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/plannode/MailboxSendNode.java:
##########
@@ -52,22 +54,38 @@ private MailboxSendNode(int stageId, DataSchema dataSchema, 
List<PlanNode> input
     _prePartitioned = prePartitioned;
     _collations = collations != null ? collations : List.of();
     _sort = sort;
+    _hashFunction = hashFunction;
   }
 
   public MailboxSendNode(int stageId, DataSchema dataSchema, List<PlanNode> 
inputs,
       @Nullable List<Integer> receiverStages, PinotRelExchangeType 
exchangeType,
       RelDistribution.Type distributionType, @Nullable List<Integer> keys, 
boolean prePartitioned,
       @Nullable List<RelFieldCollation> collations, boolean sort) {
     this(stageId, dataSchema, inputs, toBitSet(receiverStages), exchangeType,
-        distributionType, keys, prePartitioned, collations, sort);
+        distributionType, keys, prePartitioned, collations, sort, 
KeySelector.DEFAULT_HASH_ALGORITHM);
   }
 
   public MailboxSendNode(int stageId, DataSchema dataSchema, List<PlanNode> 
inputs,

Review Comment:
   This is one of the risk with having too many ctors. This one is used in 
PlanFragmenter in visitExchange. You can pass in the hash function there by 
adding `node.getHashFunction()`



##########
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/plannode/MailboxSendNode.java:
##########
@@ -52,22 +54,38 @@ private MailboxSendNode(int stageId, DataSchema dataSchema, 
List<PlanNode> input
     _prePartitioned = prePartitioned;
     _collations = collations != null ? collations : List.of();
     _sort = sort;
+    _hashFunction = hashFunction;
   }
 
   public MailboxSendNode(int stageId, DataSchema dataSchema, List<PlanNode> 
inputs,

Review Comment:
   This one is used in `PlanNodeDeserializer`. You need to update 
`MailboxSendNode` in the proto file to take in a hash function and pass it on 
there.



##########
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/plannode/MailboxSendNode.java:
##########
@@ -52,22 +54,38 @@ private MailboxSendNode(int stageId, DataSchema dataSchema, 
List<PlanNode> input
     _prePartitioned = prePartitioned;
     _collations = collations != null ? collations : List.of();
     _sort = sort;
+    _hashFunction = hashFunction;
   }
 
   public MailboxSendNode(int stageId, DataSchema dataSchema, List<PlanNode> 
inputs,
       @Nullable List<Integer> receiverStages, PinotRelExchangeType 
exchangeType,
       RelDistribution.Type distributionType, @Nullable List<Integer> keys, 
boolean prePartitioned,
       @Nullable List<RelFieldCollation> collations, boolean sort) {
     this(stageId, dataSchema, inputs, toBitSet(receiverStages), exchangeType,
-        distributionType, keys, prePartitioned, collations, sort);
+        distributionType, keys, prePartitioned, collations, sort, 
KeySelector.DEFAULT_HASH_ALGORITHM);
   }
 
   public MailboxSendNode(int stageId, DataSchema dataSchema, List<PlanNode> 
inputs,

Review Comment:
   You can leave `PlanFragmentAndMailboxAssignment` be though. I'll take care 
of that one (there's a few more things we need to do with v2 optimizer)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Re: [PR] [multistage] Add configurable hash function support for KeySelector in query planning [pinot]

Reply via email to