[GitHub] [pinot] xiangfu0 commented on a diff in pull request #10845: [Multistage] Runtime changes for leveraging V1 Aggregation Functions

via GitHub Mon, 26 Jun 2023 12:11:44 -0700


xiangfu0 commented on code in PR #10845:
URL: https://github.com/apache/pinot/pull/10845#discussion_r1242589834



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/MaxAggregationFunction.java:
##########
@@ -295,6 +295,22 @@ public Double merge(Double intermediateMaxResult1, Double 
intermediateMaxResult2
     return intermediateMaxResult2;
   }
 
+  @Override
+  public void mergeAndUpdateResultHolder(Double intermediateResult,

Review Comment:
   Shall we make those default implementation for the base class and put 
unsupported and TODO for the implementations won't honor these two methods?



##########
pinot-common/src/main/java/org/apache/pinot/common/request/context/IdentifierContext.java:
##########
@@ -0,0 +1,75 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.common.request.context;
+
+import java.util.Objects;
+import org.apache.pinot.common.utils.DataSchema;
+
+/**
+ * The {@code IdentifierContext} class represents a Identifer in the query.
+ * <p> This class includes information that about an identifier. The v1 engine 
uses column names for identifiers. The
+ * multistage query engine uses ordinals to distinctly track each identifier. 
So this context is set up to support both
+ * v1 and multistage engine identifiers.
+ */
+public class IdentifierContext {
+  private DataSchema.ColumnDataType _dataType;
+  String _name;
+  // Identifier Index is needed because multistage engine tracks identifiers 
with the index(ordinal) position.
+  int _identifierIndex;
+
+  public IdentifierContext(String name, DataSchema.ColumnDataType dataType, 
int identifierIndex) {
+    _name = name;
+    _dataType = dataType;
+    _identifierIndex = identifierIndex;
+  }
+
+  public String getName() {
+    return _name;
+  }
+
+  public DataSchema.ColumnDataType getDataType() {
+    return _dataType;
+  }
+
+  public int getIndex() {
+    return _identifierIndex;
+  }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    }
+    if (!(o instanceof IdentifierContext)) {
+      return false;
+    }
+    IdentifierContext that = (IdentifierContext) o;
+    return Objects.equals(_name, that._name) && 
Objects.equals(_identifierIndex, that._identifierIndex);
+  }
+
+  @Override
+  public int hashCode() {
+    return Objects.hash(_name, _identifierIndex);
+  }
+
+  @Override
+  public String toString() {
+    return _name;

Review Comment:
   Where is this used? Shall we assemble both _name and _identifierIndex or 
even the _dataType here?



##########
pinot-query-planner/src/main/java/org/apache/calcite/rel/hint/PinotHintStrategyTable.java:
##########
@@ -29,6 +29,9 @@
 public class PinotHintStrategyTable {
   public static final String INTERNAL_AGG_INTERMEDIATE_STAGE = 
"aggIntermediateStage";
   public static final String INTERNAL_AGG_FINAL_STAGE = "aggFinalStage";
+  // Hint to denote that aggregation is to be run as a single stage rather 
than split into an intermediate and a final
+  // stage
+  public static final String INTERNAL_IS_SINGLE_STAGE_AGG = "isSingleStageAgg";

Review Comment:
   Is this hint required to be set explicitly from external users or we are 
using it internally always?
   If later, I think we can add a parameter to the Agg PlanNode instead of 
adding a hint?



##########
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/plannode/AggregateNode.java:
##########
@@ -44,9 +44,11 @@ public AggregateNode(int planFragmentId, DataSchema 
dataSchema, List<AggregateCa
       List<RexExpression> groupSet,
       List<RelHint> relHints) {
     super(planFragmentId, dataSchema);
-    _aggCalls = 
aggCalls.stream().map(RexExpression::toRexExpression).collect(Collectors.toList());
+
+
     _groupSet = groupSet;
     _relHints = relHints;
+    _aggCalls = 
aggCalls.stream().map(RexExpression::toRexExpression).collect(Collectors.toList());

Review Comment:
   why changed the ordering?



##########
pinot-core/src/main/java/org/apache/pinot/core/common/IntermediateStageBlockValSet.java:
##########
@@ -0,0 +1,250 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common;
+
+import java.math.BigDecimal;
+import java.util.List;
+import javax.annotation.Nullable;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.PinotDataType;
+import org.apache.pinot.segment.spi.index.reader.Dictionary;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.roaringbitmap.RoaringBitmap;
+
+/**
+ * In the multistage engine, the leaf stage servers process the data in 
columnar fashion. By the time the
+ * intermediate stage receives the projected column, they are converted to a 
row based format. This class provides
+ * the capability to convert the row based represenation into blocks so that 
they can be used to process
+ * aggregations.
+ * TODO: Support MV
+ */
+public class IntermediateStageBlockValSet implements BlockValSet {
+  private final FieldSpec.DataType _dataType;
+  private final PinotDataType _pinotDataType;
+  private final List<Object> _values;
+  private final RoaringBitmap _nullBitMap;
+  private boolean _nullBitMapSet;

Review Comment:
   why have this boolean instead of set `_nullBitMap` to null and construct in 
`getNullBitmap` method? 



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/AggFunctionQueryContext.java:
##########
@@ -0,0 +1,62 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.aggregation.function;
+
+import java.util.List;
+import org.apache.pinot.common.request.context.OrderByExpressionContext;
+import org.apache.pinot.core.query.request.context.QueryContext;
+
+
+/**
+ * The <code>AggFunctionQueryContext</code> class contains extracted details 
from QueryContext that can be used for
+ * Aggregation Functions.
+ */
+public class AggFunctionQueryContext {
+  private boolean _isNullHandlingEnabled;

Review Comment:
   Make those member variables final?



##########
pinot-core/src/main/java/org/apache/pinot/core/common/IntermediateStageBlockValSet.java:
##########
@@ -0,0 +1,250 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common;
+
+import java.math.BigDecimal;
+import java.util.List;
+import javax.annotation.Nullable;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.PinotDataType;
+import org.apache.pinot.segment.spi.index.reader.Dictionary;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.roaringbitmap.RoaringBitmap;
+
+/**
+ * In the multistage engine, the leaf stage servers process the data in 
columnar fashion. By the time the
+ * intermediate stage receives the projected column, they are converted to a 
row based format. This class provides
+ * the capability to convert the row based represenation into blocks so that 
they can be used to process
+ * aggregations.
+ * TODO: Support MV
+ */
+public class IntermediateStageBlockValSet implements BlockValSet {
+  private final FieldSpec.DataType _dataType;
+  private final PinotDataType _pinotDataType;
+  private final List<Object> _values;
+  private final RoaringBitmap _nullBitMap;
+  private boolean _nullBitMapSet;
+
+  public IntermediateStageBlockValSet(DataSchema.ColumnDataType 
columnDataType, List<Object> values) {

Review Comment:
   +1 to what @walterddr mentioned.
   I think we can have two constructors:
   1. `IntermediateStageBlockValSet(DataSchema.ColumnDataType columnDataType, 
List<Object> values)`, which is the current setup, that is a list of boxed java 
object, so we have the solution to solve all the scenarios both simple and 
complex data types. 
   2. Another constructor could be: 
`IntermediateStageBlockValSet(DataSchema.ColumnDataType columnDataType, Object 
values)`, in this case, values is instance of `int[]` when ColumnDataType is 
int.
   
   Agreed with you that this could be further optimization, which we can add 
TODO here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[GitHub] [pinot] xiangfu0 commented on a diff in pull request #10845: [Multistage] Runtime changes for leveraging V1 Aggregation Functions

Reply via email to