[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8503: Support single-valued BigDecimal in schema, type conversion, SQL statements and minimum set of transforms.

GitBox Tue, 26 Apr 2022 15:59:27 -0700


Jackie-Jiang commented on code in PR #8503:
URL: https://github.com/apache/pinot/pull/8503#discussion_r859177288



##########
pinot-core/src/main/java/org/apache/pinot/core/common/RowBasedBlockValueFetcher.java:
##########
@@ -62,6 +63,8 @@ private ValueFetcher createFetcher(BlockValSet blockValSet) {
           return new DoubleSingleValueFetcher(blockValSet.getDoubleValuesSV());
         case STRING:
           return new StringSingleValueFetcher(blockValSet.getStringValuesSV());
+        case BIG_DECIMAL:

Review Comment:
   (minor) Move this in front of `STRING`



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/BaseTransformFunction.java:
##########
@@ -278,7 +323,7 @@ public byte[][] transformToBytesValuesSV(ProjectionBlock 
projectionBlock) {
     Dictionary dictionary = getDictionary();
     if (dictionary != null) {
       int[] dictIds = transformToDictIdsSV(projectionBlock);
-      dictionary.readIntValues(dictIds, length, _intValuesSV);
+      dictionary.readBytesValues(dictIds, length, _byteValuesSV);

Review Comment:
   Good catch!



##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/SumPrecisionAggregationFunction.java:
##########
@@ -108,6 +108,7 @@ public void aggregate(int length, AggregationResultHolder 
aggregationResultHolde
           sum = sum.add(new BigDecimal(stringValues[i]));
         }
         break;
+      case BIG_DECIMAL:

Review Comment:
   (MAJOR) We should be able to directly read big decimals (that's the main 
point of this PR). We should also modify `aggregateGroupBySV` and 
`aggregateGroupByMV` to handle `BIG_DECIMAL` type



##########
pinot-core/src/main/java/org/apache/pinot/core/query/distinct/DistinctTable.java:
##########
@@ -258,6 +258,9 @@ public byte[] toBytes()
           case DOUBLE:
             dataTableBuilder.setColumn(i, (double) values[i]);
             break;
+          case BIG_DECIMAL:
+            dataTableBuilder.setColumn(i, values[i]);

Review Comment:
   Should add an API for setting `BigDecimal`



##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/ArrayCopyUtils.java:
##########
@@ -165,4 +227,12 @@ public static void copy(byte[][] src, String[] dest, int 
length) {
       dest[i] = BytesUtils.toHexString(src[i]);
     }
   }
+
+  public static void copy(byte[][] src, BigDecimal[] dest, int length) {
+    for (int i = 0; i < length; i++) {
+      if (src[i] != null) {

Review Comment:
   (minor) This null check is redundant



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java:
##########
@@ -258,6 +259,45 @@ default void readValuesSV(int[] docIds, int length, 
double[] values, T context)
     }
   }
 
+  /**
+   * Fills the values
+   * @param docIds Array containing the document ids to read
+   * @param length Number of values to read
+   * @param values Values to fill
+   * @param context Reader context
+   */
+  default void readValuesSV(int[] docIds, int length, BigDecimal[] values, T 
context) {

Review Comment:
   We should add raw index support to the `BIG_DECIMAL` type (we may add a todo 
here and address it in a separate PR. It is important though because in most of 
the cases it will be more efficient to store big decimal as raw)



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -360,6 +360,9 @@ private void setDataTableColumn(ColumnDataType 
columnDataType, DataTableBuilder
       case DOUBLE:
         dataTableBuilder.setColumn(columnIndex, (double) value);
         break;
+      case BIG_DECIMAL:
+        dataTableBuilder.setColumn(columnIndex, value);

Review Comment:
   We should add a `setColumn(int colId, BigDecimal value)` into the 
`DataTableBuilder` instead of reusing the one for general objects.



##########
pinot-core/src/main/java/org/apache/pinot/core/query/reduce/RowBasedBlockValSet.java:
##########
@@ -140,6 +141,24 @@ public double[] getDoubleValuesSV() {
     return values;
   }
 
+  @Override
+  public BigDecimal[] getBigDecimalValuesSV() {
+    int length = _rows.size();
+    BigDecimal[] values = new BigDecimal[length];
+    if (_dataType.isNumeric()) {
+      for (int i = 0; i < length; i++) {
+        values[i] = (BigDecimal) _rows.get(i)[_columnIndex];

Review Comment:
   This can cause cast exception when data type is not `BIG_DECIMAL`



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java:
##########
@@ -46,13 +50,22 @@
  * jsonExtractScalar(jsonFieldName, 'jsonPath', 'resultsType')
  * <code>jsonFieldName</code> is the Json String field/expression.
  * <code>jsonPath</code> is a JsonPath expression which used to read from JSON 
document
- * <code>results_type</code> refers to the results data type, could be INT, 
LONG, FLOAT, DOUBLE, STRING, INT_ARRAY,
- * LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, STRING_ARRAY.
+ * <code>results_type</code> refers to the results data type, could be INT, 
LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING,
+ * INT_ARRAY, LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, STRING_ARRAY.
  *
  */
 public class JsonExtractScalarTransformFunction extends BaseTransformFunction {
   public static final String FUNCTION_NAME = "jsonExtractScalar";
 
+  private static final ObjectMapper OBJECT_MAPPER_WITH_EXACT_BIG_DECIMAL = 
(new ObjectMapper())

Review Comment:
   (nit)
   ```suggestion
     private static final ObjectMapper OBJECT_MAPPER_WITH_EXACT_BIG_DECIMAL = 
new ObjectMapper()
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/query/selection/SelectionOperatorUtils.java:
##########
@@ -273,6 +274,9 @@ public static DataTable 
getDataTableFromRows(Collection<Object[]> rows, DataSche
           case DOUBLE:
             dataTableBuilder.setColumn(i, ((Number) 
columnValue).doubleValue());
             break;
+          case BIG_DECIMAL:
+            dataTableBuilder.setColumn(i, columnValue);

Review Comment:
   Add a setter for big decimal



##########
pinot-core/src/main/java/org/apache/pinot/core/common/evaluators/DefaultJsonPathEvaluator.java:
##########
@@ -38,10 +42,19 @@
 
 public final class DefaultJsonPathEvaluator implements JsonPathEvaluator {
 
+  private static final ObjectMapper OBJECT_MAPPER_WITH_EXACT_BIG_DECIMAL = 
(new ObjectMapper())

Review Comment:
   (nit)
   ```suggestion
     private static final ObjectMapper OBJECT_MAPPER_WITH_EXACT_BIG_DECIMAL = 
new ObjectMapper()
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/BaseTransformFunction.java:
##########
@@ -226,6 +246,27 @@ public double[] transformToDoubleValuesSV(ProjectionBlock 
projectionBlock) {
     return _doubleValuesSV;
   }
 
+  @Override
+  public BigDecimal[] transformToBigDecimalValuesSV(ProjectionBlock 
projectionBlock) {
+    int length = projectionBlock.getNumDocs();
+    if (_bigDecimalValuesSV == null || _bigDecimalValuesSV.length < length) {
+      _bigDecimalValuesSV = new BigDecimal[length];
+    }
+
+    Dictionary dictionary = getDictionary();
+    if (dictionary != null) {
+      int[] dictIds = transformToDictIdsSV(projectionBlock);
+      dictionary.readBigDecimalValues(dictIds, length, _bigDecimalValuesSV);
+    } else {
+      if (getResultMetadata().getDataType().getStoredType() == 
DataType.STRING) {

Review Comment:
   Should we also support reading big decimal from other numeric types?



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentDictionaryCreator.java:
##########
@@ -163,6 +166,28 @@ public void build()
             numValues, sortedDoubles[0], sortedDoubles[numValues - 1]);
         return;
 
+      case BIG_DECIMAL:
+        BigDecimal[] sortedBigDecimals = (BigDecimal[]) _sortedValues;
+        numValues = sortedBigDecimals.length;
+
+        Preconditions.checkState(numValues > 0);
+        _bigDecimalValueToIndexMap = new Object2IntOpenHashMap<>(numValues);
+
+        for (int i = 0; i < numValues; i++) {
+          BigDecimal value = sortedBigDecimals[i];
+          _bigDecimalValueToIndexMap.put(value, i);
+          _numBytesPerEntry = Math.max(_numBytesPerEntry, 
BigDecimalUtils.byteSize(value));
+        }
+
+        assert _useVarLengthDictionary;

Review Comment:
   This might not always be true. It is possible that big decimal has all 
serialized bytes of the same size. I think we should be able to reuse 
`writeBytesValueDictionary()` here



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/Dictionary.java:
##########
@@ -132,6 +133,13 @@ default Object getInternal(int dictId) {
 
   double getDoubleValue(int dictId);
 
+  /**
+   * NOTE: Overridden for STRING and BYTES dictionary only for now.
+   */
+  default BigDecimal getBigDecimalValue(int dictId) {

Review Comment:
   Since we are adding it as native type, we should support it for all 
dictionaries



##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java:
##########
@@ -411,9 +419,10 @@ private static void unnestResults(List<Map<String, 
String>> currentResults,
   public static Schema getPinotSchemaFromJsonFile(File jsonFile,

Review Comment:
   When parsing schema, we should always allow big decimal (as default value). 
No need to add the extra boolean for this



##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/ArrayCopyUtils.java:
##########
@@ -52,6 +52,12 @@ public static void copy(int[] src, double[] dest, int 
length) {
     }
   }
 
+  public static void copy(int[] src, BigDecimal[] dest, int length) {
+    for (int i = 0; i < length; i++) {
+      dest[i] = new BigDecimal(src[i]);

Review Comment:
   Use `BigDecimal.valueOf()`



##########
pinot-core/src/main/java/org/apache/pinot/core/query/selection/SelectionOperatorService.java:
##########
@@ -121,7 +122,11 @@ private Comparator<Object[]> 
getTypeCompatibleComparator(List<OrderByExpressionC
         Object v2 = o2[index];
         int result;
         if (isNumber[i]) {
-          result = Double.compare(((Number) v1).doubleValue(), ((Number) 
v2).doubleValue());
+          if (columnDataTypes[i] == ColumnDataType.BIG_DECIMAL) {

Review Comment:
   We may avoid this extra if check per comparison by excluding `BIG_DECIMAL` 
from double comparison (change `isNumber` to `useDoubleComparison`)



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/BigDecimalDictionary.java:
##########
@@ -0,0 +1,101 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.index.readers;
+
+import java.math.BigDecimal;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.apache.pinot.spi.utils.BigDecimalUtils;
+
+
+/**
+ * Extension of {@link BaseImmutableDictionary} that implements immutable 
dictionary for BigDecimal type.
+ */
+public class BigDecimalDictionary extends BaseImmutableDictionary {
+
+  public BigDecimalDictionary(PinotDataBuffer dataBuffer, int length, int 
numBytesPerValue) {
+    // Works with VarLengthValueBuffer only.
+    super(dataBuffer, length, numBytesPerValue, (byte) 0);
+  }
+
+  @Override
+  public DataType getValueType() {
+    return DataType.BIG_DECIMAL;
+  }
+
+  @Override
+  public int insertionIndexOf(String stringValue) {
+    return binarySearch(new BigDecimal(stringValue));
+  }
+
+  @Override
+  public BigDecimal getMinVal() {
+    return BigDecimalUtils.deserialize(getBytes(0));
+  }
+
+  @Override
+  public BigDecimal getMaxVal() {
+    return BigDecimalUtils.deserialize(getBytes(length() - 1));
+  }
+
+  @Override
+  public BigDecimal get(int dictId) {
+    return BigDecimalUtils.deserialize(getBytes(dictId));
+  }
+
+  @Override
+  public Object getInternal(int dictId) {

Review Comment:
   Remove this override



##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java:
##########
@@ -72,6 +73,11 @@ private JsonUtils() {
   public static final ObjectWriter DEFAULT_WRITER = DEFAULT_MAPPER.writer();
   public static final ObjectWriter DEFAULT_PRETTY_WRITER = 
DEFAULT_MAPPER.writerWithDefaultPrettyPrinter();
 
+  private static final ObjectMapper DEFAULT_MAPPER_WITH_EXACT_BIG_DECIMAL = 
(new ObjectMapper())

Review Comment:
   (nit)
   ```suggestion
     private static final ObjectMapper DEFAULT_MAPPER_WITH_EXACT_BIG_DECIMAL = 
new ObjectMapper()
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/query/selection/SelectionOperatorUtils.java:
##########
@@ -367,6 +371,9 @@ public static Object[] extractRowFromDataTable(DataTable 
dataTable, int rowId) {
         case DOUBLE:
           row[i] = dataTable.getDouble(rowId, i);
           break;
+        case BIG_DECIMAL:
+          row[i] = dataTable.getObject(rowId, i);

Review Comment:
   Add a getter for big decimal



##########
pinot-core/src/main/java/org/apache/pinot/core/query/reduce/GroupByDataTableReducer.java:
##########
@@ -366,15 +366,16 @@ public void runJob() {
                     case DOUBLE:
                       values[colId] = dataTable.getDouble(rowId, colId);
                       break;
+                    case BIG_DECIMAL:

Review Comment:
   Should add an API for getting `BigDecimal`



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/LiteralTransformFunction.java:
##########
@@ -45,30 +46,32 @@ public class LiteralTransformFunction implements 
TransformFunction {
   private final long _longLiteral;
   private final float _floatLiteral;
   private final double _doubleLiteral;
+  private final BigDecimal _bigDecimalLiteral;
 
   // literals may be shared but values are intentionally not volatile as 
assignment races are benign
   private int[] _intResult;
   private long[] _longResult;
   private float[] _floatResult;
   private double[] _doubleResult;
+  private BigDecimal[] _bigDecimalResult;
   private String[] _stringResult;
   private byte[][] _bytesResult;
 
   public LiteralTransformFunction(String literal) {
     _literal = literal;
     _dataType = inferLiteralDataType(literal);
-    if (_dataType.isNumeric()) {
-      BigDecimal bigDecimal = new BigDecimal(_literal);
-      _intLiteral = bigDecimal.intValue();
-      _longLiteral = bigDecimal.longValue();
-      _floatLiteral = bigDecimal.floatValue();
-      _doubleLiteral = bigDecimal.doubleValue();
+    if (_dataType == DataType.TIMESTAMP) {
+      _bigDecimalLiteral = 
BigDecimal.valueOf(Timestamp.valueOf(literal).getTime());

Review Comment:
   I see. Let's add some comments explaining the behavior because it can cause 
confusion. Also, we should handle `BOOLEAN` as well (true should map to 1)



##########
pinot-core/src/main/java/org/apache/pinot/core/query/distinct/DistinctTable.java:
##########
@@ -305,6 +308,9 @@ public static DistinctTable fromByteBuffer(ByteBuffer 
byteBuffer)
           case STRING:
             values[j] = dataTable.getString(i, j);
             break;
+          case BIG_DECIMAL:
+            values[j] = dataTable.getObject(i, j);

Review Comment:
   Should add an API for getting `BigDecimal`



##########
pinot-tools/src/main/java/org/apache/pinot/tools/admin/command/JsonToPinotSchema.java:
##########
@@ -78,6 +78,10 @@ public class JsonToPinotSchema extends 
AbstractBaseAdminCommand implements Comma
       + "JSON string, can be NONE/NON_PRIMITIVE/ALL")
   String _collectionNotUnnestedToJson;
 
+  @CommandLine.Option(names = {"-parseExactBigDecimal"}, description = 
"Whether to parse exact BigDecimal or parse "
+      + "BigDecimal as doubles, default to false")
+  Boolean _parseExactBigDecimal = false;

Review Comment:
   No need to add this option. We should always parse big decimal as is



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8503: Support single-valued BigDecimal in schema, type conversion, SQL statements and minimum set of transforms.

Reply via email to