[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8622: Support BigDecimal raw value forward index; Support BigDecimal in many transforms and operators

GitBox Mon, 02 May 2022 11:47:33 -0700


Jackie-Jiang commented on code in PR #8622:
URL: https://github.com/apache/pinot/pull/8622#discussion_r863048176



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/SelectTupleElementTransformFunction.java:
##########
@@ -88,17 +90,24 @@ private static FieldSpec.DataType 
getLowestCommonDenominatorType(FieldSpec.DataT
     if (left == null || left == right) {
       return right;
     }
-    if ((right == FieldSpec.DataType.INT && left == FieldSpec.DataType.LONG)
-        || (left == FieldSpec.DataType.INT && right == 
FieldSpec.DataType.LONG)) {
-      return FieldSpec.DataType.LONG;
+    Set<FieldSpec.DataType> dataTypes = new HashSet<FieldSpec.DataType>() {{

Review Comment:
   Creating a map per method could be too much overhead. Suggest keeping the 
old way but just add `BIG_DECIMAL` into the picture
   ```suggestion
       if (left == BIG_DECIMAL || right == BIG_DECIMAL) {
         return BIG_DECIMAL;
       }
       ...
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/SingleParamMathTransformFunction.java:
##########
@@ -64,8 +77,24 @@ public double[] transformToDoubleValuesSV(ProjectionBlock 
projectionBlock) {
     return _doubleValuesSV;
   }
 
+  @Override
+  public BigDecimal[] transformToBigDecimalValuesSV(ProjectionBlock 
projectionBlock) {
+    int length = projectionBlock.getNumDocs();
+    if (_bigDecimalValuesSV == null || _bigDecimalValuesSV.length < length) {
+      _bigDecimalValuesSV = new BigDecimal[length];
+    }
+
+    BigDecimal[] values = 
_transformFunction.transformToBigDecimalValuesSV(projectionBlock);
+    applyMathOperator(values, projectionBlock.getNumDocs());
+    return _bigDecimalValuesSV;
+  }
+
   abstract protected void applyMathOperator(double[] values, int length);
 
+  protected void applyMathOperator(BigDecimal[] values, int length) {
+    throw new UnsupportedOperationException();

Review Comment:
   (minor) Put some message in the exception denoting the failure is caused by 
big-decimal



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/DefaultIndexCreatorProvider.java:
##########
@@ -207,6 +207,7 @@ public static ForwardIndexCreator 
getRawIndexCreatorForSVColumn(File file, Chunk
         return new SingleValueFixedByteRawIndexCreator(file, compressionType, 
column, totalDocs, dataType,
             writerVersion);
       case STRING:
+      case BIG_DECIMAL:

Review Comment:
   (nit) put it above `STRING`



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/forward/VarByteChunkSVForwardIndexReader.java:
##########
@@ -18,18 +18,20 @@
  */
 package org.apache.pinot.segment.local.segment.index.readers.forward;
 
+import java.math.BigDecimal;
 import java.nio.ByteBuffer;
 import javax.annotation.Nullable;
 import 
org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter;
 import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
 import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.apache.pinot.spi.utils.BigDecimalUtils;
 
 import static java.nio.charset.StandardCharsets.UTF_8;
 
 
 /**
  * Chunk-based single-value raw (non-dictionary-encoded) forward index reader 
for values of variable length data type
- * (STRING, BYTES).
+ * (STRING, BIG_DECIMAL, BYTES).

Review Comment:
   (nit) Move this in front of `STRING`



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkSVForwardIndexWriter.java:
##########
@@ -86,6 +88,12 @@ public VarByteChunkSVForwardIndexWriter(File file, 
ChunkCompressionType compress
     _chunkDataOffSet = _chunkHeaderSize;
   }
 
+  @Override
+  public void putBigDecimal(BigDecimal value) {
+    // Note: BigDecimal stores variable length data with very low 
length-variance withing a single column.

Review Comment:
   IMO this might not always be true



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentIndexCreationDriverImpl.java:
##########
@@ -374,6 +374,7 @@ public static void persistCreationMeta(File indexDir, long 
crc, long creationTim
   void buildIndexCreationInfo()
       throws Exception {
     Set<String> varLengthDictionaryColumns = new 
HashSet<>(_config.getVarLengthDictionaryColumns());
+    Set<String> rawIndexCreationColumns = _config.getRawIndexCreationColumns();

Review Comment:
   Good catch. We use 2 configs for no-dictionary column, so we might want to 
check `_config.getRawIndexCompressionType()` as well (the `createDictionary` 
flag is not checked though, but +1 on set it correctly)



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/SingleValueVarByteRawIndexCreator.java:
##########
@@ -33,7 +34,7 @@
 
 /**
  * Forward index creator for raw (non-dictionary-encoded) single-value column 
of variable length data type (STRING,
- * BYTES).
+ * BIG_DECIMAL, BYTES).

Review Comment:
   (nit) Move this in front of `STRING`



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/SingleValueFixedByteRawIndexCreator.java:
##########
@@ -30,7 +31,7 @@
 
 /**
  * Forward index creator for raw (non-dictionary-encoded) single-value column 
of fixed length data type (INT, LONG,
- * FLOAT, DOUBLE).
+ * FLOAT, DOUBLE, fixed-size BIG_DECIMAL).

Review Comment:
   Suggest not adding this support now. Conceptually `STRING` and `BYTES` can 
also be fixed-size, so we should add this support separately to cover all of 
them. Currently we don't track whether the values are fixed size



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/SubtractionTransformFunction.java:
##########
@@ -47,29 +53,41 @@ public void init(List<TransformFunction> arguments, 
Map<String, DataSource> data
       throw new IllegalArgumentException("Exactly 2 arguments are required for 
SUB transform function");
     }
 
-    TransformFunction firstArgument = arguments.get(0);
-    if (firstArgument instanceof LiteralTransformFunction) {
-      _firstLiteral = Double.parseDouble(((LiteralTransformFunction) 
firstArgument).getLiteral());
-    } else {
-      if (!firstArgument.getResultMetadata().isSingleValue()) {
-        throw new IllegalArgumentException("First argument of SUB transform 
function must be single-valued");
-      }
-      _firstTransformFunction = firstArgument;
-    }
-
-    TransformFunction secondArgument = arguments.get(1);
-    if (secondArgument instanceof LiteralTransformFunction) {
-      _secondLiteral = Double.parseDouble(((LiteralTransformFunction) 
secondArgument).getLiteral());
-    } else {
-      if (!secondArgument.getResultMetadata().isSingleValue()) {
-        throw new IllegalArgumentException("Second argument of SUB transform 
function must be single-valued");
+    _resultDataType = DataType.DOUBLE;
+    _doubleLiterals = new double[2];
+    _bigDecimalLiterals = new BigDecimal[2];
+    for (int i = 0; i < arguments.size(); i++) {
+      TransformFunction argument = arguments.get(i);
+      if (argument instanceof LiteralTransformFunction) {
+        LiteralTransformFunction literalTransformFunction = 
(LiteralTransformFunction) argument;
+        DataType dataType = 
literalTransformFunction.getResultMetadata().getDataType();
+        if (dataType == DataType.BIG_DECIMAL) {

Review Comment:
   See DIV



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/FixedByteChunkSVForwardIndexWriter.java:
##########
@@ -101,6 +103,12 @@ public void putDouble(double value) {
     flushChunkIfNeeded();
   }
 
+  public void putBigDecimal(BigDecimal value) {

Review Comment:
   We should not hit this method because big-decimal shouldn't be stored as 
fixed length



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/FixedByteChunkSVForwardIndexWriter.java:
##########
@@ -101,6 +103,12 @@ public void putDouble(double value) {
     flushChunkIfNeeded();
   }
 
+  public void putBigDecimal(BigDecimal value) {
+    _chunkBuffer.put(BigDecimalUtils.serialize(value));
+    _chunkDataOffset += BigDecimalUtils.byteSize(value);

Review Comment:
   Put the serialized bytes length here instead of re-calculating the byte size 
again



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/DivisionTransformFunction.java:
##########
@@ -46,30 +52,41 @@ public void init(List<TransformFunction> arguments, 
Map<String, DataSource> data
     if (arguments.size() != 2) {
       throw new IllegalArgumentException("Exactly 2 arguments are required for 
DIV transform function");
     }
-
-    TransformFunction firstArgument = arguments.get(0);
-    if (firstArgument instanceof LiteralTransformFunction) {
-      _firstLiteral = Double.parseDouble(((LiteralTransformFunction) 
firstArgument).getLiteral());
-    } else {
-      if (!firstArgument.getResultMetadata().isSingleValue()) {
-        throw new IllegalArgumentException("First argument of DIV transform 
function must be single-valued");
+    _resultDataType = DataType.DOUBLE;
+    _doubleLiterals = new double[2];
+    _bigDecimalLiterals = new BigDecimal[2];
+    for (int i = 0; i < arguments.size(); i++) {
+      TransformFunction argument = arguments.get(i);
+      if (argument instanceof LiteralTransformFunction) {
+        LiteralTransformFunction literalTransformFunction = 
(LiteralTransformFunction) argument;
+        DataType dataType = 
literalTransformFunction.getResultMetadata().getDataType();
+        if (dataType == DataType.BIG_DECIMAL) {

Review Comment:
   We also need to fill the big-decimal literal if the `_resultDataType` is 
already `BIG_DECIMAL` (first argument is a big-decimal column, second argument 
is a double literal, we might run into NPE). Same for `SUB`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8622: Support BigDecimal raw value forward index; Support BigDecimal in many transforms and operators

Reply via email to