[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8917: Add support for querying noDict MV columns for offline (all data types) and realtime (fixed width) segments

GitBox Tue, 21 Jun 2022 16:46:36 -0700


Jackie-Jiang commented on code in PR #8917:
URL: https://github.com/apache/pinot/pull/8917#discussion_r903149184



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java:
##########
@@ -410,9 +411,318 @@ default byte[] getBytes(int docId, T context) {
 
   /**
    * MULTI-VALUE COLUMN RAW INDEX APIs
-   * TODO: Not supported yet
    */
 
+  /**
+   * Fills the values
+   * @param docIds Array containing the document ids to read
+   * @param length Number of values to read
+   * @param maxNumValuesPerMVEntry maximum number of values per MV entry
+   * @param values Values to fill
+   * @param context Reader context
+   */
+  default void readValuesMV(int[] docIds, int length, int maxNumValuesPerMVEntry, int[][] values, T context) {
+    switch (getStoredType()) {
+      case INT:
+        int[] intValueBuffer = new int[maxNumValuesPerMVEntry];
+        for (int i = 0; i < length; i++) {
+          int numValues = getIntMV(docIds[i], intValueBuffer, context);

Review Comment:
   For MV reads, let's add APIs that directly return the values without passing in a buffer, e.g. `int[] getIntMV(int docId, T context)`. Such an API avoids unnecessary copying of the array, and it also works when `maxNumValuesPerMVEntry` is not available.
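A minimal sketch of the difference, using a hypothetical in-memory raw MV INT index. `RawIntMVForwardIndex` and `MvReadSketch` are illustrative names, not Pinot classes, and the reader context `T` is dropped for brevity. With the buffer-based overload, `readValuesMV` has to copy each entry out of the shared buffer into a new, exactly sized array; with the proposed `int[]`-returning overload, the result can be stored directly and `maxNumValuesPerMVEntry` is not needed.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for a raw (no-dictionary) MV INT forward index.
 * Entry d spans [_offsets[d], _offsets[d + 1]) in the flattened _values array.
 */
final class RawIntMVForwardIndex {
  private final int[] _values;
  private final int[] _offsets;

  RawIntMVForwardIndex(int[][] rows) {
    _offsets = new int[rows.length + 1];
    for (int i = 0; i < rows.length; i++) {
      _offsets[i + 1] = _offsets[i] + rows[i].length;
    }
    _values = new int[_offsets[rows.length]];
    for (int i = 0; i < rows.length; i++) {
      System.arraycopy(rows[i], 0, _values, _offsets[i], rows[i].length);
    }
  }

  /** Buffer style: copy into a caller buffer (sized to the max entry), return the count. */
  int getIntMV(int docId, int[] valueBuffer) {
    int start = _offsets[docId];
    int numValues = _offsets[docId + 1] - start;
    System.arraycopy(_values, start, valueBuffer, 0, numValues);
    return numValues;
  }

  /** Proposed style: return an exactly sized array; no max entry size required. */
  int[] getIntMV(int docId) {
    return Arrays.copyOfRange(_values, _offsets[docId], _offsets[docId + 1]);
  }
}

final class MvReadSketch {
  /** Buffer-based: one copy into the buffer, then a second copy into values[i]. */
  static void readValuesMVWithBuffer(RawIntMVForwardIndex reader, int[] docIds, int length,
      int maxNumValuesPerMVEntry, int[][] values) {
    int[] buffer = new int[maxNumValuesPerMVEntry];
    for (int i = 0; i < length; i++) {
      int numValues = reader.getIntMV(docIds[i], buffer);
      values[i] = Arrays.copyOf(buffer, numValues);
    }
  }

  /** Direct: the returned array is stored as-is. */
  static void readValuesMVDirect(RawIntMVForwardIndex reader, int[] docIds, int length,
      int[][] values) {
    for (int i = 0; i < length; i++) {
      values[i] = reader.getIntMV(docIds[i]);
    }
  }
}
```

In a real on-disk reader the returned array still has to be allocated per entry; what the direct overload saves is the second copy out of the shared buffer and the dependency on the max entry size.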



##########
pinot-core/src/main/java/org/apache/pinot/core/common/DataFetcher.java:
##########
@@ -713,8 +730,51 @@ void readStringValuesMV(TransformEvaluator evaluator, int[] docIds, int length,
 
     public void readNumValuesMV(int[] docIds, int length, int[] numValuesBuffer) {
       Tracing.activeRecording().setInputDataType(_dataType, _singleValue);
-      for (int i = 0; i < length; i++) {
-        numValuesBuffer[i] = _reader.getDictIdMV(docIds[i], _reusableMVDictIds, getReaderContext());
+      if (_dictionary != null) {
+        for (int i = 0; i < length; i++) {
+          numValuesBuffer[i] = _reader.getDictIdMV(docIds[i], _reusableMVDictIds, getReaderContext());
+        }
+      } else {
+        switch (_reader.getStoredType()) {
+          case INT:
+            int[] intValueBuffer = new int[_maxNumValuesPerMVEntry];
+            for (int i = 0; i < length; i++) {
+              numValuesBuffer[i] = _reader.getIntMV(docIds[i], intValueBuffer, getReaderContext());

Review Comment:
   This adds a lot of overhead because we don't actually need to read the values. Let's add an API `int getNumValuesMV(int docId, T context)` to `ForwardIndexReader` that simply returns the number of values in the MV entry without reading any of its content.
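A minimal sketch of why the count can be cheap, assuming (hypothetically) a raw MV layout that keeps a per-document offset array next to the flattened values; Pinot's actual chunk format may store the entry size differently. `OffsetMVIndexSketch` is an illustrative name, not a Pinot class, and the reader context `T` is omitted.

```java
import java.util.Arrays;

/**
 * Hypothetical raw MV INT layout: all values flattened into one array, plus
 * offsets where entry d spans [_offsets[d], _offsets[d + 1]). Not a Pinot class.
 */
final class OffsetMVIndexSketch {
  private final int[] _values;
  private final int[] _offsets;

  OffsetMVIndexSketch(int[] values, int[] offsets) {
    _values = values;
    _offsets = offsets;
  }

  /** Reads the entry's content: allocates and copies every value. */
  int[] getIntMV(int docId) {
    return Arrays.copyOfRange(_values, _offsets[docId], _offsets[docId + 1]);
  }

  /** Proposed-style API: the entry size comes from two offsets; _values is never touched. */
  int getNumValuesMV(int docId) {
    return _offsets[docId + 1] - _offsets[docId];
  }

  /** No-dictionary counterpart of DataFetcher.readNumValuesMV, built on getNumValuesMV. */
  void readNumValuesMV(int[] docIds, int length, int[] numValuesBuffer) {
    for (int i = 0; i < length; i++) {
      numValuesBuffer[i] = getNumValuesMV(docIds[i]);
    }
  }
}
```

Compared with the diff above, this path needs neither `intValueBuffer` nor `_maxNumValuesPerMVEntry`, and it does constant work per document regardless of how many values the entry holds.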



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java:
##########
@@ -47,7 +48,7 @@
    * Returns the data type of the values in the forward index. Returns {@link DataType#INT} for dictionary-encoded
    * forward index.
    */
-  DataType getValueType();
+  DataType getStoredType();

Review Comment:
   Can we put this change into a separate PR?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
