xiangfu0 commented on code in PR #11993:
URL: https://github.com/apache/pinot/pull/11993#discussion_r1406738352


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/compression/DictCompressionType.java:
##########
@@ -0,0 +1,23 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.compression;
+
+public enum DictCompressionType {

Review Comment:
   Do you foresee there will be single value dict compression type?
   If so, maybe we can use one bit or a few bits to represent some feature can 
has builtin method like `isMvDictionary()`.
   
   



##########
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java:
##########
@@ -119,7 +119,19 @@ public enum IndexType {
   }
 
   public enum CompressionCodec {

Review Comment:
   Shall we separate since raw and dictionary compression has no overlap? 



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java:
##########
@@ -381,37 +368,64 @@ private boolean shouldDisableDictionary(String column, 
ColumnMetadata existingCo
     return true;
   }
 
-  private boolean shouldChangeCompressionType(String column, 
SegmentDirectory.Reader segmentReader)
+  private boolean shouldChangeRawCompressionType(String column, 
SegmentDirectory.Reader segmentReader)
       throws Exception {
-    ColumnMetadata existingColMetadata = 
_segmentDirectory.getSegmentMetadata().getColumnMetadataFor(column);
-
     // The compression type for an existing segment can only be determined by 
reading the forward index header.
+    ColumnMetadata existingColMetadata = 
_segmentDirectory.getSegmentMetadata().getColumnMetadataFor(column);
+    ChunkCompressionType existingCompressionType;
     try (ForwardIndexReader<?> fwdIndexReader = 
ForwardIndexType.read(segmentReader, existingColMetadata)) {
-      ChunkCompressionType existingCompressionType = 
fwdIndexReader.getCompressionType();
+      existingCompressionType = fwdIndexReader.getCompressionType();
       Preconditions.checkState(existingCompressionType != null,
           "Existing compressionType cannot be null for raw forward index 
column=" + column);
+    }
 
-      // Get the new compression type.
-      ChunkCompressionType newCompressionType = 
_fieldIndexConfigs.get(column).getConfig(StandardIndexes.forward())
-          .getChunkCompressionType();
+    // Get the new compression type.
+    ChunkCompressionType newCompressionType =
+        
_fieldIndexConfigs.get(column).getConfig(StandardIndexes.forward()).getChunkCompressionType();
 
-      // Note that default compression type (PASS_THROUGH for metric and LZ4 
for dimension) is not considered if the
-      // compressionType is not explicitly provided in tableConfig. This is to 
avoid incorrectly rewriting all the
-      // forward indexes during segmentReload when the default compressionType 
changes.
-      return newCompressionType != null && existingCompressionType != 
newCompressionType;
+    // Note that default compression type (PASS_THROUGH for metric and LZ4 for 
dimension) is not considered if the
+    // compressionType is not explicitly provided in tableConfig. This is to 
avoid incorrectly rewriting all the
+    // forward indexes during segmentReload when the default compressionType 
changes.
+    return newCompressionType != null && existingCompressionType != 
newCompressionType;
+  }
+
+  private boolean shouldChangeDictCompressionType(String column, 
SegmentDirectory.Reader segmentReader)
+      throws Exception {
+    // The compression type for an existing segment can only be determined by 
reading the forward index header.
+    ColumnMetadata existingColMetadata = 
_segmentDirectory.getSegmentMetadata().getColumnMetadataFor(column);
+    DictCompressionType existingCompressionType;
+    try (ForwardIndexReader<?> fwdIndexReader = 
ForwardIndexType.read(segmentReader, existingColMetadata)) {
+      existingCompressionType = fwdIndexReader.getDictCompressionType();
     }
+
+    // Get the new compression type.
+    DictCompressionType newCompressionType =
+        
_fieldIndexConfigs.get(column).getConfig(StandardIndexes.forward()).getDictCompressionType();
+    // MV_ENTRY_DICT can only be applied to multi-value columns.
+    if (newCompressionType == DictCompressionType.MV_ENTRY_DICT && 
existingColMetadata.isSingleValue()) {

Review Comment:
   This if check will be hard to maintain, shall we consider having a static 
method in `DictCompressionType` like 
`isMultiValueDictionary(DictCompressionType type)`?



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/forward/FixedBitMVEntryDictForwardIndexReader.java:
##########
@@ -0,0 +1,155 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.index.readers.forward;
+
+import com.google.common.base.Preconditions;
+import java.util.List;
+import org.apache.pinot.segment.local.io.util.FixedBitIntReaderWriter;
+import org.apache.pinot.segment.local.io.util.PinotDataBitSet;
+import 
org.apache.pinot.segment.local.io.writer.impl.FixedBitMVEntryDictForwardIndexWriter;
+import org.apache.pinot.segment.spi.compression.DictCompressionType;
+import org.apache.pinot.segment.spi.index.reader.ForwardIndexReader;
+import org.apache.pinot.segment.spi.index.reader.ForwardIndexReaderContext;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * Bit-compressed dictionary-encoded forward index reader for multi-value 
columns, where a second level dictionary
+ * encoding for multi-value entries (instead of individual values within the 
entry) are maintained within the forward
+ * index.
+ * See {@link FixedBitMVEntryDictForwardIndexWriter} for index layout.
+ */
+public final class FixedBitMVEntryDictForwardIndexReader implements 
ForwardIndexReader<ForwardIndexReaderContext> {
+  public static final int MAGIC_MARKER = 
FixedBitMVEntryDictForwardIndexWriter.MAGIC_MARKER;
+  public static final short VERSION = 
FixedBitMVEntryDictForwardIndexWriter.VERSION;
+  private static final int HEADER_SIZE = 24;

Review Comment:
   This can also use FixedBitMVEntryDictForwardIndexWriter.HEADER_SIZE?



##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/forward/FixedBitMVEntryDictForwardIndexTest.java:
##########
@@ -0,0 +1,98 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.index.forward;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Random;
+import org.apache.commons.io.FileUtils;
+import 
org.apache.pinot.segment.local.io.writer.impl.FixedBitMVEntryDictForwardIndexWriter;
+import 
org.apache.pinot.segment.local.segment.index.readers.forward.FixedBitMVEntryDictForwardIndexReader;
+import org.apache.pinot.segment.spi.V1Constants;
+import org.apache.pinot.segment.spi.memory.PinotDataBuffer;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import static org.testng.Assert.assertEquals;
+
+
+public class FixedBitMVEntryDictForwardIndexTest {

Review Comment:
   Can you add a test for all null value or all empty array ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to