[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5625: Make DataSource independent of query execution (not extend Operator)

GitBox Wed, 01 Jul 2020 17:19:08 -0700


siddharthteotia commented on a change in pull request #5625:
URL: https://github.com/apache/incubator-pinot/pull/5625#discussion_r448680574




##########
File path: 
pinot-core/src/main/java/org/apache/pinot/core/common/ColumnValueReader.java
##########
@@ -0,0 +1,248 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common;
+
+import org.apache.pinot.core.io.reader.ForwardIndexReader;
+import org.apache.pinot.core.io.reader.ReaderContext;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * ColumnValueReader is a wrapper class on top of the forward index reader to 
read values from it.

Review comment:
       Why do we need this facade over ForwardIndexReader?
   
   Also, this is misleading since ColumnValue implies the actual value in the 
column whereas if it is passing the call to ForwardIndex, then for dictionary 
encoded columns you are essentially reading the dictionaryIds which will then 
be used by DataFetcher to fetch actual values from the dictionary. 
   
   DataFetcher takes care of reading the forward index and if there is 
dictionary, it then reads the dictionary. So DataFetcher takes care of reading 
the actual column values in both cases. I don't think we need this. 
   
   DataFetcher should directly use ForwardIndexReader and Dictionary (which it 
already does)
   
   Having this class also adds to the unclear semantics -- should 
ForwardIndexReader be enough or should we wrap it around ColumnValueReader? For 
example, we are using ColumnValueReader in PinotSegmentColumnReader but 
directly using ForwardIndexReader in IndexSegmentUtils

##########
File path: 
pinot-core/src/main/java/org/apache/pinot/core/common/DataFetcher.java
##########
@@ -21,28 +21,24 @@
 import java.util.Arrays;
 import java.util.HashMap;
 import java.util.Map;
-import org.apache.pinot.core.operator.docvalsets.MultiValueSet;
-import org.apache.pinot.core.operator.docvalsets.SingleValueSet;
 import org.apache.pinot.core.plan.DocIdSetPlanNode;
 import org.apache.pinot.core.segment.index.readers.Dictionary;
 
 
 /**
- * DataFetcher is a higher level abstraction for data fetching. Given an index 
segment, DataFetcher can manage the
- * DataSource, Dictionary, BlockValSet and BlockValIterator for this segment, 
preventing redundant construction for
- * these instances. DataFetcher can be used by both selection, aggregation and 
group-by data fetching process, reducing
- * duplicate codes and garbage collection.
+ * DataFetcher is a higher level abstraction for data fetching. Given the 
DataSource, DataFetcher can manage the
+ * ColumnValueReader and Dictionary for the column, preventing redundant 
construction for these instances. DataFetcher

Review comment:
       IMO, we should directly use the ForwardIndexReader here

##########
File path: 
pinot-core/src/main/java/org/apache/pinot/core/io/reader/impl/BaseChunkSVForwardIndexReader.java
##########
@@ -16,27 +16,24 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-package org.apache.pinot.core.io.reader.impl.v1;
+package org.apache.pinot.core.io.reader.impl;
 
 import java.io.IOException;
 import java.nio.ByteBuffer;
 import org.apache.pinot.core.io.compression.ChunkCompressorFactory;
 import org.apache.pinot.core.io.compression.ChunkDecompressor;
-import org.apache.pinot.core.io.reader.BaseSingleColumnSingleValueReader;
-import org.apache.pinot.core.io.reader.impl.ChunkReaderContext;
-import org.apache.pinot.core.io.writer.impl.v1.BaseChunkSingleValueWriter;
+import org.apache.pinot.core.io.reader.ForwardIndexReader;
+import org.apache.pinot.core.io.writer.impl.BaseChunkSVForwardIndexWriter;
 import org.apache.pinot.core.segment.memory.PinotDataBuffer;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 
 /**
- * Abstract class implementation for {@link BaseSingleColumnSingleValueReader}.
- * Base class for the fixed and variable byte reader implementations.
- *
+ * Base implementation for chunk based single-value forward index reader.

Review comment:
       Consider changing to - `Base implementation for chunk based single-value 
raw forward index reader.`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5625: Make DataSource independent of query execution (not extend Operator)

Reply via email to