somandal commented on code in PR #9333: URL: https://github.com/apache/pinot/pull/9333#discussion_r974859046
########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/TextIndexHandler.java: ########## @@ -174,18 +175,46 @@ private void createTextIndexForColumn(SegmentDirectory.Writer segmentWriter, Col // segmentDirectory is indicated to us by SegmentDirectoryPaths, we create lucene index there. There is no // further need to move around the lucene index directory since it is created with correct directory structure // based on segmentVersion. - try (ForwardIndexReader forwardIndexReader = LoaderUtils.getForwardIndexReader(segmentWriter, columnMetadata); - ForwardIndexReaderContext readerContext = forwardIndexReader.createContext(); - TextIndexCreator textIndexCreator = indexCreatorProvider.newTextIndexCreator(IndexCreationContext.builder() - .withColumnMetadata(columnMetadata).withIndexDir(segmentDirectory).build().forTextIndex(_fstType, true))) { - if (columnMetadata.isSingleValue()) { - processSVField(segmentWriter, hasDictionary, forwardIndexReader, readerContext, textIndexCreator, numDocs, - columnMetadata); + try (TextIndexCreator textIndexCreator = indexCreatorProvider.newTextIndexCreator( + IndexCreationContext.builder().withColumnMetadata(columnMetadata).withIndexDir(segmentDirectory).build() + .forTextIndex(_fstType, true))) { + boolean forwardIndexDisabled = !segmentWriter.hasIndexFor(columnName, ColumnIndexType.FORWARD_INDEX); + if (forwardIndexDisabled) { + try (Dictionary dictionary = LoaderUtils.getDictionary(segmentWriter, columnMetadata)) { + // Create the text index if the dictionary length is 1 as this is for a default column (i.e. newly added + // column). For existing columns it is not possible to create the text index without forward index + Preconditions.checkState(dictionary.length() == 1, String.format("Creating text index for forward index " Review Comment: So we actually evaluated two approaches for this part: - Create the forward index, allow it to be marked as sorted. This will skip creation of the inverted index, and all other index handlers will kick in and create the index off of the forward index we create. (some index handlers rely on the forward index to construct the index) - Our main concern with this approach was that the segments will be inconsistent, some will have a forward index and others won't. This may give odd behavior at query time depending on the segment touched (e.g. some queries go fine with a given filter but don't go through with a slightly different filter). We wanted to avoid this. - We had also explored an option to identify such queries on the Broker or Server planning stage but quickly found that this became very ugly and had lots of exceptions and was error prone. - Don't create the forward index but do create the dictionary (just like the normal segment creation path). Modify the index handlers to create the index for such a default column if forward index is disabled without relying on the the forward index. In such cases we expect exactly 1 value anyways. This will give a consistent experience to users who run queries. - This has the disadvantage of doing some special handling in the index handlers for this scenario since the forward index won't be available anymore. Hope the above explains our thinking about the approaches and why we decided to go with the second one. Let me know if you'd like to discuss this in more detail. cc @siddharthteotia -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org