Jackie-Jiang commented on code in PR #9333: URL: https://github.com/apache/pinot/pull/9333#discussion_r976058172
########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/TextIndexHandler.java: ########## @@ -174,18 +175,46 @@ private void createTextIndexForColumn(SegmentDirectory.Writer segmentWriter, Col // segmentDirectory is indicated to us by SegmentDirectoryPaths, we create lucene index there. There is no // further need to move around the lucene index directory since it is created with correct directory structure // based on segmentVersion. - try (ForwardIndexReader forwardIndexReader = LoaderUtils.getForwardIndexReader(segmentWriter, columnMetadata); - ForwardIndexReaderContext readerContext = forwardIndexReader.createContext(); - TextIndexCreator textIndexCreator = indexCreatorProvider.newTextIndexCreator(IndexCreationContext.builder() - .withColumnMetadata(columnMetadata).withIndexDir(segmentDirectory).build().forTextIndex(_fstType, true))) { - if (columnMetadata.isSingleValue()) { - processSVField(segmentWriter, hasDictionary, forwardIndexReader, readerContext, textIndexCreator, numDocs, - columnMetadata); + try (TextIndexCreator textIndexCreator = indexCreatorProvider.newTextIndexCreator( + IndexCreationContext.builder().withColumnMetadata(columnMetadata).withIndexDir(segmentDirectory).build() + .forTextIndex(_fstType, true))) { + boolean forwardIndexDisabled = !segmentWriter.hasIndexFor(columnName, ColumnIndexType.FORWARD_INDEX); + if (forwardIndexDisabled) { + try (Dictionary dictionary = LoaderUtils.getDictionary(segmentWriter, columnMetadata)) { + // Create the text index if the dictionary length is 1 as this is for a default column (i.e. newly added + // column). For existing columns it is not possible to create the text index without forward index + Preconditions.checkState(dictionary.length() == 1, String.format("Creating text index for forward index " Review Comment: I don't follow the second approach. For the default column where all the values are the same for a column, we should create sorted index and a single entry dictionary. This is the same behavior as generating a new segment with a column of all the same values. IMO second approach will actually cause inconsistent behavior because it will have inverted index with only a single entry, and the segment generated during the reload or from the raw data without a column will be different. I strongly suggest generating sorted index + dictionary for the default column. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org