siddharthteotia commented on a change in pull request #5667: URL: https://github.com/apache/incubator-pinot/pull/5667#discussion_r455326416
########## File path: pinot-core/src/main/java/org/apache/pinot/core/segment/index/loader/defaultcolumn/BaseDefaultColumnHandler.java ########## @@ -325,77 +323,92 @@ protected void removeColumnV1Indices(String column) } /** - * Right now the text index is supported on RAW (non-dictionary encoded) + * Right now the text index is supported on RAW and dictionary encoded * single-value STRING columns. Eventually we will relax the constraints * step by step. * For example, later on user should be able to create text index on - * a dictionary encoded STRING column that also has native Pinot's inverted - * index. We can also support it on BYTE columns later. + * a MV column * @param column column name * @param indexLoadingConfig index loading config * @param fieldSpec field spec */ private void checkUnsupportedOperationsForTextIndex(String column, IndexLoadingConfig indexLoadingConfig, FieldSpec fieldSpec) { - if (!indexLoadingConfig.getNoDictionaryColumns().contains(column)) { - throw new UnsupportedOperationException( - "Text index is currently not supported on dictionary encoded column: " + column); - } - - Set<String> sortedColumns = new HashSet<>(indexLoadingConfig.getSortedColumns()); - if (sortedColumns.contains(column)) { - // since Pinot's current implementation doesn't support raw sorted columns, - // we need to check for this too - throw new UnsupportedOperationException("Text index is currently not supported on sorted column: " + column); - } - if (!fieldSpec.isSingleValueField()) { throw new UnsupportedOperationException("Text index is currently not supported on multi-value column: " + column); } - if (fieldSpec.getDataType() != DataType.STRING) { throw new UnsupportedOperationException("Text index is currently only supported on STRING column:" + column); } } void createV1ForwardIndexForTextIndex(String column, IndexLoadingConfig indexLoadingConfig) Review comment: It seems to me that existing function createColumnV1Indices() has a bug which I want to investigate further and fix before I reuse that method for text index. Meanwhile, this PR is complete and I can refactor this portion next. Consider that case a new column gets added with inverted index. Two things happen: (1) First, default column handler creates dictionary and forward index. Note that it doesn't create bit encoded forward index. It creates sorted forward index. (2) Later during segment load, InvertedIndexHandler reads the forward index to create inverted index. This code when trying to create inverted index first gets a forward index reader. It gets the forward index buffer and creates the bit encoded forward index reader. This seems wrong to me since for newly added column, the forward index buffer had sorted index format and wasn't bit encoded. I will look into this in detail and as part of that cleanup the existing code first before reusing it for text. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org