[GitHub] [pinot] gortiz commented on pull request #10191: [Index SPI] IndexType

via GitHub Wed, 01 Mar 2023 02:22:03 -0800


gortiz commented on PR #10191:
URL: https://github.com/apache/pinot/pull/10191#issuecomment-1449801023


   > But in ColumnIndexType, the _indexName is always the lower case of the 
enum value. Can we always use IndexName? That can avoid a lot of confusion. If 
you really think we need 2 different names, we can call the internal one 
getInternalName()
   
   I don't like having the two concepts either, but I think we need them for 
compatibility and usability reasons.
   
   Yes, the ids of the literals in `ColumnIndexType` are in upper case and the 
names are the same value in lower case. But the literals are the following:
   
   ```
   DICTIONARY("dictionary"),
     FORWARD_INDEX("forward_index"),
     INVERTED_INDEX("inverted_index"),
     BLOOM_FILTER("bloom_filter"),
     NULLVALUE_VECTOR("nullvalue_vector"),
     TEXT_INDEX("text_index"),
     FST_INDEX("fst_index"),
     JSON_INDEX("json_index"),
     RANGE_INDEX("range_index"),
     H3_INDEX("h3_index");
   ```
   
   As you can see, all but dictionary and null value have the `_index` suffix.
   
   In `index-spi` we have a new concept of _id_ which is the one used to 
describe the index to the user. We use this _id_ to indicate that something is 
wrong with an index and the user uses that _id_ to define the config for that 
specific index. This new concept of _id_ is different than the older one. We 
don't want the user to write the `_index` suffix everywhere like the following:
   
   ```js
   {
     "fieldConfigList": [
       {
          "name": “col1”,
          "indexes": {
            "inverted_index": {
              "enabled": true
            },
            "range_index": {
              "enabled": true
            },
            "text_index": {
              "type": “lucene”,
              "enableQueryCacheForTextIndex": false,
              "stopWordsInclude": ["a", "b", "c"]
            },
            "bloom_index": {
              "fpp": 0.01,
              "maxSizeInBytes": 1000000,
              "loadOnHeap": true
             }
          }
       }
     ]
   }
   ```
   
   And I think printing errors like `"there was a problem with index of type " 
+ indexType` would be quite repetitive if they print `"there was a problem with 
index of type bloom_index"`.
   
   At the same time we cannot decide to just remove the `_index` suffix because 
that literal is used in the segment metadata, so we need to read segments 
commited with `_index` suffix. And at the same time the suffix is not always 
there (dictionary and null vector do not add that suffix) so we cannot just 
create the version with suffix by appending `_index` to the new user-friendly 
concept of id.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[GitHub] [pinot] gortiz commented on pull request #10191: [Index SPI] IndexType

Reply via email to