gortiz commented on code in PR #12223: URL: https://github.com/apache/pinot/pull/12223#discussion_r1445851043
########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java: ########## @@ -91,6 +95,11 @@ public class DictionaryIndexType private static final Logger LOGGER = LoggerFactory.getLogger(DictionaryIndexType.class); private static final List<String> EXTENSIONS = Collections.singletonList(V1Constants.Dict.FILE_EXTENSION); + // Map containing columnName as key and the interner as value. The interner is common across all the segments. + // TODO: Check if tablename can be appended to the columnName. + private static Map<String, FALFInterner<String>> _strInternerInfoMap = new ConcurrentHashMap<>(); + private static Map<String, FALFInterner<byte[]>> _byteInternerInfoMap = new ConcurrentHashMap<>(); Review Comment: I don't think this is the place to store the intern table. I can see several issues: 1. By design, the IndexType should ideally be stateless. It represents the concept of an index. Adding state feels like breaking the separation of concerns. 2. We should try to avoid having mutable state as static variables. I'm not sure how to do that right now, but it is not the first time we find this problem. We really need some server context to store server state. * Static mutable state open the door to multiple problems. The most trivial the side effects of running tests. Each test will be affected by the previous ones given these maps will be shared. 4. The structure itself is problematic. Here we are indexing by `columnName`, ignoring the fact that two different tables may have columns with the same name. That is not a correctness issue, but seems strange we want to classify by columnName to improve the cache efficiency but at the same time ignoring the collisions between two different tables. As said, I don't have a solution right now to be able to implement this without static variables. But what I would suggest is to move this logic to its own singleton class. That at least would solve the separation of concerns issue and we could add a method to that singleton to clean the caches without making the index type more complex than it should. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org