gortiz commented on code in PR #12223:
URL: https://github.com/apache/pinot/pull/12223#discussion_r1445851043


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/dictionary/DictionaryIndexType.java:
##########
@@ -91,6 +95,11 @@ public class DictionaryIndexType
   private static final Logger LOGGER = 
LoggerFactory.getLogger(DictionaryIndexType.class);
   private static final List<String> EXTENSIONS = 
Collections.singletonList(V1Constants.Dict.FILE_EXTENSION);
 
+  // Map containing columnName as key and the interner as value. The interner 
is common across all the segments.
+  // TODO: Check if tablename can be appended to the columnName.
+  private static Map<String, FALFInterner<String>> _strInternerInfoMap = new 
ConcurrentHashMap<>();
+  private static Map<String, FALFInterner<byte[]>> _byteInternerInfoMap = new 
ConcurrentHashMap<>();

Review Comment:
   I don't think this is the place to store the intern table. I can see several 
issues:
   1. By design, the IndexType should ideally be stateless. It represents the 
concept of an index. Adding state feels like breaking the separation of 
concerns.
   2. We should try to avoid having mutable state as static variables. I'm not 
sure how to do that right now, but it is not the first time we find this 
problem. We really need some server context to store server state.
      * Static mutable state open the door to multiple problems. The most 
trivial the side effects of running tests. Each test will be affected by the 
previous ones given these maps will be shared.
   4. The structure itself is problematic. Here we are indexing by 
`columnName`, ignoring the fact that two different tables may have columns with 
the same name. That is not a correctness issue, but seems strange we want to 
classify by columnName to improve the cache efficiency but at the same time 
ignoring the collisions between two different tables.
   
   As said, I don't have a solution right now to be able to implement this 
without static variables. But what I would suggest is to move this logic to its 
own singleton class. That at least would solve the separation of concerns issue 
and we could add a method to that singleton to clean the caches without making 
the index type more complex than it should.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to