jackjlli commented on a change in pull request #6361: URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r568138689
########## File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java ########## @@ -326,6 +339,11 @@ public SegmentMetadata getSegmentMetadata(String tableNameWithType, String segme } } + @Override + public Set<String> getColumnNamesByTable(String tableNameWithType) { + return _tableColumnNamesMap.computeIfAbsent(tableNameWithType, k -> Collections.emptySet()); Review comment: I don't quite fully understand the meaning of testing the overhead of computing the map once per query. The table columns are fetched from the schema which is stored in Zookeeper, and they are used to detect the column name mismatch during query execution. If schema fetch is done once per query, the ZK access would be a bottleneck for query execution. Plus, it'd be good to stick to one behavior in all different circumstances. Having a cache for high qps usecase fabric while directly fetching once per query for large MT cluster would make the code more difficult to maintain. `_tableColumnNamesMap` would stay the same state with `_tableDataManagerMap`. This is to make sure that the behavior of these two object are always consistent. And I don't see `_tableDataManagerMap` drops any reference of dataManager, until the server is shutting down. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org