mcvsubbu commented on a change in pull request #6255: URL: https://github.com/apache/incubator-pinot/pull/6255#discussion_r530594713
##########
File path: pinot-common/src/main/java/org/apache/pinot/common/config/tuner/RealTimeAutoIndexTuner.java
##########
@@ -0,0 +1,42 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.common.config.tuner;
+
+import org.apache.pinot.spi.config.table.IndexingConfig;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.config.table.tuner.TableConfigTuner;
+import org.apache.pinot.spi.data.Schema;
+
+
+/**
+ * Used to auto-tune the table indexing config. It takes the original table
+ * config, table schema and adds the following to indexing config:
+ * - Inverted indices for all dimensions
+ * - No dictionary index for all metrics
+ */
+public class RealTimeAutoIndexTuner {
+
+  @TableConfigTuner(name = "realtimeAutoIndexTuner")
+  public static TableConfig tuneTableConfig(TableConfig initialConfig, Schema schema) {

Review comment:
Pinot supports a really wide variety of use cases. Even for a single use case, there are multiple questions to be answered before we can arrive at the right parameters to use.
This being the case, I don't think we can automatically tune a use case given only the schema and table config, except for the most basic use cases (where performance probably does not matter).

We need some sample data, and also sample queries. The sample data could come via a `DataDescriptor` class that points to sample data or otherwise describes the data. I have even thought about including such a hint in the schema, with fields such as expected cardinality, average length of string columns, etc. With these inputs we can write automated code (as Jia has done) that comes up with recommendations. Applying those recommendations can be optional, controlled by a flag: in some cases we can apply the recommendations directly, and in others we can let the admins decide.

Similarly, for realtime, we need to come up with an optimal segment size for the use case.

Multi-tenancy is yet another dimension, where the tuning we do for one table can be bad for another one.

One thing we _can_ do is add a periodic task that tunes and rebalances tables, etc. An interface is meaningless in that case because the task has access to pretty much everything: past segments, past queries, cardinality of columns, whatever. We could also add this as a minion task. There is no need for an interface in that case, other than a cluster-wide setting that says tables must be auto-tuned, and maybe a flag within a table asking the task not to touch that table.

I think the interface you are trying to define, and the problem you are trying to solve, is that of initial provisioning and not continuous tuning. Without visibility into the problem you are trying to solve, it gets a bit hard to review an interface.
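To make the `DataDescriptor` idea from the comment a little more concrete, here is a rough Java sketch. Everything in it is hypothetical and not part of Pinot: the class names `ColumnHint`, `DataDescriptor` and `TuningSketch`, the method names, and especially the cardinality threshold, which is a placeholder heuristic chosen only for illustration. It shows a descriptor carrying per-column hints (expected cardinality, average string length) plus the columns that sample queries filter on, a recommendation step driven by those inputs, and the "cluster-wide setting plus per-table opt-out" gate for continuous tuning.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-column hints; not a Pinot class. */
final class ColumnHint {
  final long expectedCardinality;
  final int averageStringLength; // 0 for non-string columns; unused by the toy heuristic below

  ColumnHint(long expectedCardinality, int averageStringLength) {
    this.expectedCardinality = expectedCardinality;
    this.averageStringLength = averageStringLength;
  }
}

/** Hypothetical descriptor: column hints plus the columns that sample queries filter on. */
final class DataDescriptor {
  final Map<String, ColumnHint> columnHints = new LinkedHashMap<>();
  final List<String> filteredColumns = new ArrayList<>();

  DataDescriptor hint(String column, long expectedCardinality, int averageStringLength) {
    columnHints.put(column, new ColumnHint(expectedCardinality, averageStringLength));
    return this;
  }

  DataDescriptor filteredBy(String column) {
    filteredColumns.add(column);
    return this;
  }
}

final class TuningSketch {
  // Placeholder threshold, assumed for illustration only; not a Pinot default.
  static final long MAX_CARDINALITY_FOR_INVERTED_INDEX = 10_000L;

  /** Recommends filtered columns whose hinted cardinality is at or below the threshold. */
  static List<String> recommendInvertedIndexColumns(DataDescriptor descriptor) {
    List<String> recommended = new ArrayList<>();
    for (String column : descriptor.filteredColumns) {
      ColumnHint hint = descriptor.columnHints.get(column);
      if (hint != null
          && hint.expectedCardinality <= MAX_CARDINALITY_FOR_INVERTED_INDEX
          && !recommended.contains(column)) {
        recommended.add(column);
      }
    }
    return recommended;
  }

  /** Continuous-tuning gate: the cluster-wide switch is on and the table has not opted out. */
  static boolean shouldAutoTune(boolean clusterAutoTuneEnabled, boolean tableOptedOut) {
    return clusterAutoTuneEnabled && !tableOptedOut;
  }

  public static void main(String[] args) {
    DataDescriptor descriptor = new DataDescriptor()
        .hint("country", 200, 2)
        .hint("userId", 50_000_000L, 16)
        .filteredBy("country")
        .filteredBy("userId");
    System.out.println(recommendInvertedIndexColumns(descriptor));
    System.out.println(shouldAutoTune(true, false));
  }
}
```

The point of the sketch is only that the recommendation depends on inputs a schema and table config alone do not carry. Whether the output is applied directly or handed to an admin would be a separate decision on top of `recommendInvertedIndexColumns`.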