mcvsubbu commented on a change in pull request #6255:
URL: https://github.com/apache/incubator-pinot/pull/6255#discussion_r530594713



##########
File path: 
pinot-common/src/main/java/org/apache/pinot/common/config/tuner/RealTimeAutoIndexTuner.java
##########
@@ -0,0 +1,42 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.common.config.tuner;
+
+import org.apache.pinot.spi.config.table.IndexingConfig;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.config.table.tuner.TableConfigTuner;
+import org.apache.pinot.spi.data.Schema;
+
+
+/**
+ * Used to auto-tune the table indexing config. It takes the original table
+ * config, table schema and adds the following to indexing config:
+ * - Inverted indices for all dimensions
+ * - No dictionary index for all metrics
+ */
+public class RealTimeAutoIndexTuner {
+
+  @TableConfigTuner(name = "realtimeAutoIndexTuner")
+  public static TableConfig tuneTableConfig(TableConfig initialConfig, Schema schema) {

Review comment:
       Pinot supports a very wide variety of use cases. Even for a single use 
case, several questions must be answered before we can arrive at the right 
parameters. Given that, I don't think we can automatically tune a use case 
from just the schema and table config, except for the most basic use cases 
(where performance probably does not matter).
   
   We need some sample data (or maybe a `DataDescriptor` class that points to 
sample data or otherwise describes it; I have even thought about including 
such hints in the schema, with fields such as expected cardinality, average 
length of string columns, etc.), and also sample queries.
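   
   As a rough sketch of what such a descriptor could carry: every name below 
(`DataDescriptor`, `ColumnHint`, and their methods) is hypothetical and is not 
an existing Pinot class. It only shows the kind of per-column hints and sample 
queries mentioned above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Pinot.
final class ColumnHint {
  final long expectedCardinality;  // rough number of distinct values
  final int averageStringLength;   // in characters; 0 for non-string columns

  ColumnHint(long expectedCardinality, int averageStringLength) {
    this.expectedCardinality = expectedCardinality;
    this.averageStringLength = averageStringLength;
  }
}

final class DataDescriptor {
  private final Map<String, ColumnHint> columnHints = new LinkedHashMap<>();
  private final List<String> sampleQueries = new ArrayList<>();

  DataDescriptor withColumnHint(String column, ColumnHint hint) {
    columnHints.put(column, hint);
    return this;
  }

  DataDescriptor withSampleQuery(String query) {
    sampleQueries.add(query);
    return this;
  }

  // Returns null when no hint was supplied for the column.
  ColumnHint hintFor(String column) {
    return columnHints.get(column);
  }

  List<String> sampleQueries() {
    return Collections.unmodifiableList(sampleQueries);
  }
}
```

   A tuner could then take this descriptor alongside the schema and table 
config, and fall back to conservative defaults for columns without a hint.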
   
   With these inputs, we can write automated code (as Jia has done) that comes 
up with recommendations. Applying those recommendations can be controlled by 
an optional flag. In some cases, we can apply the recommendations directly. In 
others, we can let the admins decide.
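   
   A minimal sketch of the "recommend, then optionally apply" split, assuming 
a recommendation is just a set of columns that should get inverted indices. 
`IndexRecommendation`, `RecommendationApplier`, and the `autoApply` flag are 
illustrative names, not Pinot APIs.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; names are illustrative, not Pinot APIs.
final class IndexRecommendation {
  final Set<String> invertedIndexColumns;

  IndexRecommendation(Set<String> invertedIndexColumns) {
    this.invertedIndexColumns = new LinkedHashSet<>(invertedIndexColumns);
  }
}

final class RecommendationApplier {
  // Returns the inverted-index columns the table ends up with.
  // When autoApply is false the current set is returned unchanged, and the
  // recommendation is left for an admin to review.
  static Set<String> resolve(Set<String> current, IndexRecommendation rec, boolean autoApply) {
    Set<String> result = new LinkedHashSet<>(current);
    if (autoApply) {
      result.addAll(rec.invertedIndexColumns);
    }
    return result;
  }
}
```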
   
   Similarly, for realtime, we need to come up with an optimal segment size for 
the use case.
   
   Multi-tenancy is yet another dimension: the tuning we do for one table can 
be bad for another. 
   
   One thing we _can_ do is add a periodic task that tunes and rebalances 
tables, etc. An interface is meaningless in that case because the task has 
access to pretty much everything: past segments, past queries, cardinality of 
columns, whatever. We could also add this as a minion task. There is no need 
for an interface in that case, other than a cluster-wide setting that says 
tables must be auto-tuned, and maybe a per-table flag asking not to touch that 
table.
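   
   The gating described above (a cluster-wide setting plus a per-table 
opt-out) could look roughly like this. `AutoTuneGate` and its parameters are 
assumptions made for illustration; how a real periodic or minion task would 
read these settings is not specified here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not an existing Pinot class.
final class AutoTuneGate {
  // tableOptOut maps table name -> true if the table asked not to be touched.
  // Returns the tables a periodic tuning task would process, in map order.
  static List<String> tablesToTune(boolean clusterAutoTuneEnabled, Map<String, Boolean> tableOptOut) {
    List<String> selected = new ArrayList<>();
    if (!clusterAutoTuneEnabled) {
      return selected;  // cluster-wide setting is off: tune nothing
    }
    for (Map.Entry<String, Boolean> entry : tableOptOut.entrySet()) {
      if (!Boolean.TRUE.equals(entry.getValue())) {
        selected.add(entry.getKey());
      }
    }
    return selected;
  }
}
```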
   
   I think the interface you are trying to define, and the problem you are 
trying to solve, is that of initial provisioning and not continuous tuning. 
   
   Without having visibility into the problem you are trying to solve, it gets 
a bit hard to review an interface.



