davecromberge commented on code in PR #14355:
URL: https://github.com/apache/pinot/pull/14355#discussion_r1853649707


##########
pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/mergerollup/DimensionValueTransformer.java:
##########
@@ -0,0 +1,74 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.plugin.minion.tasks.mergerollup;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Set;
+import org.apache.pinot.segment.local.recordtransformer.RecordTransformer;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+/**
+ * The {@code DimensionValueTransformer} class will transform certain dimension values by substituting the
+ * existing value for that dimension with the 'defaultNullValue' from its 'fieldSpec'.
+ */
+public class DimensionValueTransformer implements RecordTransformer {

Review Comment:
   @swaminathanmanish that is an interesting suggestion, and it would be a 
simpler solution. However, it prompted me to think about the point in a 
segment's lifecycle at which we might want to erase dimension values. Ideally, 
we would control which dimensions are eliminated for each time bucket. For 
example, we might care about the "country_of_origin" dimension for data less 
than a week old, but reset it to its default value for data older than a week. 
If we used the ingestion configuration transformations instead, my 
understanding is that they would be applied regardless of the merge refresh 
time interval (i.e. the segment generation).
   
   Of course, your suggestion could also be used to introduce more complex 
transformations per bucket interval, but resetting a dimension to the column's 
default value is the simplest option.
   
   I'm going to try to adapt this PR to allow more flexibility over which 
dimensions are erased in which generation, giving the user more control.
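
To make the per-bucket idea concrete, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions: the class `BucketDimensionEraser`, the bucket names, and the map-based row are hypothetical and are not part of this PR or of Pinot's API. In the real transformer the default would presumably come from the column's `FieldSpec`, as the Javadoc above describes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names exist in Pinot or in this PR.
final class BucketDimensionEraser {
  // Bucket name (e.g. "1day", "7days") -> dimensions to erase for that bucket.
  private final Map<String, Set<String>> _dimensionsToEraseByBucket;
  // Dimension name -> default value to substitute (assumed to be supplied by the caller).
  private final Map<String, Object> _defaultValues;

  BucketDimensionEraser(Map<String, Set<String>> dimensionsToEraseByBucket, Map<String, Object> defaultValues) {
    _dimensionsToEraseByBucket = dimensionsToEraseByBucket;
    _defaultValues = defaultValues;
  }

  // Returns a copy of the row in which every dimension configured for the given
  // bucket is replaced by its default value. Buckets without configuration, and
  // dimensions without a known default, leave the row unchanged.
  Map<String, Object> erase(String bucket, Map<String, Object> row) {
    Map<String, Object> result = new HashMap<>(row);
    for (String dimension : _dimensionsToEraseByBucket.getOrDefault(bucket, Set.of())) {
      if (result.containsKey(dimension) && _defaultValues.containsKey(dimension)) {
        result.put(dimension, _defaultValues.get(dimension));
      }
    }
    return result;
  }
}
```

With a configuration that lists "country_of_origin" only under a "7days" bucket, rows merged in a "1day" bucket would keep the original country, while rows merged in the "7days" bucket would carry the default instead.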
   
   Please let me know what you think; your input is appreciated.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

