Jackie-Jiang commented on a change in pull request #6046:
URL: https://github.com/apache/incubator-pinot/pull/6046#discussion_r493140341



##########
File path: 
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/RecordExtractor.java
##########
@@ -27,8 +28,9 @@
  * 2) Collections become Object[] i.e. multi-value column
  * 3) Nested/Complex fields (e.g. json maps, avro maps, avro records) become 
Map<Object, Object>
  * @param <T> The format of the input record
+ * @param <V> The input record's field to be converted

Review comment:
       Recommend not adding this generic type `V` as in most cases it is 
`Object` (the field can be of lots of types for the same file format)

##########
File path: 
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/RecordExtractor.java
##########
@@ -27,8 +28,9 @@
  * 2) Collections become Object[] i.e. multi-value column
  * 3) Nested/Complex fields (e.g. json maps, avro maps, avro records) become 
Map<Object, Object>
  * @param <T> The format of the input record
+ * @param <V> The input record's field to be converted
  */
-public interface RecordExtractor<T> {
+public interface RecordExtractor<T, V> {
 
   /**
    * Initialize the record extractor with its config

Review comment:
       Put nullable annotation before `fields`

##########
File path: 
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/AbstractDefaultRecordExtractor.java
##########
@@ -0,0 +1,166 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data.readers;
+
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.Map;
+import javax.annotation.Nullable;
+
+
+/**
+ * Abstract class for extracting and converting the fields of various data 
formats into supported Pinot data types.
+ *
+ * @param <T> the format of the input record
+ * @param <V> value used for converting the nested/complex fields of the file 
format (e.g. GenericRecord for Avro).
+ *            In most cases, this will be the same type as {@code T}.
+ */
+public abstract class AbstractDefaultRecordExtractor<T, V> implements 
RecordExtractor<T, Object> {

Review comment:
       Suggest renaming it to `BaseRecordExtractor`

##########
File path: 
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/AbstractDefaultRecordExtractor.java
##########
@@ -0,0 +1,166 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data.readers;
+
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.Map;
+import javax.annotation.Nullable;
+
+
+/**
+ * Abstract class for extracting and converting the fields of various data 
formats into supported Pinot data types.
+ *
+ * @param <T> the format of the input record
+ * @param <V> value used for converting the nested/complex fields of the file 
format (e.g. GenericRecord for Avro).
+ *            In most cases, this will be the same type as {@code T}.
+ */
+public abstract class AbstractDefaultRecordExtractor<T, V> implements 
RecordExtractor<T, Object> {
+
+  /**
+   * Converts the field value to either a single value (string, number, 
bytebuffer), multi value (Object[]) or a Map.
+   * Returns {@code null} if the field value is {@code null}.
+   *
+   * Natively Pinot only understands single values and multi values.
+   * Map is useful only if some ingestion transform functions operates on it 
in the transformation layer.
+   */
+  @Nullable
+  public Object convert(@Nullable Object value) {
+    if (value == null) {
+      return null;
+    }
+    Object convertedValue;
+    if (isInstanceOfMultiValue(value)) {
+      convertedValue = convertMultiValue(value);
+    } else if (isInstanceOfMap(value)) {
+      convertedValue = convertMap(value);
+    } else if (isInstanceOfRecord(value)) {
+      convertedValue = convertRecord((V) value);
+    } else {
+      convertedValue = convertSingleValue(value);
+    }
+    return convertedValue;
+  }
+
+  /**
+   * Returns whether the object is an instance of the data format's base type.
+   */
+  protected abstract boolean isInstanceOfRecord(Object value);
+
+  /**
+   * Returns whether the object is of a multi-value type. Override this method 
if the data format represents
+   * multi-value objects differently.
+   */
+  protected boolean isInstanceOfMultiValue(Object value) {
+    return value instanceof Collection;
+  }
+
+  /**
+   * Returns whether the object is of a map type. Override this method if the 
data format represents map objects
+   * differently.
+   */
+  protected boolean isInstanceOfMap(Object value) {
+    return value instanceof Map;
+  }
+
+  /**
+   * Handles the conversion of every field of the object for the particular 
data format.
+   */
+  @Nullable
+  protected abstract Object convertRecord(V value);
+
+  /**
+   * Handles the conversion of each element of a multi-value object. Returns 
{@code null} if the field value is
+   * {@code null}.
+   *
+   * This implementation converts the Collection to an Object array. Override 
this method if the data format
+   * requires a different conversion for its multi-value objects.
+   */
+  @Nullable
+  protected Object convertMultiValue(Object value) {
+    Collection collection = (Collection) value;
+    if (collection.isEmpty()) {
+      return null;
+    }
+
+    int numValues = collection.size();
+    Object[] array = new Object[numValues];
+    int index = 0;
+    for (Object element : collection) {
+      Object convertedValue = convert(element);
+      if (convertedValue != null && !convertedValue.toString().equals("")) {
+        array[index++] = convertedValue;
+      }
+    }
+
+    if (index == numValues) {
+      return array;
+    } else if (index == 0) {
+      return null;
+    } else {
+      return Arrays.copyOf(array, index);
+    }
+  }
+
+  /**
+   * Handles the conversion of every value of the map. Note that map keys will 
be handled as a single-value type.
+   * Returns {@code null} if the field value is {@code null}. This should be 
overridden if the data format requires
+   * a different conversion for map values.
+   */
+  @Nullable
+  protected Object convertMap(Object value) {
+    Map map = (Map) value;
+    if (map.isEmpty()) {
+      return null;
+    }
+
+    Map<Object, Object> convertedMap = new HashMap<>();
+    for (Object key : map.keySet()) {
+      convertedMap.put(convertSingleValue(key), convert(map.get(key)));
+    }
+    return convertedMap;
+  }
+
+  /**
+   * Converts single value types. This should be overridden if the data format 
requires
+   * a different conversion for its single values. Returns {@code null} for 
{@code null} input values.
+   */
+  @Nullable
+  protected Object convertSingleValue(@Nullable Object value) {

Review comment:
       The argument will never be null

##########
File path: 
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/RecordExtractor.java
##########
@@ -46,4 +48,16 @@
    * @return The output GenericRow
    */
   GenericRow extract(T from, GenericRow to);
+
+  /**
+   * Converts a field of the given input record. The field value will be 
converted to either a single value

Review comment:
       We should return `byte[]` instead of `ByteBuffer`

##########
File path: 
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/RecordExtractor.java
##########
@@ -46,4 +48,16 @@
    * @return The output GenericRow
    */
   GenericRow extract(T from, GenericRow to);
+
+  /**
+   * Converts a field of the given input record. The field value will be 
converted to either a single value
+   * (string, number, bytebuffer), multi value (Object[]) or a Map.
+   *
+   * Natively Pinot only understands single values and multi values.
+   * Map is useful only if some ingestion transform functions operates on it 
in the transformation layer.
+   *
+   * @param value the field value to be converted
+   * @return The converted field value
+   */
+  Object convert(@Nullable V value);

Review comment:
       We might not want to pass `null` into the `convert()`. Check the value 
before calling `convert()`

##########
File path: 
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/RecordExtractor.java
##########
@@ -46,4 +48,16 @@
    * @return The output GenericRow
    */
   GenericRow extract(T from, GenericRow to);
+
+  /**
+   * Converts a field of the given input record. The field value will be 
converted to either a single value
+   * (string, number, bytebuffer), multi value (Object[]) or a Map.
+   *
+   * Natively Pinot only understands single values and multi values.
+   * Map is useful only if some ingestion transform functions operates on it 
in the transformation layer.
+   *
+   * @param value the field value to be converted
+   * @return The converted field value
+   */
+  Object convert(@Nullable V value);

Review comment:
       Are we returning `null` for empty array/collection/map? If so, let's add 
the behavior to the javadoc and annotate the return value as nullable

##########
File path: 
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/AbstractDefaultRecordExtractor.java
##########
@@ -0,0 +1,166 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data.readers;
+
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.Map;
+import javax.annotation.Nullable;
+
+
+/**
+ * Abstract class for extracting and converting the fields of various data 
formats into supported Pinot data types.
+ *
+ * @param <T> the format of the input record
+ * @param <V> value used for converting the nested/complex fields of the file 
format (e.g. GenericRecord for Avro).

Review comment:
       Not sure how much value this generic type `V` can provide. IMO 
`convertRecord(Object value)` should be good enough (similar to 
`convertMap(Object value)` etc.)

##########
File path: 
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/AbstractDefaultRecordExtractor.java
##########
@@ -0,0 +1,166 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data.readers;
+
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.Map;
+import javax.annotation.Nullable;
+
+
+/**
+ * Abstract class for extracting and converting the fields of various data 
formats into supported Pinot data types.
+ *
+ * @param <T> the format of the input record
+ * @param <V> value used for converting the nested/complex fields of the file 
format (e.g. GenericRecord for Avro).
+ *            In most cases, this will be the same type as {@code T}.
+ */
+public abstract class AbstractDefaultRecordExtractor<T, V> implements 
RecordExtractor<T, Object> {
+
+  /**
+   * Converts the field value to either a single value (string, number, 
bytebuffer), multi value (Object[]) or a Map.
+   * Returns {@code null} if the field value is {@code null}.
+   *
+   * Natively Pinot only understands single values and multi values.
+   * Map is useful only if some ingestion transform functions operates on it 
in the transformation layer.
+   */
+  @Nullable
+  public Object convert(@Nullable Object value) {
+    if (value == null) {
+      return null;
+    }
+    Object convertedValue;
+    if (isInstanceOfMultiValue(value)) {
+      convertedValue = convertMultiValue(value);
+    } else if (isInstanceOfMap(value)) {
+      convertedValue = convertMap(value);
+    } else if (isInstanceOfRecord(value)) {
+      convertedValue = convertRecord((V) value);
+    } else {
+      convertedValue = convertSingleValue(value);
+    }
+    return convertedValue;
+  }
+
+  /**
+   * Returns whether the object is an instance of the data format's base type.
+   */
+  protected abstract boolean isInstanceOfRecord(Object value);

Review comment:
       Return `false` for default implementation?

##########
File path: 
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/AbstractDefaultRecordExtractor.java
##########
@@ -0,0 +1,166 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data.readers;
+
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.Map;
+import javax.annotation.Nullable;
+
+
+/**
+ * Abstract class for extracting and converting the fields of various data 
formats into supported Pinot data types.
+ *
+ * @param <T> the format of the input record
+ * @param <V> value used for converting the nested/complex fields of the file 
format (e.g. GenericRecord for Avro).
+ *            In most cases, this will be the same type as {@code T}.
+ */
+public abstract class AbstractDefaultRecordExtractor<T, V> implements 
RecordExtractor<T, Object> {
+
+  /**
+   * Converts the field value to either a single value (string, number, 
bytebuffer), multi value (Object[]) or a Map.
+   * Returns {@code null} if the field value is {@code null}.
+   *
+   * Natively Pinot only understands single values and multi values.
+   * Map is useful only if some ingestion transform functions operates on it 
in the transformation layer.
+   */
+  @Nullable
+  public Object convert(@Nullable Object value) {
+    if (value == null) {
+      return null;
+    }
+    Object convertedValue;
+    if (isInstanceOfMultiValue(value)) {
+      convertedValue = convertMultiValue(value);
+    } else if (isInstanceOfMap(value)) {
+      convertedValue = convertMap(value);
+    } else if (isInstanceOfRecord(value)) {
+      convertedValue = convertRecord((V) value);
+    } else {
+      convertedValue = convertSingleValue(value);
+    }
+    return convertedValue;
+  }
+
+  /**
+   * Returns whether the object is an instance of the data format's base type.
+   */
+  protected abstract boolean isInstanceOfRecord(Object value);
+
+  /**
+   * Returns whether the object is of a multi-value type. Override this method 
if the data format represents
+   * multi-value objects differently.
+   */
+  protected boolean isInstanceOfMultiValue(Object value) {
+    return value instanceof Collection;
+  }
+
+  /**
+   * Returns whether the object is of a map type. Override this method if the 
data format represents map objects
+   * differently.
+   */
+  protected boolean isInstanceOfMap(Object value) {
+    return value instanceof Map;
+  }
+
+  /**
+   * Handles the conversion of every field of the object for the particular 
data format.
+   */
+  @Nullable
+  protected abstract Object convertRecord(V value);

Review comment:
       Throw `UnsupportedOperationException` for default implementation?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to