gortiz commented on code in PR #16258:
URL: https://github.com/apache/pinot/pull/16258#discussion_r2225574228
##########
pinot-core/src/main/java/org/apache/pinot/core/udf/Udf.java:
##########
@@ -0,0 +1,160 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.udf;
+
+import java.lang.reflect.Method;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.arrow.util.Preconditions;
+import org.apache.pinot.common.function.FunctionRegistry;
+import org.apache.pinot.common.function.PinotScalarFunction;
+import org.apache.pinot.common.function.TransformFunctionType;
+import org.apache.pinot.core.operator.transform.function.TransformFunction;
+import org.apache.pinot.spi.annotations.ScalarFunction;
+
+
+/// The Udf interface represents a User Defined Function (UDF) in Pinot.
+///
+/// In Pinot UDFs can be either [@ScalarFunction][org.apache.pinot.spi.annotations.ScalarFunction] or
+/// [TransformFunction][org.apache.pinot.core.operator.transform.function.TransformFunction].
+/// The first are row based (are called once per row) and the second are block based (are called once per block), which
+/// makes them more efficient for large datasets.
+///
+/// These functions can be used in different parts of the Pinot query processing pipeline. For example,
+/// TransformFunctions are when ProjectPlanNodes in SSE are materialized into TransformOpeartors, while scalar functions
+/// are used mostly everywhere else, such as in filter expressions or even project nodes in MSE.
+/// But although a ScalarFunction can be wrapped into a TransformFunction using ScalarTransformFunctionWrapper,
+/// TransformFunctions cannot be used in places where ScalarFunctions are expected.
+///
+/// Therefore in order to add a new function, one should always implement the ScalarFunction interface, and if
+/// TransformFunction is needed, it can be implemented as a wrapper around the ScalarFunction. But this was not
+/// automatically enforced by the APIs.
+///
+/// This is why the Udf function was introduced. Udf interfaces should be the actual way to register an UDF in Pinot.
+/// This interface is used to provide a unified way to describe UDFs, including their main function name,
+/// description, examples and in future it could be used to register functions in TransformFunctionFactory and
+/// FunctionRegistry (which is the one used to look for scalar functions).
+///
+/// The examples are used to provide a set of examples for the function, which can be used in documentation or testing.
+public abstract class Udf {
+
+  /// The main function name of the UDF.
+  ///
+  /// This is treated as an ID, which means that on a single Pinot process there should be only one UDF with a given
+  /// main function name.
+  public abstract String getMainName();
+
+  /// Returns the main function name of the UDF, canonicalized as defined in [FunctionRegistry#canonicalize].
+  /// This is used to ensure that the function name is in a consistent format, which is important for
+  /// function registration, lookup and reporting.
+  public String getMainCanonicalName() {
+    return FunctionRegistry.canonicalize(getMainName());
+  }
+
+  /// A set with all names of the functions that this UDF can be called with, including the main name.
+  ///
+  /// This is used to support different aliases for the same function, so that users can call the function.
+  public Set<String> getAllNames() {
+    return Set.of(getMainName());
+  }
+
+  /// Returns a set with all names of the functions that this UDF can be called with, including the main name,
+  /// canonicalized as defined in [FunctionRegistry#canonicalize].
+  ///
+  /// This is used to ensure that the function names are in a consistent format, which is important for
+  /// function registration, lookup and reporting.
+  public Set<String> getAllCanonicalNames() {
+    return getAllNames().stream()
+        .map(FunctionRegistry::canonicalize)
+        .collect(Collectors.toSet());
+  }
+
+  /// A description of the UDF, which should be used in documentation or for debugging purposes.
+  ///
+  /// The description should be a human-readable text in markdown format that explains what the function does,
+  // language=markdown
+  public abstract String getDescription();
+
+  /// Returns the text that should be used in a SQL query to call the function.
+  ///
+  /// This is used to generate the SQL call for the function in test cases or documentation.
+  ///
+  /// @param name the name to be used. It should be one of the names returned by getAllFunctionNames().
+  /// @param sqlArgValues the values of the arguments to be used in the SQL call. They can be either field references or
+  /// literal values, depending on the test case.
+  public String asSqlCall(String name, List<String> sqlArgValues) {
+    return name + "(" + String.join(", ", sqlArgValues) + ")";
+  }
+
+  /// Returns the examples for this Udf.
+  ///
+  /// As UDFs can have multiple signatures, the examples are grouped by them.
+  ///
+  /// It is recommended to use [UdfExampleBuilder] in order to build the examples for the Udf.
+  public abstract Map<UdfSignature, Set<UdfExample>> getExamples();
+
+  /// The pair of function type and transform function that implements this UDF.
+  ///
+  /// Unstable API: Transform functions still use an old model of registration using a model that is not polymorphic
+  public Map<TransformFunctionType, Class<? extends TransformFunction>> getTransformFunctions() {
+    return Map.of();
+  }
+
+  public abstract Set<PinotScalarFunction> getScalarFunctions();

Review Comment:
   That is a good question. For scalar functions, I have not yet found a single example where a UDF generates more than one scalar function, so returning a set was probably overengineering on my part. Transform functions are similar, except that in their case we also need to return the associated enum (TransformFunctionType). The method that returns the transform functions will change once we add the ability to overload them, so its current shape is not set in stone. I'm going to change both methods before merging.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: commits-h...@pinot.apache.org