gortiz opened a new pull request, #16258: URL: https://github.com/apache/pinot/pull/16258
This PR introduces a new framework for testing UDFs. A key point is the UDF class, where we can define:

- Function names
- Mapping to scalar functions
- Mapping to transform functions
- High-level semantics, known as examples

These UDFs can be registered as a Java service that can be loaded with ServiceLoader. The mappings are designed to register the UDF on the various factories used for queries, ingestion, etc., but this PR currently doesn't add them there. Instead, this PR is focused on the examples.

These **examples** are things like `1 (int) + 2 (int) = 3 (int)`, but they can be more advanced examples like `1 (int) + null = null, but if null handling is disabled, then it is equal to 1 (int)`.

## Scenarios

The PR also defines the concept of UdfTestScenario, which describes a method for executing UDFs. For example, running a UDF as a transform function in SSE or in an intermediate stage is not the same. The ability to test how each UDF behaves in different scenarios is the main reason for creating this framework.

Some of these scenarios utilize a UdfTestCluster, which is an abstraction that allows for creating tables, ingesting rows, and running queries. The single implementation provided uses the integration framework we have to start a local cluster.

These are the scenarios included in this PR:

- IntermediateUdfTestScenario, which uses the UDF on an intermediate stage (with and without null handling). Right now, it is implemented using a query that works for any UDF with at least one argument.
- TransformationUdfTestScenario, which uses the UDF in an SSE projection, which should call it as a TransformFunction
- PredicateUdfTestScenario, which uses the UDF as a predicate (which should internally use the scalar function, but I need to verify it)
- ExpressionTransformerTestScenario, which doesn't use the cluster, but instead FunctionEvaluator directly.

More scenarios can be added in the future, or the current ones can be modified.
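To make the idea of an example whose expected value depends on null handling more concrete, here is a minimal, self-contained Java sketch. It is not the API added by this PR: `UdfExampleSketch`, `Example`, `plus`, and `matches` are hypothetical names used only for illustration. The null-handling-disabled branch substitutes 0 for null, which is an assumption made to reproduce the `1 (int) + null = 1 (int)` result quoted above.

```java
import java.util.Objects;

// Hypothetical sketch: none of these types are the classes added by the PR.
class UdfExampleSketch {

  /** Illustrative stand-in for one example of a two-argument integer UDF. */
  static final class Example {
    final Integer left;
    final Integer right;
    final Integer expectedWithNullHandling;
    final Integer expectedWithoutNullHandling;

    Example(Integer left, Integer right, Integer withNullHandling, Integer withoutNullHandling) {
      this.left = left;
      this.right = right;
      this.expectedWithNullHandling = withNullHandling;
      this.expectedWithoutNullHandling = withoutNullHandling;
    }

    /** Picks the expected value for the requested null-handling mode. */
    Integer expected(boolean nullHandling) {
      return nullHandling ? expectedWithNullHandling : expectedWithoutNullHandling;
    }
  }

  /** Toy addition that mimics the two null-handling modes described above. */
  static Integer plus(Integer left, Integer right, boolean nullHandling) {
    if (nullHandling) {
      // With null handling enabled, a null argument yields a null result.
      if (left == null || right == null) {
        return null;
      }
      return left + right;
    }
    // Assumption: with null handling disabled, null behaves like 0.
    int l = (left == null) ? 0 : left;
    int r = (right == null) ? 0 : right;
    return l + r;
  }

  /** Returns true when the toy function agrees with the example in the given mode. */
  static boolean matches(Example example, boolean nullHandling) {
    Integer actual = plus(example.left, example.right, nullHandling);
    return Objects.equals(actual, example.expected(nullHandling));
  }

  public static void main(String[] args) {
    Example simple = new Example(1, 2, 3, 3);          // 1 + 2 = 3 in both modes
    Example withNull = new Example(1, null, null, 1);  // 1 + null depends on the mode
    for (boolean nullHandling : new boolean[] {true, false}) {
      System.out.println("nullHandling=" + nullHandling
          + " simple=" + matches(simple, nullHandling)
          + " withNull=" + matches(withNull, nullHandling));
    }
  }
}
```

Keeping both expected values on the example means one example can be checked in scenarios that run with and without null handling, without being duplicated.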
## UdfTestFramework

This is the class that receives the UDFs, scenarios, and a cluster. It creates an execution matrix using the Cartesian product between all the examples of the UDFs and all the scenarios, resulting in multiple test cases that are executed on the given cluster.

These tests are not as simple as "run the code and assert that the returned value is equal to the expected value" because Pinot semantics are a bit more complex. For example, we may have a UdfExample that says `1 (int) + 2 (int) = 3 (int)`, but when running that on some scenarios, the actual result could be `3 (double)`. This is why, for each example, we return a UdfExampleTestResult, which includes the expected result and the actual result. Then we compare them using different _equivalence_ methods, defined in `UdfTestFramework.EquivalenceLevel`.

## UdfTest

The final key class is the UdfTest class, which utilizes the UdfTestFramework to run various tests. Specifically, it relies heavily on snapshot tests, which means that it takes the results of older executions (stored as YAML files in a given folder) and compares them with the current results.

There are tests that:

- Verify each UDF
- Verify that no new scalar or transform function is created without adding a UDF function
- Verify that no scalar or transform function is deleted without removing its snapshot as well

PS: While working on this PR, I found and fixed an issue that occurs when unexpected UDFs are used. This PR includes a partial fix, but https://github.com/apache/pinot/pull/16257 includes a more extensive one.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org