gortiz opened a new pull request, #16258:
URL: https://github.com/apache/pinot/pull/16258

   This PR introduces a new framework for testing UDFs.
   
   A key point is the UDF class, where we can define:
   - Function names
   - Mapping to scalar functions
   - Mapping to transform functions
   - High-level semantics, known as examples
   
   These UDFs can be registered as a Java service that can be loaded with 
ServiceLoader.
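
As a rough illustration of the ServiceLoader mechanism mentioned above, the sketch below discovers implementations of a service interface. `UdfSketch` and `UdfDiscoverySketch` are hypothetical names, not the classes added by this PR; only `java.util.ServiceLoader` is the real JDK API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical stand-in for the PR's UDF class; the real type may look different. */
interface UdfSketch {
  String mainFunctionName();
}

final class UdfDiscoverySketch {
  /**
   * Collects every provider listed in a META-INF/services file named after the
   * fully qualified name of UdfSketch. Returns an empty list if none is registered.
   */
  static List<UdfSketch> discover() {
    List<UdfSketch> udfs = new ArrayList<>();
    for (UdfSketch udf : ServiceLoader.load(UdfSketch.class)) {
      udfs.add(udf);
    }
    return udfs;
  }
}
```

With this approach, adding a new UDF would only require a new implementation class and one line in the provider-configuration file, assuming the PR follows the standard ServiceLoader layout.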
   
   The mappings are designed to register the UDF on the various factories used 
for queries, ingestion, etc., but this PR currently doesn't add them there. 
Instead, this PR is focused on the examples.
   
   These **examples** are things like `1 (int) + 2 (int) = 3 (int)`, but they 
can be more advanced examples like `1 (int) + null = null, but if null handling 
is disabled, then it is equal to 1 (int)`.
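
To make the idea of an example more concrete, here is a minimal sketch of how the two cases above could be modeled as data. `PlusExampleSketch` and its fields are assumptions made for illustration; they are not the PR's actual example API.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of one example for an integer "plus" UDF. */
final class PlusExampleSketch {
  final List<Integer> args;          // a null entry models SQL NULL
  final Integer withNullHandling;    // expected result when null handling is enabled
  final Integer withoutNullHandling; // expected result when null handling is disabled

  PlusExampleSketch(List<Integer> args, Integer withNullHandling, Integer withoutNullHandling) {
    this.args = args;
    this.withNullHandling = withNullHandling;
    this.withoutNullHandling = withoutNullHandling;
  }

  /** Picks the expected result for the given null-handling mode. */
  Integer expected(boolean nullHandlingEnabled) {
    return nullHandlingEnabled ? withNullHandling : withoutNullHandling;
  }

  /** The two examples from the description: 1 + 2 and 1 + null. */
  static List<PlusExampleSketch> examples() {
    return Arrays.asList(
        new PlusExampleSketch(Arrays.asList(1, 2), 3, 3),
        new PlusExampleSketch(Arrays.asList(1, null), null, 1));
  }
}
```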
   
   ## Scenarios
   
   The PR also defines the concept of UdfTestScenario, which describes a method 
for executing UDFs. For example, running a UDF as a transform function in SSE 
is not the same as running it in an intermediate stage. The ability to test how each UDF 
behaves in different scenarios is the main reason for creating this framework. 
Some of these scenarios utilize a UdfTestCluster, which is an abstraction that 
allows for creating tables, ingesting rows, and running queries. The single 
implementation provided uses the integration framework we have to start a local 
cluster.
   
   These are the scenarios included in this PR:
   - IntermediateUdfTestScenario, which uses the UDF on an intermediate stage 
(with and without null handling). Right now, it is implemented using a query 
that works for any UDF with at least one argument.
   - TransformationUdfTestScenario, which uses the UDF in an SSE projection, 
which should call it as a TransformFunction
   - PredicateUdfTestScenario, which uses the UDF as a predicate (which should 
internally use the scalar function, but I need to verify it)
   - ExpressionTransformerTestScenario, which doesn't use the cluster, but 
instead FunctionEvaluator directly.
   
   More scenarios can be added in the future, or the current ones can be 
modified.
   
   ## UdfTestFramework
   
   This is the class that receives the UDFs, scenarios, and a cluster. It 
creates an execution matrix using the Cartesian product between all the 
examples of the UDFs and all the scenarios, resulting in multiple test cases 
that are executed on the given cluster.
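
The execution matrix described above can be sketched as a plain nested loop. This is only an illustration: `ExecutionMatrixSketch` and `TestCase` are hypothetical, and examples and scenarios are reduced to strings, whereas the real framework works with richer objects.

```java
import java.util.ArrayList;
import java.util.List;

final class ExecutionMatrixSketch {
  /** Hypothetical pairing of one example with one scenario. */
  static final class TestCase {
    final String example;
    final String scenario;

    TestCase(String example, String scenario) {
      this.example = example;
      this.scenario = scenario;
    }
  }

  /** Builds the Cartesian product: every example is paired with every scenario. */
  static List<TestCase> product(List<String> examples, List<String> scenarios) {
    List<TestCase> cases = new ArrayList<>();
    for (String example : examples) {
      for (String scenario : scenarios) {
        cases.add(new TestCase(example, scenario));
      }
    }
    return cases;
  }
}
```

With two examples and the four scenarios listed earlier, this would yield eight test cases.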
   
   These tests are not as simple as "run the code and assert that the returned 
value is equal to the expected value" because Pinot semantics are a bit more 
complex. For example, we may have a UdfExample that says `1 (int) + 2 (int) = 
3 (int)`, but in some scenarios the actual result could be 
`3 (double)`. This is why, for each example, we return a UdfExampleTestResult, 
which includes the expected result and the actual result. Then we compare them 
using different _equivalence_ methods, defined in 
`UdfTestFramework.EquivalenceLevel`.
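
The following sketch shows one way such a graded comparison could work. The level names `EXACT` and `NUMERIC` are assumptions made for this illustration; the actual constants of `UdfTestFramework.EquivalenceLevel` are not listed in this description and may differ.

```java
import java.util.Objects;

final class EquivalenceSketch {
  /** Hypothetical levels; not the real UdfTestFramework.EquivalenceLevel constants. */
  enum Level { EXACT, NUMERIC }

  /** Returns true if expected and actual are equivalent at the given level. */
  static boolean matches(Level level, Object expected, Object actual) {
    switch (level) {
      case EXACT:
        // Same type and same value, e.g. Integer 3 vs Integer 3.
        return Objects.equals(expected, actual);
      case NUMERIC:
        // Same numeric value regardless of type, e.g. Integer 3 vs Double 3.0.
        // Comparing as double may lose precision for very large long values.
        if (expected instanceof Number && actual instanceof Number) {
          double e = ((Number) expected).doubleValue();
          double a = ((Number) actual).doubleValue();
          return Double.compare(e, a) == 0;
        }
        return Objects.equals(expected, actual);
      default:
        throw new IllegalArgumentException("Unknown level: " + level);
    }
  }
}
```

Under these assumptions, an expected `3 (int)` and an actual `3 (double)` would fail the strict level but pass the numeric one, which lets a test report how closely a scenario matches the example instead of a plain pass or fail.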
   
   ## UdfTest
   
   The final key class is the UdfTest class, which uses the 
UdfTestFramework to run various tests. It relies heavily on snapshot tests: 
the results of earlier executions are stored as YAML files in a given folder 
and compared with the current results. There are tests that:
   - Verify each UDF
   - Verify that no new scalar or transform function is created without adding 
a UDF function
   - Verify that no scalar or transform function is deleted without removing 
its snapshot as well
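
The last two checks boil down to set differences between function names, UDF names, and snapshot names. A minimal sketch, assuming all three are available as sets of strings (`SnapshotCoverageSketch` and its methods are hypothetical, not part of this PR):

```java
import java.util.Set;
import java.util.TreeSet;

final class SnapshotCoverageSketch {
  /** Functions that exist but have no matching UDF; the test should fail if non-empty. */
  static Set<String> functionsWithoutUdf(Set<String> functions, Set<String> udfNames) {
    Set<String> missing = new TreeSet<>(functions);
    missing.removeAll(udfNames);
    return missing;
  }

  /** Snapshots whose function no longer exists; the test should fail if non-empty. */
  static Set<String> staleSnapshots(Set<String> snapshotNames, Set<String> functions) {
    Set<String> stale = new TreeSet<>(snapshotNames);
    stale.removeAll(functions);
    return stale;
  }
}
```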
   
   
   PS: While working on this PR, I found and fixed an issue that occurs when 
unexpected UDFs are used. This PR includes a partial fix, but 
https://github.com/apache/pinot/pull/16257 includes a more extensive one.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
