gyorfimi opened a new issue, #12306:
URL: https://github.com/apache/pinot/issues/12306

   Multistage engine and multivalue column semantics incompatibilities
   
   In Pinot filtering on a multivalue column with `=` operator actually a 
*contains* operator, aggregation to a  multivalue column is actually an unnest 
operation. 
   
   It works well in 1.0 engine even with filtering the result (multivalue 
column) in the `HAVING` clause.
   
   However, in 2.0 engine, this irregular behavior is not compatible with 
relation algebra semantics. That's why one should use ArrayToMv function, which 
tries to break the border between "normal" field semantics and multivalue field 
semantics.
   
   In most cases it works well, but there are some cases when it doesn't work 
as expected. 
   
   Example:
   
   ```sql
   select id, mvc from (select id, arraytomv(mv) as mvc from mytable group by 
id, mvc) where mvc = 'a'
   ```
   
   (assuming that `mvc` is a multivalue column)
   
   If there is a row in `mytable` with `mvc` containing `a` and 'b', the result 
will contain two rows with the same `id` and `mvc` = `a` and `b` respectively 
(even though the `where` clause is applied).
   
   The main cause of this behavior is that 
   relation algebra transformation pushes down the `where` clause to the inner 
query like this: ```select id, mvc from (select id, arraytomv(mv) as mvc from 
mytable where arraytomv(mv) = 'a' group by id, mvc)```
   
   In normal field semantics, this query will return only one row with `mvc` = 
`a`. However, in multivalue field semantics, it will return two rows with `mvc` 
= `a` and `b` respectively.
   
   So a complete rework of multivalue operators is needed. Multivalue is 
actually a set, not a value, so for example an `UNNEST(...)` operator is 
suggested (Calcite supports it), and for filter `mv = SOME(...)` is suggested 
(Calcie supports it too).
   
   For fast fix of this issue, it should be prevented to push down filter on 
the multivalue-column group by query to the inner query.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to