gyorfimi opened a new issue, #12306: URL: https://github.com/apache/pinot/issues/12306
Multistage engine and multivalue column semantics incompatibilities In Pinot filtering on a multivalue column with `=` operator actually a *contains* operator, aggregation to a multivalue column is actually an unnest operation. It works well in 1.0 engine even with filtering the result (multivalue column) in the `HAVING` clause. However, in 2.0 engine, this irregular behavior is not compatible with relation algebra semantics. That's why one should use ArrayToMv function, which tries to break the border between "normal" field semantics and multivalue field semantics. In most cases it works well, but there are some cases when it doesn't work as expected. Example: ```sql select id, mvc from (select id, arraytomv(mv) as mvc from mytable group by id, mvc) where mvc = 'a' ``` (assuming that `mvc` is a multivalue column) If there is a row in `mytable` with `mvc` containing `a` and 'b', the result will contain two rows with the same `id` and `mvc` = `a` and `b` respectively (even though the `where` clause is applied). The main cause of this behavior is that relation algebra transformation pushes down the `where` clause to the inner query like this: ```select id, mvc from (select id, arraytomv(mv) as mvc from mytable where arraytomv(mv) = 'a' group by id, mvc)``` In normal field semantics, this query will return only one row with `mvc` = `a`. However, in multivalue field semantics, it will return two rows with `mvc` = `a` and `b` respectively. So a complete rework of multivalue operators is needed. Multivalue is actually a set, not a value, so for example an `UNNEST(...)` operator is suggested (Calcite supports it), and for filter `mv = SOME(...)` is suggested (Calcie supports it too). For fast fix of this issue, it should be prevented to push down filter on the multivalue-column group by query to the inner query. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org