[GitHub] [pinot] jasperjiaguo commented on pull request #10636: Support for ARG_MIN and ARG_MAX Functions

via GitHub Sun, 07 May 2023 19:15:37 -0700


jasperjiaguo commented on PR #10636:
URL: https://github.com/apache/pinot/pull/10636#issuecomment-1537644964


   > Thinking more on my previous comment.....
   > 
   > May be one way to workaround the NULL business is to output array when we 
have duplicates where the min and max is happening ?
   > 
   > This query
   > 
   > ```
   > SELECT 
   > argmin(intCol, **stringCol**),  
   > argmin(intCol, **doubleCol**), 
   > sum(doubleCol)  
   > FROM table
   > ```
   > 
   > can output
   > 
   > argmin(intCol, stringCol)  argmin(intCol, doubleCol)       sum(doubleCol)
   > ["a2", "a11"]      [2.0, 3.0]      9.0
   > Similarly, the following query
   > 
   > ```
   > SELECT 
   > intCol, 
   > argmin(longCol, **doubleCol**),  
   > argmin(longCol, **longCol**)   
   > FROM table 
   > GROUP BY intCol
   > ```
   > 
   > Can output
   > 
   > intCol     argmin(longCol, doubleCol)      argmin(longCol, longCol)
   > 1  [2.0, 3.0]      2
   > 2  4.0     1
   > This is probably a more intuitive way to reason about response and is more 
SQL friendly imo and avoids populating NULLs.
   > 
   > @jasperjiaguo wdyt ?
   
   Agreed that null filling can be confusing for group ids. I have made a 
change for the group id value filling and it now behaves like:
   ```
   SELECT 
   intCol, 
   argmin(longCol, **doubleCol**),  
   argmin(longCol, **longCol**)   
   FROM table 
   GROUP BY intCol
   ```
   
   intCol | argmin(longCol, doubleCol) | argmin(longCol, longCol)
   -- | -- | --
   1 | 2.0 | 2
   1 | 3.0 | 2
   2 | 4.0 | 1
   
   SELECT 
   argmin(intCol, **stringCol**),  
   argmin(intCol, **doubleCol**), 
   sum(doubleCol)  
   FROM table
   
   argmin(intCol, stringCol) | argmin(intCol, doubleCol) | sum(doubleCol)
   -- | -- | --
   "a2" | 2.0 | 9.0
   "a11"* | 3.0 | 9.0
   
   which is essentially flattened view of 
   
   -- | -- | --
   1 | [2.0, 3.0] | 2
   2 | 4.0 | 1
   
   and 
   
   -- | -- | --
   ["a2", "a11"] | [2.0, 3.0] | 9.0
   
   respectively
   
   Meanwhile, I have also considered the option array fashion of returning 
multiple rows of output, there are a few reasons I didn't use it:
   
   1. It wouldn't work for all MV types as we currently don't have sth like 
ARRAY[ARRAY[INT]] for returned results
   2. It would be easier for the user to parse the result when this is 
flattened, as the user side will not need to flatten + align them on their own 
when they are projecting multiple cols.
   3. Using the flattened view will keep the output column type the same as the 
data column type, which I feel is cleaner.
   
   cc @siddharthteotia @somandal 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[GitHub] [pinot] jasperjiaguo commented on pull request #10636: Support for ARG_MIN and ARG_MAX Functions

Reply via email to