yashmayya opened a new pull request, #13835:
URL: https://github.com/apache/pinot/pull/13835

   - Currently, the star-tree index doesn’t support configurable inputs for 
aggregation functions.
   - Taking the 
[DISTINCTCOUNTHLL](https://docs.pinot.apache.org/configuration-reference/functions/distinctcounthll)
 function as an example, this means that there’s no way to provide the `log2m` 
parameter value in a star-tree index configuration and the star-tree will be 
created using the default value of 8. Furthermore, the `log2m` value used to 
build the index isn't taken into account at query time.
   - So, if there is a star-tree index for a `DistinctCountHll` aggregation 
(with default `log2m` value of 8) on column `col`, and a user makes a query 
like `select DISTINCTCOUNTHLL(col, 16)...`, the query will still use the 
star-tree index. In the best case, this means that the query will return 
incorrect results (with lower than desired accuracy) if the aggregation query 
can be served wholly using the index itself. In the worst case however, when 
that’s not possible and additional aggregation is required, this leads to an 
error since `HyperLogLog`s with different `log2m` values can’t be merged - see 
https://github.com/apache/pinot/issues/12839.
   - This patch introduces a mechanism to allow configuring the aggregation 
function parameters for a star-tree index and also a mechanism to match 
query-time aggregation functions to only the appropriate star-tree index.
   - Unfortunately, this does require a lot of aggregation-function-specific 
logic to be introduced. For instance, `HyperLogLog`s with different `log2m` 
values can't be merged, as pointed out above. However, two instances of 
`HyperLogLogPlus` with the same `p` value but different `sp` values can be 
merged, instances of `UltraLogLog` with different `p` values can be merged, 
instances of `TDigest` with different compression factors can be merged, and 
so on.
   - The star-tree index configuration's `aggregationConfigs` section now 
optionally takes in a `functionParameters` map to allow for a user-friendly way 
of configuring the star-tree index aggregation function parameters. For example:
   ```
   {
     "starTreeIndexConfigs": [
       {
         "dimensionsSplitOrder": [
           "d1"
         ],
         "aggregationConfigs": [
           {
             "columnName": "m1",
             "aggregationFunction": "DISTINCTCOUNTHLL",
             "functionParameters": {
               "log2m": 16
             }
           }
         ]
       }
     ]
   }
   ```
   
   ```
   {
     "starTreeIndexConfigs": [
       {
         "dimensionsSplitOrder": [
           "d1"
         ],
         "aggregationConfigs": [
           {
             "columnName": "m1",
             "aggregationFunction": "DISTINCTCOUNTHLLPLUS",
             "functionParameters": {
               "p": 10,
               "sp": 20
             }
           }
         ]
       }
     ]
   }
   ```
   - It is now also possible to have multiple star-tree indexes for 
`DISTINCTCOUNTHLL` on the same column with different `log2m` values (for each 
query, only the index whose `log2m` value matches will be used).
   - Appropriate default value handling has also been added so that it isn't 
necessary to explicitly configure function parameters and also to ensure that 
older segments continue working as expected. Note that star-tree indexes in 
older segments, which were erroneously being used for queries like `select 
DISTINCTCOUNTHLL(col, 16)...`, will no longer be used for such queries. They 
will only be used for queries like `select DISTINCTCOUNTHLL(col, 8)...` or 
`select DISTINCTCOUNTHLL(col)...`.
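
The matching and default-handling behavior described above can be illustrated with a minimal sketch. This is not the code from the patch: the class and method names (`StarTreeParamMatchSketch`, `resolveLog2m`, `hllIndexUsable`, `hllPlusIndexUsable`) are hypothetical, and the `HyperLogLogPlus` rule assumes that only `p` needs to match, based solely on the mergeability statement in this description.

```java
import java.util.Map;

/** Hypothetical sketch; these are not the classes or methods used in the actual patch. */
public final class StarTreeParamMatchSketch {
  // Default log2m for DISTINCTCOUNTHLL, as stated in the description above.
  static final int DEFAULT_LOG2M = 8;

  private StarTreeParamMatchSketch() {
  }

  /** Returns the log2m from a functionParameters map, or the default when it is not configured. */
  static int resolveLog2m(Map<String, ?> functionParameters) {
    Object value = functionParameters.get("log2m");
    return value == null ? DEFAULT_LOG2M : ((Number) value).intValue();
  }

  /**
   * DISTINCTCOUNTHLL: HyperLogLogs with different log2m values can't be merged, so the
   * index is usable only when its log2m equals the one the query asks for. A null
   * queryLog2m models a query that omits the argument, e.g. DISTINCTCOUNTHLL(col).
   */
  static boolean hllIndexUsable(Map<String, ?> indexParameters, Integer queryLog2m) {
    int effectiveQueryLog2m = queryLog2m == null ? DEFAULT_LOG2M : queryLog2m;
    return resolveLog2m(indexParameters) == effectiveQueryLog2m;
  }

  /**
   * DISTINCTCOUNTHLLPLUS: assumed rule (see lead-in). Instances with the same p but
   * different sp can be merged, so this sketch compares p and ignores sp.
   */
  static boolean hllPlusIndexUsable(int indexP, int queryP) {
    return indexP == queryP;
  }
}
```

Under these assumptions, an index configured without `functionParameters` (as in older segments) is matched by `DISTINCTCOUNTHLL(col)` and `DISTINCTCOUNTHLL(col, 8)` but not by `DISTINCTCOUNTHLL(col, 16)`, while an index configured with `"log2m": 16` is matched only by the latter.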

