deepthi912 opened a new pull request, #16879:
URL: https://github.com/apache/pinot/pull/16879

   **Context:** Currently, for REGEXP_LIKE without an FST/IFST index, there are 
cases where the number of docs scanned far exceeds 10K, so the cardinality 
percentage stays low even though the dictionary cardinality exceeds 10K. A 
configurable percentage parameter that lets the user switch between a RAW scan 
and a DICTIONARY-based scan is therefore necessary. This PR introduces this 
switch in 2 ways:
   
   The existing dictionary-length check works as is: if the dictionary length 
is < 10000, a dictionary scan is used. In all other cases, the switch is made 
based on the % of docs in a segment, for both committed and consuming segments.
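
   The two-step check described above can be sketched roughly as follows. This 
is only an illustration, not the code in this PR: the class, method, and 
constant names are hypothetical, and the sketch assumes that "% of docs" means 
dictionary cardinality as a percentage of the segment's doc count, with the 
dictionary scan chosen when that percentage is at or below the threshold.

```java
// Hypothetical sketch; names are illustrative and not taken from the Pinot code base.
final class RegexpScanChoice {
    // Existing cutoff from the description: smaller dictionaries always use the dictionary scan.
    static final int DICT_LENGTH_CUTOFF = 10_000;

    /**
     * Returns true if a dictionary-based scan should be used, false for a raw scan.
     * thresholdPercent is the configured cardinality threshold, in percent.
     */
    static boolean useDictionaryScan(int dictLength, int numDocs, double thresholdPercent) {
        // Step 1: the existing dictionary-length check, unchanged.
        if (dictLength < DICT_LENGTH_CUTOFF) {
            return true;
        }
        // Guard against an empty segment to avoid dividing by zero.
        if (numDocs <= 0) {
            return false;
        }
        // Step 2 (assumed semantics): dictionary cardinality as a percentage of docs in the segment.
        double cardinalityPercent = 100.0 * dictLength / numDocs;
        return cardinalityPercent <= thresholdPercent;
    }
}
```

   Under these assumptions, a segment with 1,000,000 docs and a 20,000-entry 
dictionary has a cardinality of 2% and would still use the dictionary scan 
with a 10% threshold, while a 40,000-doc segment with the same dictionary 
(50%) would fall back to the raw scan.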
   
   Broker Configs:
   -  The threshold limit on dictionary usage is read from 
`regexpDictCardinalityThreshold`. By default this is set to 10%.
   
   **Example:** select count(*) from mytable where 
REGEXP_LIKE(NewAddedSVJSONDimension, '.*')   
OPTION(regexpDictCardinalityThreshold=50000)
   
   `pinot.broker.regexp.dict.cardinality.threshold` - Sets the threshold 
globally at the broker level.
   
   **Order of Priority:**
   Query options > Broker Configs
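
   The priority order above can be pictured with a small resolver. This is a 
hedged sketch rather than the PR's implementation: the class and method are 
hypothetical, and plain maps stand in for the query options and broker 
configuration. Only the two key names come from the description above. The 
value is kept as a string so the sketch makes no assumption about its units.

```java
import java.util.Map;

// Hypothetical sketch; only the two key names come from the PR description.
final class RegexpThresholdResolver {
    static final String QUERY_OPTION_KEY = "regexpDictCardinalityThreshold";
    static final String BROKER_CONFIG_KEY = "pinot.broker.regexp.dict.cardinality.threshold";

    /** Resolves the threshold: query option first, then broker config, then the default. */
    static String resolve(Map<String, String> queryOptions,
                          Map<String, String> brokerConfig,
                          String defaultValue) {
        // A per-query option takes priority over everything else.
        String fromQuery = queryOptions.get(QUERY_OPTION_KEY);
        if (fromQuery != null) {
            return fromQuery;
        }
        // Otherwise fall back to the broker-level setting, if present.
        String fromBroker = brokerConfig.get(BROKER_CONFIG_KEY);
        return fromBroker != null ? fromBroker : defaultValue;
    }
}
```

   With this shape, a query that sets `regexpDictCardinalityThreshold` 
overrides the broker-level value for that query only, and queries without the 
option pick up the broker setting or the default.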


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

