gortiz commented on code in PR #16879:
URL: https://github.com/apache/pinot/pull/16879#discussion_r2374583692
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/RegexpLikePredicateEvaluatorFactory.java:
##########
@@ -36,21 +39,33 @@ public class RegexpLikePredicateEvaluatorFactory {
private RegexpLikePredicateEvaluatorFactory() {
}
- /// When the cardinality of the dictionary is less than this threshold, scan the dictionary to get the matching ids.
- public static final int DICTIONARY_CARDINALITY_THRESHOLD_FOR_SCAN = 10000;
+ /// Default threshold when the cardinality of the dictionary is less than this threshold,
+ // scan the dictionary to get the matching ids.
+ public static final int DEFAULT_DICTIONARY_CARDINALITY_THRESHOLD_FOR_SCAN = 10000;
/**
- * Create a new instance of dictionary based REGEXP_LIKE predicate evaluator.
+ * Create a new instance of dictionary based REGEXP_LIKE predicate evaluator with configurable threshold.
*
* @param regexpLikePredicate REGEXP_LIKE predicate to evaluate
- * @param dictionary Dictionary for the column
- * @param dataType Data type for the column
+ * @param dictionary Dictionary for the column
+ * @param dataType Data type for the column
+ * @param numDocs Number of documents in the segment
+ * @param queryContext Query context containing query options (can be null)
* @return Dictionary based REGEXP_LIKE predicate evaluator
*/
public static BaseDictionaryBasedPredicateEvaluator newDictionaryBasedEvaluator(
- RegexpLikePredicate regexpLikePredicate, Dictionary dictionary, DataType dataType) {
+ RegexpLikePredicate regexpLikePredicate, Dictionary dictionary, DataType dataType, int numDocs,
+ QueryContext queryContext) {
Preconditions.checkArgument(dataType.getStoredType() == DataType.STRING, "Unsupported data type: " + dataType);
- if (dictionary.length() < DICTIONARY_CARDINALITY_THRESHOLD_FOR_SCAN) {
+
+ // Get threshold from query options or use default
+ double threshold = Broker.DEFAULT_REGEXP_LIKE_ADAPTIVE_THRESHOLD;
+ if (queryContext != null && queryContext.getQueryOptions() != null) {
+ threshold = QueryOptionsUtils.getRegexpLikeAdaptiveThreshold(queryContext.getQueryOptions(),
+ Broker.DEFAULT_REGEXP_LIKE_ADAPTIVE_THRESHOLD);
+ }
+ if (dictionary.length() < DEFAULT_DICTIONARY_CARDINALITY_THRESHOLD_FOR_SCAN
+ || (double) dictionary.length() / numDocs < threshold) {
Review Comment:
nit: I think it is worth moving this condition into its own function to
improve readability.
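
To illustrate what I mean, here is a minimal sketch of the extracted check. The class name `ScanDecisionSketch` and the helper name `shouldScanDictionary` are hypothetical, and the helper takes plain values instead of the Pinot `Dictionary` so that it compiles on its own. The condition itself is copied from the diff unchanged.

```java
/** Hypothetical, self-contained sketch; not the actual Pinot factory class. */
final class ScanDecisionSketch {
  // Same value as DEFAULT_DICTIONARY_CARDINALITY_THRESHOLD_FOR_SCAN in the diff.
  static final int DEFAULT_DICTIONARY_CARDINALITY_THRESHOLD_FOR_SCAN = 10000;

  private ScanDecisionSketch() {
  }

  /**
   * Hypothetical helper: returns true when scanning the dictionary is preferred,
   * either because the dictionary is small in absolute terms or because its
   * cardinality is small relative to the number of documents in the segment.
   */
  static boolean shouldScanDictionary(int dictionaryLength, int numDocs, double threshold) {
    return dictionaryLength < DEFAULT_DICTIONARY_CARDINALITY_THRESHOLD_FOR_SCAN
        || (double) dictionaryLength / numDocs < threshold;
  }
}
```

With something like this, the `if` in the factory method becomes a single call whose name states the intent, and the two criteria can be unit tested without building an evaluator.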
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/RegexpLikePredicateEvaluatorFactory.java:
##########
@@ -36,21 +39,33 @@ public class RegexpLikePredicateEvaluatorFactory {
private RegexpLikePredicateEvaluatorFactory() {
}
- /// When the cardinality of the dictionary is less than this threshold, scan the dictionary to get the matching ids.
- public static final int DICTIONARY_CARDINALITY_THRESHOLD_FOR_SCAN = 10000;
+ /// Default threshold when the cardinality of the dictionary is less than this threshold,
+ // scan the dictionary to get the matching ids.
+ public static final int DEFAULT_DICTIONARY_CARDINALITY_THRESHOLD_FOR_SCAN = 10000;
/**
- * Create a new instance of dictionary based REGEXP_LIKE predicate evaluator.
+ * Create a new instance of dictionary based REGEXP_LIKE predicate evaluator with configurable threshold.
*
* @param regexpLikePredicate REGEXP_LIKE predicate to evaluate
- * @param dictionary Dictionary for the column
- * @param dataType Data type for the column
+ * @param dictionary Dictionary for the column
+ * @param dataType Data type for the column
+ * @param numDocs Number of documents in the segment
+ * @param queryContext Query context containing query options (can be null)
* @return Dictionary based REGEXP_LIKE predicate evaluator
*/
public static BaseDictionaryBasedPredicateEvaluator newDictionaryBasedEvaluator(
- RegexpLikePredicate regexpLikePredicate, Dictionary dictionary, DataType dataType) {
+ RegexpLikePredicate regexpLikePredicate, Dictionary dictionary, DataType dataType, int numDocs,
+ QueryContext queryContext) {
Review Comment:
Can you add another static factory method that keeps the previous signature?
Probably nobody outside this repo calls this method, but it is good practice
not to break binary compatibility.
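
A rough sketch of the shape I have in mind follows. Every nested type is a hypothetical stand-in for the Pinot class of the same name so that the example compiles on its own, and `Evaluator` only records which path was chosen. What the old overload should pass for `numDocs` and `queryContext` is an open question; here I assume a sentinel (`NUM_DOCS_UNKNOWN`) plus a `numDocs > 0` guard so that old callers keep the cardinality-only behavior. The default threshold value is also assumed, since the diff does not show it.

```java
/** Hypothetical, self-contained sketch; not the actual Pinot factory class. */
final class FactoryOverloadSketch {
  // Stand-ins so the sketch compiles without Pinot on the classpath.
  interface Dictionary { int length(); }
  static final class RegexpLikePredicate { }
  enum DataType { STRING }
  static final class QueryContext {
    // Hypothetical field; the PR reads the threshold from the query options instead.
    final double _adaptiveThreshold;
    QueryContext(double adaptiveThreshold) { _adaptiveThreshold = adaptiveThreshold; }
  }
  /** Stand-in for the returned evaluator; only records the chosen path. */
  enum Evaluator { DICTIONARY_SCAN, OTHER }

  static final int DEFAULT_DICTIONARY_CARDINALITY_THRESHOLD_FOR_SCAN = 10000;
  // Assumed value; the real default is not shown in the diff.
  static final double ASSUMED_DEFAULT_ADAPTIVE_THRESHOLD = 0.1;
  // Hypothetical sentinel meaning "caller did not supply numDocs".
  static final int NUM_DOCS_UNKNOWN = -1;

  private FactoryOverloadSketch() {
  }

  /** Previous three-argument signature, kept so already-compiled callers still link. */
  static Evaluator newDictionaryBasedEvaluator(RegexpLikePredicate predicate, Dictionary dictionary,
      DataType dataType) {
    return newDictionaryBasedEvaluator(predicate, dictionary, dataType, NUM_DOCS_UNKNOWN, null);
  }

  /** New five-argument signature from the diff. */
  static Evaluator newDictionaryBasedEvaluator(RegexpLikePredicate predicate, Dictionary dictionary,
      DataType dataType, int numDocs, QueryContext queryContext) {
    double threshold =
        queryContext != null ? queryContext._adaptiveThreshold : ASSUMED_DEFAULT_ADAPTIVE_THRESHOLD;
    boolean lowCardinality = dictionary.length() < DEFAULT_DICTIONARY_CARDINALITY_THRESHOLD_FOR_SCAN;
    // Assumption: skip the ratio check when numDocs is unknown, preserving the old behavior.
    boolean lowRatio = numDocs > 0 && (double) dictionary.length() / numDocs < threshold;
    return lowCardinality || lowRatio ? Evaluator.DICTIONARY_SCAN : Evaluator.OTHER;
  }
}
```

Callers compiled against the old three-argument method would otherwise fail at link time with `NoSuchMethodError`, because the JVM resolves methods by their full descriptor.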