Jackie-Jiang commented on code in PR #16922:
URL: https://github.com/apache/pinot/pull/16922#discussion_r2389523125
##########
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java:
##########
@@ -211,6 +212,10 @@ public BrokerResponse handleRequest(JsonNode request,
@Nullable SqlNodeAndOption
.putIfAbsent(Broker.Request.QueryOptionKey.ENABLE_NULL_HANDLING,
_enableNullHandling);
}
+
sqlNodeAndOptions.getOptions().putIfAbsent(QueryOptionKey.REGEXP_LIKE_DICTIONARY_THRESHOLD,
String.valueOf(
Review Comment:
To reduce overhead, let's put it only when the value is explicitly set on
broker
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/RegexpLikePredicateEvaluatorFactory.java:
##########
@@ -35,21 +41,27 @@ public class RegexpLikePredicateEvaluatorFactory {
private RegexpLikePredicateEvaluatorFactory() {
}
- /// When the cardinality of the dictionary is less than this threshold, scan
the dictionary to get the matching ids.
- public static final int DICTIONARY_CARDINALITY_THRESHOLD_FOR_SCAN = 10000;
-
/**
* Create a new instance of dictionary based REGEXP_LIKE predicate evaluator.
*
* @param regexpLikePredicate REGEXP_LIKE predicate to evaluate
- * @param dictionary Dictionary for the column
- * @param dataType Data type for the column
+ * @param dictionary Dictionary for the column
+ * @param dataType Data type for the column
+ * @param queryContext
* @return Dictionary based REGEXP_LIKE predicate evaluator
*/
public static BaseDictionaryBasedPredicateEvaluator
newDictionaryBasedEvaluator(
- RegexpLikePredicate regexpLikePredicate, Dictionary dictionary, DataType
dataType) {
+ RegexpLikePredicate regexpLikePredicate, Dictionary dictionary, DataType
dataType, QueryContext queryContext) {
Review Comment:
Annotate `QueryContext` as `@Nullable`
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/RegexpLikePredicateEvaluatorFactory.java:
##########
@@ -121,15 +133,22 @@ private static final class
ScanBasedRegexpLikePredicateEvaluator extends BaseDic
// Reuse matcher to avoid excessive allocation. This is safe to do because
the evaluator is always used
// within the scope of a single thread.
final Matcher _matcher;
+ Int2BooleanMap _dictIdMap;
public ScanBasedRegexpLikePredicateEvaluator(RegexpLikePredicate
regexpLikePredicate, Dictionary dictionary) {
super(regexpLikePredicate, dictionary);
_matcher = regexpLikePredicate.getPattern().matcher("");
+ _dictIdMap = new Int2BooleanOpenHashMap();
}
@Override
public boolean applySV(int dictId) {
- return _matcher.reset(_dictionary.getStringValue(dictId)).find();
+ if (_dictIdMap.containsKey(dictId)) {
+ return _dictIdMap.get(dictId);
+ }
+ boolean match =
_matcher.reset(_dictionary.getStringValue(dictId)).find();
+ _dictIdMap.put(dictId, match);
+ return match;
Review Comment:
This can be simplified with `computeIfAbsent()` to reduce map access
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/predicate/RegexpLikePredicateEvaluatorFactory.java:
##########
@@ -121,15 +133,22 @@ private static final class
ScanBasedRegexpLikePredicateEvaluator extends BaseDic
// Reuse matcher to avoid excessive allocation. This is safe to do because
the evaluator is always used
// within the scope of a single thread.
final Matcher _matcher;
+ Int2BooleanMap _dictIdMap;
Review Comment:
(minor) This can be `final`.
Let's also add some comments explaining the algorithm
##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java:
##########
@@ -709,6 +713,10 @@ public static class QueryOptionKey {
public static final String IN_PREDICATE_PRE_SORTED =
"inPredicatePreSorted";
public static final String IN_PREDICATE_LOOKUP_ALGORITHM =
"inPredicateLookupAlgorithm";
+ // REGEXP_LIKE adaptive threshold - controls when to switch between
dictionary-based and scan-based evaluation
+ // When (dictionary_size) < threshold, use dictionary-based evaluation
+ // When (dictionary_size) >= threshold, use scan-based evaluation
+ public static final String REGEXP_LIKE_DICTIONARY_THRESHOLD =
"regexpDictCardinalityThreshold";
Review Comment:
(minor) Keep the argument name and value consistent
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]