Jackie-Jiang commented on code in PR #12568:
URL: https://github.com/apache/pinot/pull/12568#discussion_r1515366550


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -291,6 +297,88 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate 
predicate) {
       } else {
         return new RoaringBitmap();
       }
+    } else if (predicateType == Predicate.Type.REGEXP_LIKE) {
+      String ceilingKey = _postingListMap.ceilingKey(key + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR);
+      if (ceilingKey == null || !ceilingKey.startsWith(key + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR)) {
+        // No values with this key exist
+        return new RoaringBitmap();
+      }
+
+      Pattern pattern = ((RegexpLikePredicate) predicate).getPattern();
+      MutableRoaringBitmap result = null;
+      char nextChar = BaseJsonIndexCreator.KEY_VALUE_SEPARATOR + 1;
+      SortedMap<String, RoaringBitmap> subMap =

Review Comment:
   We can extract a method to calculate the subMap.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -291,6 +297,88 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate 
predicate) {
       } else {
         return new RoaringBitmap();
       }
+    } else if (predicateType == Predicate.Type.REGEXP_LIKE) {
+      String ceilingKey = _postingListMap.ceilingKey(key + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR);
+      if (ceilingKey == null || !ceilingKey.startsWith(key + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR)) {
+        // No values with this key exist
+        return new RoaringBitmap();
+      }
+
+      Pattern pattern = ((RegexpLikePredicate) predicate).getPattern();
+      MutableRoaringBitmap result = null;
+      char nextChar = BaseJsonIndexCreator.KEY_VALUE_SEPARATOR + 1;

Review Comment:
   (minor) We can make this char constant



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -291,6 +297,88 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate 
predicate) {
       } else {
         return new RoaringBitmap();
       }
+    } else if (predicateType == Predicate.Type.REGEXP_LIKE) {
+      String ceilingKey = _postingListMap.ceilingKey(key + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR);
+      if (ceilingKey == null || !ceilingKey.startsWith(key + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR)) {
+        // No values with this key exist
+        return new RoaringBitmap();
+      }
+
+      Pattern pattern = ((RegexpLikePredicate) predicate).getPattern();
+      MutableRoaringBitmap result = null;
+      char nextChar = BaseJsonIndexCreator.KEY_VALUE_SEPARATOR + 1;
+      SortedMap<String, RoaringBitmap> subMap =
+          _postingListMap.subMap(ceilingKey, true, 
_postingListMap.floorKey(key + nextChar), true);
+
+      for (Map.Entry<String, RoaringBitmap> entry : subMap.entrySet()) {
+        if (!pattern.matcher(entry.getKey().substring(key.length() + 
1)).matches()) {
+          continue;
+        }
+        if (result == null) {
+          result = entry.getValue().toMutableRoaringBitmap();
+        } else {
+          result.or(entry.getValue().toMutableRoaringBitmap());
+        }
+      }
+
+      if (result == null) {
+        return new RoaringBitmap();
+      } else {
+        if (matchingDocIds == null) {
+          return result.toRoaringBitmap();
+        } else {
+          matchingDocIds.and(result.toRoaringBitmap());
+          return matchingDocIds;
+        }
+      }
+    } else if (predicateType == Predicate.Type.RANGE) {
+      String ceilingKey = _postingListMap.ceilingKey(key + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR);
+      if (ceilingKey == null || !ceilingKey.startsWith(key + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR)) {
+        // No values with this key exist
+        return new RoaringBitmap();
+      }
+
+      MutableRoaringBitmap result = null;
+      char nextChar = BaseJsonIndexCreator.KEY_VALUE_SEPARATOR + 1;
+      SortedMap<String, RoaringBitmap> subMap =
+          _postingListMap.subMap(ceilingKey, true, 
_postingListMap.floorKey(key + nextChar), true);
+
+      RangePredicate rangePredicate = (RangePredicate) predicate;
+      FieldSpec.DataType rangeDataType = rangePredicate.getRangeDataType();

Review Comment:
   In JSON, only number (double) and string are supported. We can probably 
simplify the handling to only differentiate double comparison and string 
comparison. This can prevent the corner case where value is double (e.g. `1.5`) 
in JSON, but being converted to integer when doing predicate `key > 1`.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -291,6 +297,88 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate 
predicate) {
       } else {
         return new RoaringBitmap();
       }
+    } else if (predicateType == Predicate.Type.REGEXP_LIKE) {
+      String ceilingKey = _postingListMap.ceilingKey(key + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR);

Review Comment:
   Do we post key along? If so, we can directly try to find the key, and when 
calculating the subMap, exclude the lower bound



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -291,6 +297,88 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate 
predicate) {
       } else {
         return new RoaringBitmap();
       }
+    } else if (predicateType == Predicate.Type.REGEXP_LIKE) {
+      String ceilingKey = _postingListMap.ceilingKey(key + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR);

Review Comment:
   (minor) This is actually the lower boundary, consider naming it properly. 
Currently it is slightly confusing.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -291,6 +297,88 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate 
predicate) {
       } else {
         return new RoaringBitmap();
       }
+    } else if (predicateType == Predicate.Type.REGEXP_LIKE) {
+      String ceilingKey = _postingListMap.ceilingKey(key + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR);
+      if (ceilingKey == null || !ceilingKey.startsWith(key + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR)) {
+        // No values with this key exist
+        return new RoaringBitmap();
+      }
+
+      Pattern pattern = ((RegexpLikePredicate) predicate).getPattern();
+      MutableRoaringBitmap result = null;
+      char nextChar = BaseJsonIndexCreator.KEY_VALUE_SEPARATOR + 1;
+      SortedMap<String, RoaringBitmap> subMap =
+          _postingListMap.subMap(ceilingKey, true, 
_postingListMap.floorKey(key + nextChar), true);
+
+      for (Map.Entry<String, RoaringBitmap> entry : subMap.entrySet()) {
+        if (!pattern.matcher(entry.getKey().substring(key.length() + 
1)).matches()) {
+          continue;
+        }
+        if (result == null) {
+          result = entry.getValue().toMutableRoaringBitmap();

Review Comment:
   Suggest not converting it to `MutableRoaringBitmap`. `RoaringBitmap` itself 
is mutable, and is faster



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java:
##########
@@ -291,6 +297,88 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate 
predicate) {
       } else {
         return new RoaringBitmap();
       }
+    } else if (predicateType == Predicate.Type.REGEXP_LIKE) {
+      String ceilingKey = _postingListMap.ceilingKey(key + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR);
+      if (ceilingKey == null || !ceilingKey.startsWith(key + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR)) {
+        // No values with this key exist
+        return new RoaringBitmap();
+      }
+
+      Pattern pattern = ((RegexpLikePredicate) predicate).getPattern();
+      MutableRoaringBitmap result = null;
+      char nextChar = BaseJsonIndexCreator.KEY_VALUE_SEPARATOR + 1;
+      SortedMap<String, RoaringBitmap> subMap =
+          _postingListMap.subMap(ceilingKey, true, 
_postingListMap.floorKey(key + nextChar), true);

Review Comment:
   Do we need to calculate floor here? Can we directly put `key + "\u0001"` and 
make higher bound exclusive? This way we don't need to worry about key ending 
with "\u0001"



##########
pinot-common/src/main/java/org/apache/pinot/common/request/context/predicate/RangePredicate.java:
##########
@@ -67,15 +69,17 @@ public RangePredicate(ExpressionContext lhs, String range) {
     int upperLength = upper.length();
     _upperInclusive = upper.charAt(upperLength - 1) == UPPER_INCLUSIVE;
     _upperBound = upper.substring(0, upperLength - 1);
+    _rangeDataType = null;

Review Comment:
   Consider making this `UNKNOWN`? We should also put some comments to prevent 
people from using the data type if the predicate is constructed this way (which 
is the case most of the time).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to