Jackie-Jiang commented on code in PR #12568: URL: https://github.com/apache/pinot/pull/12568#discussion_r1515366550
########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java: ########## @@ -291,6 +297,88 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) { } else { return new RoaringBitmap(); } + } else if (predicateType == Predicate.Type.REGEXP_LIKE) { + String ceilingKey = _postingListMap.ceilingKey(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR); + if (ceilingKey == null || !ceilingKey.startsWith(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR)) { + // No values with this key exist + return new RoaringBitmap(); + } + + Pattern pattern = ((RegexpLikePredicate) predicate).getPattern(); + MutableRoaringBitmap result = null; + char nextChar = BaseJsonIndexCreator.KEY_VALUE_SEPARATOR + 1; + SortedMap<String, RoaringBitmap> subMap = Review Comment: We can extract a method to calculate the subMap. ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java: ########## @@ -291,6 +297,88 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) { } else { return new RoaringBitmap(); } + } else if (predicateType == Predicate.Type.REGEXP_LIKE) { + String ceilingKey = _postingListMap.ceilingKey(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR); + if (ceilingKey == null || !ceilingKey.startsWith(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR)) { + // No values with this key exist + return new RoaringBitmap(); + } + + Pattern pattern = ((RegexpLikePredicate) predicate).getPattern(); + MutableRoaringBitmap result = null; + char nextChar = BaseJsonIndexCreator.KEY_VALUE_SEPARATOR + 1; Review Comment: (minor) We can make this char constant ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java: ########## @@ -291,6 +297,88 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) { } else { return new RoaringBitmap(); } + } else if (predicateType == Predicate.Type.REGEXP_LIKE) { + String ceilingKey = _postingListMap.ceilingKey(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR); + if (ceilingKey == null || !ceilingKey.startsWith(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR)) { + // No values with this key exist + return new RoaringBitmap(); + } + + Pattern pattern = ((RegexpLikePredicate) predicate).getPattern(); + MutableRoaringBitmap result = null; + char nextChar = BaseJsonIndexCreator.KEY_VALUE_SEPARATOR + 1; + SortedMap<String, RoaringBitmap> subMap = + _postingListMap.subMap(ceilingKey, true, _postingListMap.floorKey(key + nextChar), true); + + for (Map.Entry<String, RoaringBitmap> entry : subMap.entrySet()) { + if (!pattern.matcher(entry.getKey().substring(key.length() + 1)).matches()) { + continue; + } + if (result == null) { + result = entry.getValue().toMutableRoaringBitmap(); + } else { + result.or(entry.getValue().toMutableRoaringBitmap()); + } + } + + if (result == null) { + return new RoaringBitmap(); + } else { + if (matchingDocIds == null) { + return result.toRoaringBitmap(); + } else { + matchingDocIds.and(result.toRoaringBitmap()); + return matchingDocIds; + } + } + } else if (predicateType == Predicate.Type.RANGE) { + String ceilingKey = _postingListMap.ceilingKey(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR); + if (ceilingKey == null || !ceilingKey.startsWith(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR)) { + // No values with this key exist + return new RoaringBitmap(); + } + + MutableRoaringBitmap result = null; + char nextChar = BaseJsonIndexCreator.KEY_VALUE_SEPARATOR + 1; + SortedMap<String, RoaringBitmap> subMap = + _postingListMap.subMap(ceilingKey, true, _postingListMap.floorKey(key + nextChar), true); + + RangePredicate rangePredicate = (RangePredicate) predicate; + FieldSpec.DataType rangeDataType = rangePredicate.getRangeDataType(); Review Comment: In JSON, only number (double) and string are supported. We can probably simplify the handling to only differentiate double comparison and string comparison. This can prevent the corner case where value is double (e.g. `1.5`) in JSON, but being converted to integer when doing predicate `key > 1`. ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java: ########## @@ -291,6 +297,88 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) { } else { return new RoaringBitmap(); } + } else if (predicateType == Predicate.Type.REGEXP_LIKE) { + String ceilingKey = _postingListMap.ceilingKey(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR); Review Comment: Do we post key along? If so, we can directly try to find the key, and when calculating the subMap, exclude the lower bound ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java: ########## @@ -291,6 +297,88 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) { } else { return new RoaringBitmap(); } + } else if (predicateType == Predicate.Type.REGEXP_LIKE) { + String ceilingKey = _postingListMap.ceilingKey(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR); Review Comment: (minor) This is actually the lower boundary, consider naming it properly. Currently it is slightly confusing. ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java: ########## @@ -291,6 +297,88 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) { } else { return new RoaringBitmap(); } + } else if (predicateType == Predicate.Type.REGEXP_LIKE) { + String ceilingKey = _postingListMap.ceilingKey(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR); + if (ceilingKey == null || !ceilingKey.startsWith(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR)) { + // No values with this key exist + return new RoaringBitmap(); + } + + Pattern pattern = ((RegexpLikePredicate) predicate).getPattern(); + MutableRoaringBitmap result = null; + char nextChar = BaseJsonIndexCreator.KEY_VALUE_SEPARATOR + 1; + SortedMap<String, RoaringBitmap> subMap = + _postingListMap.subMap(ceilingKey, true, _postingListMap.floorKey(key + nextChar), true); + + for (Map.Entry<String, RoaringBitmap> entry : subMap.entrySet()) { + if (!pattern.matcher(entry.getKey().substring(key.length() + 1)).matches()) { + continue; + } + if (result == null) { + result = entry.getValue().toMutableRoaringBitmap(); Review Comment: Suggest not converting it to `MutableRoaringBitmap`. `RoaringBitmap` itself is mutable, and is faster ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndexImpl.java: ########## @@ -291,6 +297,88 @@ private RoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) { } else { return new RoaringBitmap(); } + } else if (predicateType == Predicate.Type.REGEXP_LIKE) { + String ceilingKey = _postingListMap.ceilingKey(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR); + if (ceilingKey == null || !ceilingKey.startsWith(key + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR)) { + // No values with this key exist + return new RoaringBitmap(); + } + + Pattern pattern = ((RegexpLikePredicate) predicate).getPattern(); + MutableRoaringBitmap result = null; + char nextChar = BaseJsonIndexCreator.KEY_VALUE_SEPARATOR + 1; + SortedMap<String, RoaringBitmap> subMap = + _postingListMap.subMap(ceilingKey, true, _postingListMap.floorKey(key + nextChar), true); Review Comment: Do we need to calculate floor here? Can we directly put `key + "\u0001"` and make higher bound exclusive? This way we don't need to worry about key ending with "\u0001" ########## pinot-common/src/main/java/org/apache/pinot/common/request/context/predicate/RangePredicate.java: ########## @@ -67,15 +69,17 @@ public RangePredicate(ExpressionContext lhs, String range) { int upperLength = upper.length(); _upperInclusive = upper.charAt(upperLength - 1) == UPPER_INCLUSIVE; _upperBound = upper.substring(0, upperLength - 1); + _rangeDataType = null; Review Comment: Consider making this `UNKNOWN`? We should also put some comments to prevent people from using the data type if the predicate is constructed this way (which is the case most of the time). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org