richardstartin commented on a change in pull request #6877:
URL: https://github.com/apache/incubator-pinot/pull/6877#discussion_r630096361



##########
File path: 
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndex.java
##########
@@ -186,43 +186,54 @@ private RoaringBitmap 
getMatchingFlattenedDocIds(FilterContext filter) {
    */
   private RoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) {
     ExpressionContext lhs = predicate.getLhs();
-    Preconditions.checkState(lhs.getType() == 
ExpressionContext.Type.IDENTIFIER,
+    Preconditions.checkArgument(lhs.getType() == 
ExpressionContext.Type.IDENTIFIER,
         "Left-hand side of the predicate must be an identifier, got: %s (%s). 
Put double quotes around the identifier if needed.",
         lhs, lhs.getType());
     String key = lhs.getIdentifier();
 
+    // Support 2 formats:
+    // - JSONPath format (e.g. "$.a[1].b"='abc', "$[0]"=1, "$"='abc')
+    // - Legacy format (e.g. "a[1].b"='abc')
+    if (key.charAt(0) == '$') {
+      key = key.substring(1);
+    } else {
+      key = JsonUtils.KEY_SEPARATOR + key;
+    }
+
     // Process the array index within the key if exists
-    // E.g. "foo[0].bar[1].foobar"='abc' -> foo.$index=0 && foo.bar.$index=1 
&& foo.bar.foobar='abc'
+    // E.g. "[*]"=1 -> "."='1'
+    // E.g. "[0]"=1 -> ".$index"='0' && "."='1'
+    // E.g. "[0][1]"=1 -> ".$index"='0' && "..$index"='1' && ".."='1'
+    // E.g. ".foo[*].bar[*].foobar"='abc' -> ".foo..bar..foobar"='abc'
+    // E.g. ".foo[0].bar[1].foobar"='abc' -> ".foo.$index"='0' && 
".foo..bar.$index"='1' && ".foo..bar..foobar"='abc'
+    // E.g. ".foo[0][1].bar"='abc' -> ".foo.$index"='0' && ".foo..$index"='1' 
&& ".foo...bar"='abc'
     RoaringBitmap matchingDocIds = null;
     int leftBracketIndex;
-    while ((leftBracketIndex = key.indexOf('[')) > 0) {
-      int rightBracketIndex = key.indexOf(']');
-      Preconditions.checkState(rightBracketIndex > leftBracketIndex, "Missing 
right bracket in key: %s", key);
+    while ((leftBracketIndex = key.indexOf('[')) >= 0) {
+      int rightBracketIndex = key.indexOf(']', leftBracketIndex + 2);
+      Preconditions.checkArgument(rightBracketIndex > 0, "Missing right 
bracket in key: %s", key);
 
       String leftPart = key.substring(0, leftBracketIndex);
-      int arrayIndex;
-      try {
-        arrayIndex = Integer.parseInt(key.substring(leftBracketIndex + 1, 
rightBracketIndex));
-      } catch (Exception e) {
-        throw new IllegalStateException("Invalid key: " + key);
-      }
+      String arrayIndex = key.substring(leftBracketIndex + 1, 
rightBracketIndex);
       String rightPart = key.substring(rightBracketIndex + 1);
 
-      // foo[1].bar -> foo.$index=1
-      String searchKey =
-          leftPart + JsonUtils.KEY_SEPARATOR + JsonUtils.ARRAY_INDEX_KEY + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR
-              + arrayIndex;
-      RoaringBitmap docIds = _postingListMap.get(searchKey);
-      if (docIds != null) {
-        if (matchingDocIds == null) {
-          matchingDocIds = docIds.clone();
+      if (!arrayIndex.equals(JsonUtils.WILDCARD)) {
+        // "[0]"=1 -> ".$index"='0' && "."='1'
+        // ".foo[1].bar"='abc' -> ".foo.$index"=1 && ".foo..bar"='abc'
+        String searchKey = leftPart + JsonUtils.ARRAY_INDEX_KEY + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR + arrayIndex;
+        RoaringBitmap docIds = _postingListMap.get(searchKey);

Review comment:
       I wouldn't suggest making the tuple type mutable, no. The `String` class 
caches its own hash code so there is no cache to maintain, so no added 
complexity. I typically observe an integer multiple reduction in latency of 
hash map lookups when making this transformation, because it makes the lookup 
genuinely O(1) rather than O(k) where k is the length of the concatenated 
string. As mentioned, this would also reduce allocation per lookup.
   
   I agree this is out of the scope of the PR since it is an existing pattern, 
and that there is no profiling data to refer to in order to motivate the 
change, moreover, the cost of the lookup will be dwarfed by the subsequent 
bitmap intersection so you might not see it with a sampling profiler, but this 
looked like an efficiency smell when I was browsing this PR.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to