richardstartin commented on a change in pull request #6877: URL: https://github.com/apache/incubator-pinot/pull/6877#discussion_r630096361
########## File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndex.java ########## @@ -186,43 +186,54 @@ private RoaringBitmap getMatchingFlattenedDocIds(FilterContext filter) { */ private RoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) { ExpressionContext lhs = predicate.getLhs(); - Preconditions.checkState(lhs.getType() == ExpressionContext.Type.IDENTIFIER, + Preconditions.checkArgument(lhs.getType() == ExpressionContext.Type.IDENTIFIER, "Left-hand side of the predicate must be an identifier, got: %s (%s). Put double quotes around the identifier if needed.", lhs, lhs.getType()); String key = lhs.getIdentifier(); + // Support 2 formats: + // - JSONPath format (e.g. "$.a[1].b"='abc', "$[0]"=1, "$"='abc') + // - Legacy format (e.g. "a[1].b"='abc') + if (key.charAt(0) == '$') { + key = key.substring(1); + } else { + key = JsonUtils.KEY_SEPARATOR + key; + } + // Process the array index within the key if exists - // E.g. "foo[0].bar[1].foobar"='abc' -> foo.$index=0 && foo.bar.$index=1 && foo.bar.foobar='abc' + // E.g. "[*]"=1 -> "."='1' + // E.g. "[0]"=1 -> ".$index"='0' && "."='1' + // E.g. "[0][1]"=1 -> ".$index"='0' && "..$index"='1' && ".."='1' + // E.g. ".foo[*].bar[*].foobar"='abc' -> ".foo..bar..foobar"='abc' + // E.g. ".foo[0].bar[1].foobar"='abc' -> ".foo.$index"='0' && ".foo..bar.$index"='1' && ".foo..bar..foobar"='abc' + // E.g. ".foo[0][1].bar"='abc' -> ".foo.$index"='0' && ".foo..$index"='1' && ".foo...bar"='abc' RoaringBitmap matchingDocIds = null; int leftBracketIndex; - while ((leftBracketIndex = key.indexOf('[')) > 0) { - int rightBracketIndex = key.indexOf(']'); - Preconditions.checkState(rightBracketIndex > leftBracketIndex, "Missing right bracket in key: %s", key); + while ((leftBracketIndex = key.indexOf('[')) >= 0) { + int rightBracketIndex = key.indexOf(']', leftBracketIndex + 2); + Preconditions.checkArgument(rightBracketIndex > 0, "Missing right bracket in key: %s", key); String leftPart = key.substring(0, leftBracketIndex); - int arrayIndex; - try { - arrayIndex = Integer.parseInt(key.substring(leftBracketIndex + 1, rightBracketIndex)); - } catch (Exception e) { - throw new IllegalStateException("Invalid key: " + key); - } + String arrayIndex = key.substring(leftBracketIndex + 1, rightBracketIndex); String rightPart = key.substring(rightBracketIndex + 1); - // foo[1].bar -> foo.$index=1 - String searchKey = - leftPart + JsonUtils.KEY_SEPARATOR + JsonUtils.ARRAY_INDEX_KEY + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR - + arrayIndex; - RoaringBitmap docIds = _postingListMap.get(searchKey); - if (docIds != null) { - if (matchingDocIds == null) { - matchingDocIds = docIds.clone(); + if (!arrayIndex.equals(JsonUtils.WILDCARD)) { + // "[0]"=1 -> ".$index"='0' && "."='1' + // ".foo[1].bar"='abc' -> ".foo.$index"=1 && ".foo..bar"='abc' + String searchKey = leftPart + JsonUtils.ARRAY_INDEX_KEY + BaseJsonIndexCreator.KEY_VALUE_SEPARATOR + arrayIndex; + RoaringBitmap docIds = _postingListMap.get(searchKey); Review comment: I wouldn't suggest making the tuple type mutable, no. The `String` class caches its own hash code so there is no cache to maintain, so no added complexity. I typically observe an integer multiple reduction in latency of hash map lookups when making this transformation, because it makes the lookup genuinely O(1) rather than O(k) where k is the length of the concatenated string. As mentioned, this would also reduce allocation per lookup. I agree this is out of the scope of the PR since it is an existing pattern, and that there is no profiling data to refer to in order to motivate the change, moreover, the cost of the lookup will be dwarfed by the subsequent bitmap intersection so you might not see it with a sampling profiler, but this looked like an efficiency smell when I was browsing this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org