[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6877: Enhance JSON index to support nested array

GitBox Mon, 10 May 2021 18:06:33 -0700


Jackie-Jiang commented on a change in pull request #6877:
URL: https://github.com/apache/incubator-pinot/pull/6877#discussion_r629778303




##########
File path: 
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/json/MutableJsonIndex.java
##########
@@ -186,43 +186,54 @@ private RoaringBitmap 
getMatchingFlattenedDocIds(FilterContext filter) {
    */
   private RoaringBitmap getMatchingFlattenedDocIds(Predicate predicate) {
     ExpressionContext lhs = predicate.getLhs();
-    Preconditions.checkState(lhs.getType() == 
ExpressionContext.Type.IDENTIFIER,
+    Preconditions.checkArgument(lhs.getType() == 
ExpressionContext.Type.IDENTIFIER,
         "Left-hand side of the predicate must be an identifier, got: %s (%s). 
Put double quotes around the identifier if needed.",
         lhs, lhs.getType());
     String key = lhs.getIdentifier();
 
+    // Support 2 formats:
+    // - JSONPath format (e.g. "$.a[1].b"='abc', "$[0]"=1, "$"='abc')
+    // - Legacy format (e.g. "a[1].b"='abc')
+    if (key.charAt(0) == '$') {
+      key = key.substring(1);
+    } else {
+      key = JsonUtils.KEY_SEPARATOR + key;
+    }
+
     // Process the array index within the key if exists
-    // E.g. "foo[0].bar[1].foobar"='abc' -> foo.$index=0 && foo.bar.$index=1 
&& foo.bar.foobar='abc'
+    // E.g. "[*]"=1 -> "."='1'
+    // E.g. "[0]"=1 -> ".$index"='0' && "."='1'
+    // E.g. "[0][1]"=1 -> ".$index"='0' && "..$index"='1' && ".."='1'
+    // E.g. ".foo[*].bar[*].foobar"='abc' -> ".foo..bar..foobar"='abc'
+    // E.g. ".foo[0].bar[1].foobar"='abc' -> ".foo.$index"='0' && 
".foo..bar.$index"='1' && ".foo..bar..foobar"='abc'
+    // E.g. ".foo[0][1].bar"='abc' -> ".foo.$index"='0' && ".foo..$index"='1' 
&& ".foo...bar"='abc'
     RoaringBitmap matchingDocIds = null;
     int leftBracketIndex;
-    while ((leftBracketIndex = key.indexOf('[')) > 0) {
-      int rightBracketIndex = key.indexOf(']');
-      Preconditions.checkState(rightBracketIndex > leftBracketIndex, "Missing 
right bracket in key: %s", key);
+    while ((leftBracketIndex = key.indexOf('[')) >= 0) {
+      int rightBracketIndex = key.indexOf(']', leftBracketIndex + 2);
+      Preconditions.checkArgument(rightBracketIndex > 0, "Missing right 
bracket in key: %s", key);
 
       String leftPart = key.substring(0, leftBracketIndex);
-      int arrayIndex;
-      try {
-        arrayIndex = Integer.parseInt(key.substring(leftBracketIndex + 1, 
rightBracketIndex));
-      } catch (Exception e) {
-        throw new IllegalStateException("Invalid key: " + key);
-      }
+      String arrayIndex = key.substring(leftBracketIndex + 1, 
rightBracketIndex);
       String rightPart = key.substring(rightBracketIndex + 1);
 
-      // foo[1].bar -> foo.$index=1
-      String searchKey =
-          leftPart + JsonUtils.KEY_SEPARATOR + JsonUtils.ARRAY_INDEX_KEY + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR
-              + arrayIndex;
-      RoaringBitmap docIds = _postingListMap.get(searchKey);
-      if (docIds != null) {
-        if (matchingDocIds == null) {
-          matchingDocIds = docIds.clone();
+      if (!arrayIndex.equals(JsonUtils.WILDCARD)) {
+        // "[0]"=1 -> ".$index"='0' && "."='1'
+        // ".foo[1].bar"='abc' -> ".foo.$index"=1 && ".foo..bar"='abc'
+        String searchKey = leftPart + JsonUtils.ARRAY_INDEX_KEY + 
BaseJsonIndexCreator.KEY_VALUE_SEPARATOR + arrayIndex;
+        RoaringBitmap docIds = _postingListMap.get(searchKey);

Review comment:
       Are you suggesting storing tuples like `[leftPart, 
JsonUtils.ARRAY_INDEX_KEY, BaseJsonIndexCreator.KEY_VALUE_SEPARATOR, 
arrayIndex]` and cache the hashcode of each part?
   I can see that it might be able to reuse some strings and reduce some 
hashcode calculation, but it also has the following potential fallbacks:
   - The tuple will be mutable, so maintaining the hashcode cache won't be 
easy. The extra logic of maintaining the cache might bring other overhead
   - The tuple comparison logic won't be very straight forward. It might have 
to end up concatenate the strings in the tuple, or split the query string in 
order to compare them, which eliminate the benefit of the tuple
   - The code won't be as readable as the string concatenation
   
   This optimization is out of the scope of this PR. We can have a separate PR 
implementing this optimization if this part of the code become the hotspot




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6877: Enhance JSON index to support nested array

Reply via email to