Jackie-Jiang commented on code in PR #11739:
URL: https://github.com/apache/pinot/pull/11739#discussion_r1353494155


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java:
##########
@@ -76,6 +80,67 @@ public ImmutableJsonIndexReader(PinotDataBuffer dataBuffer, 
int numDocs) {
     _docIdMapping = dataBuffer.view(invertedIndexEndOffset, 
docIdMappingEndOffset, ByteOrder.LITTLE_ENDIAN);
   }
 
+  @Override
+  public String[] getValuesForKeyAndDocs(String key, int[] docIds, Map<Object, 
RoaringBitmap> cache) {
+    RoaringBitmap docIdMask = RoaringBitmap.bitmapOf(docIds);
+    int[] dictIds = getDictIdsForKey(key);
+    Map<Integer, String> docIdToValues = new HashMap<>(docIds.length);

Review Comment:
   Use `Int2ObjectOpenHashMap` and primitive int



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java:
##########
@@ -76,6 +80,67 @@ public ImmutableJsonIndexReader(PinotDataBuffer dataBuffer, 
int numDocs) {
     _docIdMapping = dataBuffer.view(invertedIndexEndOffset, 
docIdMappingEndOffset, ByteOrder.LITTLE_ENDIAN);
   }
 
+  @Override
+  public String[] getValuesForKeyAndDocs(String key, int[] docIds, Map<Object, 
RoaringBitmap> cache) {
+    RoaringBitmap docIdMask = RoaringBitmap.bitmapOf(docIds);
+    int[] dictIds = getDictIdsForKey(key);
+    Map<Integer, String> docIdToValues = new HashMap<>(docIds.length);
+
+    if (cache.isEmpty()) {
+      for (int dictId = dictIds[0]; dictId < dictIds[1]; dictId++) {
+        // get docIds from posting list, convert these to the actual docIds
+        ImmutableRoaringBitmap flattenedDocIds = 
_invertedIndex.getDocIds(dictId);
+        PeekableIntIterator it = flattenedDocIds.getIntIterator();
+        MutableRoaringBitmap realDocIds = new MutableRoaringBitmap();
+        while (it.hasNext()) {
+          realDocIds.add(getDocId(it.next()));
+        }
+        cache.put(dictId, realDocIds.toRoaringBitmap());
+      }
+    }
+
+    for (int dictId = dictIds[0]; dictId < dictIds[1]; dictId++) {
+      RoaringBitmap intersection = RoaringBitmap.and(cache.get(dictId), 
docIdMask);
+      if (intersection.isEmpty()) {
+        continue;
+      }
+      // dictionary value lookup, stripping the path prefix
+      String val = _dictionary.getStringValue(dictId).substring(key.length() + 
1);
+      for (int docId : intersection) {
+        docIdToValues.put(docId, val);
+      }
+    }
+
+    String[] values = new String[docIds.length];
+    for (int i = 0; i < docIds.length; i++) {
+      values[i] = docIdToValues.get(docIds[i]);

Review Comment:
   We can iterate over the map instead of the doc ids to fill the values



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/JsonIndexReader.java:
##########
@@ -31,4 +33,13 @@ public interface JsonIndexReader extends IndexReader {
    * Returns the matching document ids for the given filter.
    */
   MutableRoaringBitmap getMatchingDocIds(String filterString);
+
+  /**
+   * For a JSON key and array of docIds, returns the corresponding values for 
each docId.
+   * This method takes a map as input which is used to cache the posting list 
for each dictId/value. It will be
+   * populated if empty, otherwise it will be used to avoid reading and 
converting the posting list of flattened docs
+   *
+   * @return String[] where String[i] is the value for docIds[i]
+   */
+  String[] getValuesForKeyAndDocs(String key, int[] docIds, Map<Object, 
RoaringBitmap> cache);

Review Comment:
   We can consider adding a context class which can wrap this map and any other 
objects if needed in the future. For the dictId to bitmap map, we may use 
`Int2ObjectOpenHashMap` which is faster and uses less memory



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to