wirybeaver commented on issue #12774: URL: https://github.com/apache/pinot/issues/12774#issuecomment-2041665938
Since the JSON index internally stores flattened doc IDs, remapping it during the mutable-to-immutable transformation is tricky. It is still doable if we also keep the flattened doc ID length for each docID. The idea is below.

Mutable Index

| docID              | 0 | 1 | 2 | 3 | 4  |
|--------------------|---|---|---|---|----|
| flattenDocIDLength | 3 | 5 | 7 | 2 | 10 |

Let's say the sortedDocID list in the immutable index is [2, 3, 4, 1, 0]:

| docID              | 2 | 3 | 4  | 1 | 0 |
|--------------------|---|---|----|---|---|
| flattenDocIDLength | 7 | 2 | 10 | 5 | 3 |

We can compute an array sortPos that records each docID's position in sortedDocID. For the example above, sortPos = [4, 3, 0, 1, 2]. Given a mutableDocID, its position in sortedDocID is sortPos[mutableDocID]. (Assume the docID offset within the segment is 0 to keep the explanation simple.)

Let prefixSumOfMutableDocID and prefixSumOfSortedDocID be the exclusive prefix sums of flattenDocIDLength in mutable order and in sorted order, respectively. Each entry is the total length of all docs before that position. The start offset of an original docID's flattened doc IDs then changes from prefixSumOfMutableDocID[originalDocID] to prefixSumOfSortedDocID[sortPos[originalDocID]]. Take docID 3 as an example. Its flattened IDs fall in [3+5+7, 3+5+7+2) = [15, 17) in the mutable index and are shifted to [7, 7+2) = [7, 9) in the immutable index.

Given a flattened doc ID of the mutable segment, we can binary-search prefixSumOfMutableDocID to deduce which docID in the mutable segment it belongs to. We can then compute its flattened doc ID in the immutable segment as prefixSumOfSortedDocID[sortPos[docID]] plus its offset within that doc.
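The remapping described above can be sketched in Java as follows. This is a minimal illustration, not Pinot code. The class `FlattenDocIdRemapper` and its methods `remap`, `ownerDocId` and `sortPos` are hypothetical names. The sketch assumes segment-local docIDs starting at 0, as in the comment, and assumes that total flattened ID counts fit in an `int`. It builds both exclusive prefix sums and `sortPos` in one pass over `sortedDocIds`. It then maps a mutable flattened doc ID by locating its owning docID and adding the offset within that doc to the doc's new start.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not a Pinot API): remaps JSON-index flattened doc IDs
 * from mutable-segment docID order to sorted immutable-segment order.
 * Assumes segment-local docIDs 0..n-1.
 */
final class FlattenDocIdRemapper {
  // mutableStart[d] = exclusive prefix sum of lengths in mutable order (length n + 1).
  private final int[] mutableStart;
  // sortedStart[p] = exclusive prefix sum of lengths in sorted order (length n + 1).
  private final int[] sortedStart;
  // sortPos[docId] = position of docId in sortedDocIds (inverse permutation).
  private final int[] sortPos;

  FlattenDocIdRemapper(int[] flattenDocIdLength, int[] sortedDocIds) {
    int n = flattenDocIdLength.length;
    if (sortedDocIds.length != n) {
      throw new IllegalArgumentException("length mismatch");
    }
    mutableStart = new int[n + 1];
    for (int d = 0; d < n; d++) {
      mutableStart[d + 1] = mutableStart[d] + flattenDocIdLength[d];
    }
    sortPos = new int[n];
    sortedStart = new int[n + 1];
    for (int p = 0; p < n; p++) {
      int docId = sortedDocIds[p];
      sortPos[docId] = p;
      sortedStart[p + 1] = sortedStart[p] + flattenDocIdLength[docId];
    }
  }

  int[] sortPos() {
    return sortPos.clone();
  }

  /** Returns the mutable docID d with mutableStart[d] <= flattenId < mutableStart[d + 1]. */
  int ownerDocId(int flattenId) {
    int total = mutableStart[mutableStart.length - 1];
    if (flattenId < 0 || flattenId >= total) {
      throw new IllegalArgumentException("flattenId out of range: " + flattenId);
    }
    // Upper-bound search: first index i with mutableStart[i] > flattenId; the owner is i - 1.
    // This also handles zero-length docs, whose duplicate starts are skipped.
    int lo = 0;
    int hi = mutableStart.length - 1;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (mutableStart[mid] <= flattenId) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo - 1;
  }

  /** Maps a flattened doc ID of the mutable index to its flattened doc ID in the immutable index. */
  int remap(int mutableFlattenId) {
    int docId = ownerDocId(mutableFlattenId);
    int offsetWithinDoc = mutableFlattenId - mutableStart[docId];
    return sortedStart[sortPos[docId]] + offsetWithinDoc;
  }

  public static void main(String[] args) {
    // The example from the comment: lengths per mutable docID and the sorted docID order.
    FlattenDocIdRemapper r =
        new FlattenDocIdRemapper(new int[] {3, 5, 7, 2, 10}, new int[] {2, 3, 4, 1, 0});
    System.out.println(Arrays.toString(r.sortPos())); // [4, 3, 0, 1, 2]
    System.out.println(r.remap(15) + " " + r.remap(16)); // 7 8 (docID 3: [15, 17) -> [7, 9))
  }
}
```

The binary search is written by hand instead of using `Arrays.binarySearch`. A doc with zero flattened IDs would create duplicate prefix-sum values, and `Arrays.binarySearch` does not guarantee which duplicate it finds. When every flattened ID is remapped in a single linear pass, the owning docID could be tracked incrementally instead, which avoids the search.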