KKcorps opened a new pull request, #10406: URL: https://github.com/apache/pinot/pull/10406
If a segment has empty `validDocIds` and `enableSnapshot: true`, we persist an empty `validDocIdSnapshot` file on disk. During the next restart, if we find that the `validDocIdSnapshot` file is empty and not null, we simply do not read any rows from that segment for upsert.

This has a side effect, however: we never set the `validDocIds` value for that segment in memory. Ideally it should be empty in this case, but currently it is set to `null`.

During the query phase, we check the segment's `validDocIds`. If a segment has `null` `validDocIds`, we assume that all rows in the segment are valid. This leads to older rows being returned from this segment after a restart.

```java
BaseFilterOperator filterOperator = constructPhysicalOperator(filter, numDocs);
if (validDocIdsSnapshot != null) {
  BaseFilterOperator validDocFilter = new BitmapBasedFilterOperator(validDocIdsSnapshot, false, numDocs);
  return FilterOperatorUtils.getAndFilterOperator(_queryContext, Arrays.asList(filterOperator, validDocFilter),
      numDocs);
} else {
  // A null snapshot means no validity filter is applied, so every doc matching the filter is returned.
  return filterOperator;
}
```

Possible Solutions:

* Simply set an empty bitmap instead of `null` during restart. Rows are still not read from the segment, so it is fast.
* Do not persist an empty snapshot file. This works; however, during restart we will end up reading every row from this segment and then discarding them later on. This will affect server restart time significantly.

I have taken the first approach since it gives better performance.
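To make the difference between a `null` and an empty `validDocIds` concrete, here is a minimal, self-contained sketch of the two restart behaviours. It is not the actual Pinot code: `java.util.BitSet` stands in for the bitmap type Pinot uses, and the class name, method names, and the `useEmptyBitmapFix` flag are all hypothetical, chosen only for illustration.

```java
import java.util.BitSet;

// Sketch only: java.util.BitSet stands in for the bitmap type used by Pinot,
// and every class, method, and parameter name here is hypothetical.
final class ValidDocIdsSketch {

    // Restart path. A null snapshot means no snapshot file was found; an empty
    // snapshot means the file exists but marks no doc as valid.
    static BitSet restoreValidDocIds(BitSet snapshot, boolean useEmptyBitmapFix) {
        if (snapshot == null) {
            return null; // no snapshot: validity has to be rebuilt some other way
        }
        if (snapshot.isEmpty()) {
            // Without the fix, validDocIds is left unset (null).
            // With the fix, an empty bitmap is set and no rows are read.
            return useEmptyBitmapFix ? new BitSet() : null;
        }
        return (BitSet) snapshot.clone();
    }

    // Query path: a null validDocIds applies no validity filter, so every doc
    // that matches the query filter is returned.
    static int countReturnedDocs(BitSet filterMatches, BitSet validDocIds) {
        if (validDocIds == null) {
            return filterMatches.cardinality();
        }
        BitSet result = (BitSet) filterMatches.clone();
        result.and(validDocIds); // keep only docs that are still valid
        return result.cardinality();
    }

    public static void main(String[] args) {
        BitSet emptySnapshot = new BitSet(); // segment whose rows were all invalidated
        BitSet filterMatches = new BitSet();
        filterMatches.set(0, 3);             // docs 0, 1 and 2 match the query filter

        BitSet withoutFix = restoreValidDocIds(emptySnapshot, false);
        BitSet withFix = restoreValidDocIds(emptySnapshot, true);

        System.out.println(countReturnedDocs(filterMatches, withoutFix)); // 3 (stale rows)
        System.out.println(countReturnedDocs(filterMatches, withFix));    // 0
    }
}
```

In this sketch the empty bitmap costs nothing extra at restart, since no rows are read in either branch, which is the same trade-off that motivates the first approach.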