KKcorps opened a new pull request, #10406:
URL: https://github.com/apache/pinot/pull/10406

   If a segment's `validDocIds` is empty and `enableSnapshot: true` is set, we 
persist an empty `validDocIdSnapshot` file on disk.
   
   During the next restart, however, if we find that the `validDocIdSnapshot` 
file exists but is empty, we simply do not read any rows from that segment for 
upsert.
   
   This has the side effect that we never set the `validDocIds` value for 
that segment in memory. Ideally, in this case, that value should be an empty 
bitmap, but currently it is left as `null`.
    
   During the query phase, we check the segment's `validDocIds`. If a 
segment has `null` validDocIds, we assume that all rows in the segment are 
valid. As a result, after a restart, queries return older rows from this 
segment.
    
    ```java
         BaseFilterOperator filterOperator = constructPhysicalOperator(filter, numDocs);
         if (validDocIdsSnapshot != null) {
           BaseFilterOperator validDocFilter = new BitmapBasedFilterOperator(validDocIdsSnapshot, false, numDocs);
           return FilterOperatorUtils.getAndFilterOperator(_queryContext, Arrays.asList(filterOperator, validDocFilter),
               numDocs);
         } else {
           return filterOperator;
         }
   ```
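
   The difference between a `null` and an empty `validDocIds` at query time can 
be illustrated with a small standalone sketch. It is not Pinot code: the class 
`ValidDocFilterSketch` and the method `countVisibleDocs` are hypothetical, and 
`java.util.BitSet` stands in for the bitmap type used by the real operator.

```java
import java.util.BitSet;

// Hypothetical stand-in for the query-time check above; not Pinot code.
// java.util.BitSet is assumed here in place of Pinot's bitmap type.
final class ValidDocFilterSketch {
  private ValidDocFilterSketch() {
  }

  /** Counts how many docs of one segment a query would see. */
  static int countVisibleDocs(BitSet validDocIds, int numDocs) {
    if (validDocIds == null) {
      // No valid-doc filter is applied, so every row in the segment is visible.
      return numDocs;
    }
    // Only docs marked valid, and within the segment's doc range, are visible.
    return validDocIds.get(0, numDocs).cardinality();
  }
}
```

   With `numDocs = 5`, a `null` bitmap exposes all 5 rows, while an empty 
bitmap exposes none, which is the behaviour expected for a segment whose rows 
have all been invalidated.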
   
   Possible Solutions:
   
   * Simply set an empty bitmap instead of `null` during restart. Rows are 
still not read from the segment, so restart stays fast.
   
   * Do not persist an empty snapshot file. This works, but during restart we 
will end up reading every row from this segment and then discarding them all. 
This will increase server restart time significantly.
   
   
   I have taken the first approach since it gives better performance.
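
   A minimal sketch of the first approach, under stated assumptions: 
`SegmentStateSketch`, `restoreFromSnapshot`, and the `rowsRead` counter are 
hypothetical names for illustration, `java.util.BitSet` stands in for the real 
bitmap type, and the actual change in this PR may be structured differently.

```java
import java.util.BitSet;

// Hypothetical model of a segment's in-memory upsert state; not Pinot code.
final class SegmentStateSketch {
  private BitSet validDocIds; // stays null until a restore assigns it
  private int rowsRead;       // rows read from the segment during restore

  /** Restores validDocIds from a snapshot that exists on disk (non-null). */
  void restoreFromSnapshot(BitSet snapshot) {
    if (snapshot.isEmpty()) {
      // First approach: record an empty, non-null bitmap and read no rows.
      // Returning here without the assignment would leave validDocIds null.
      validDocIds = new BitSet();
      return;
    }
    // Non-empty snapshot: only the docs it marks as valid need to be read.
    validDocIds = (BitSet) snapshot.clone();
    rowsRead = snapshot.cardinality();
  }

  BitSet getValidDocIds() {
    return validDocIds;
  }

  int getRowsRead() {
    return rowsRead;
  }
}
```

   After restoring from an empty snapshot, `getValidDocIds()` returns an empty 
bitmap rather than `null`, and no rows have been read from the segment.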
   
   
   
    

