ianvkoeppe opened a new issue #5751: URL: https://github.com/apache/incubator-pinot/issues/5751
## Overview

For hybrid setups, Pinot splits broker queries between the offline and realtime tables by adding a time filter to each. It does so based on a `time value`, which represents the latest available offline segment. Pinot assumes the consumer is uploading partial or intermediate segments throughout the current day, so it only serves data up to `time value - 1` from the offline table and the rest from realtime.

As an explicit example, if I upload a segment with the date 1/2/2000, the time boundary derived from it (`time value - 1`) will be 1/1/2000. This means a query for all time will be rewritten to query `(*, 1/1/2000]` against the offline table and `(1/1/2000, *)` against the realtime table. In this scenario, the offline 1/2/2000 segment is not serving queries. Once another segment is uploaded for 1/3/2000, the data from 1/2/2000 will be served.

## Problem Statement

In hybrid table setups where offline segments are only uploaded once per day, we want those offline segments to start serving in place of the realtime segments as soon as they are uploaded. Today, this has to be achieved via a "hack" where an empty segment is pushed for a future date to trick Pinot into serving the latest offline segment with actual data.

## Requirements

1. This should be configurable per table.

## Proposed Solutions

### Update BrokerRequestHandler to Partition Queries Differently

We could use a table-level config in the request handler to modify the time filtering behavior so that it includes the latest offline segments, based on the time boundary (already available in the request handler).

### Update TimeBoundaryManager

Using a table-level config, we could modify the time boundary manager to return the current `time value` as the time boundary, rather than the current behavior of `time value - 1`.

## Open Questions

1. At what point is the time value refreshed for a table? Segment uploads may contain more than one segment file.
Does it wait until all are uploaded to increment or does this happen asynchronously and the time value may refresh while the others are still uploading? If the latter, then we risk creating a state of bad data where only some segments have been uploaded, but queries are already being diverted to the offline table.
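
To make the time-boundary split from the Overview concrete, here is a minimal sketch of the current behavior next to the proposed per-table switch. It is not Pinot code: `TimeRange`, `time_boundary`, `split_query`, and the `serve_latest_offline` flag are hypothetical names used only for illustration, and it assumes day-granularity time values as in the 1/2/2000 example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class TimeRange:
    """Half-open day range (start_exclusive, end_inclusive]; None means unbounded."""
    start_exclusive: Optional[date]
    end_inclusive: Optional[date]


def time_boundary(latest_offline_day: date, serve_latest_offline: bool = False) -> date:
    """Return the last day that the offline table should answer for.

    `serve_latest_offline` is a hypothetical per-table flag standing in for the
    table-level config proposed in this issue; it is not an existing Pinot option.
    """
    if serve_latest_offline:
        # Proposed behavior: trust the latest offline segment right away.
        return latest_offline_day
    # Current behavior described above: `time value - 1`.
    return latest_offline_day - timedelta(days=1)


def split_query(
    latest_offline_day: date, serve_latest_offline: bool = False
) -> Tuple[TimeRange, TimeRange]:
    """Split an all-time query into (offline_range, realtime_range)."""
    boundary = time_boundary(latest_offline_day, serve_latest_offline)
    offline = TimeRange(start_exclusive=None, end_inclusive=boundary)   # (*, boundary]
    realtime = TimeRange(start_exclusive=boundary, end_inclusive=None)  # (boundary, *)
    return offline, realtime
```

With the latest offline segment dated 1/2/2000, the default call puts the boundary at 1/1/2000, matching the example in the Overview. With the hypothetical flag enabled, the boundary moves to 1/2/2000, so the newly uploaded segment serves queries without the empty future-dated segment workaround.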