mcvsubbu commented on issue #7229: URL: https://github.com/apache/pinot/issues/7229#issuecomment-892806865
If you are changing the state machine, it is best to draw a picture and float it, so that we can look at each state transition. The change has to be made in a compatible manner in Helix; @kishoreg can comment more on whether that is easy to do.

Can we even do this without modifying the state machine? Say the query lands on the server and the server finds that it does not have the segment. It can then hold the query while it downloads the segment from deep store (a lazy download), and return the response to the broker as before. The only other thing needed here is a larger timeout to allow for the segment to be downloaded. The table config can set some criteria for segment availability on servers (say, up to 5d recent, or up to 5d after push in refresh use cases). It can also set some retention time (say, 15m).

Such code complexity also needs to be evaluated against memory-mapping segments on a fast local store (cloud costs for this may be high). You can memory-map 3TB worth of segments off an SSD onto (say) 64G of memory, and the OS will pretty much keep these segments on disk all the time until a query needs them. They are then paged in for the query and paged out again in a timely manner. We have done this successfully at LinkedIn for several years now. Have you considered this option, or will it still be more expensive than lazy-loading the segment from deep store?
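
To make the lazy-download idea concrete, here is a minimal sketch of a server holding a query until the missing segment arrives or a timeout expires. Everything in it is an assumption for illustration: `LazySegmentLoader`, `acquire`, `evict`, and the fetcher function are hypothetical names, not Pinot or Helix APIs, and the fetcher simply stands in for "copy the segment from deep store and return its local path".

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical sketch; none of these names are Pinot or Helix APIs.
class LazySegmentLoader {
  // One in-flight or completed download per segment, so concurrent queries share a single fetch.
  private final Map<String, CompletableFuture<String>> segments = new ConcurrentHashMap<>();
  // Stand-in for "download the segment from deep store and return its local path".
  private final Function<String, String> deepStoreFetcher;
  // How long a query may be held while the segment downloads.
  private final Duration downloadTimeout;

  LazySegmentLoader(Function<String, String> deepStoreFetcher, Duration downloadTimeout) {
    this.deepStoreFetcher = deepStoreFetcher;
    this.downloadTimeout = downloadTimeout;
  }

  // Blocks the calling query thread until the segment is local, or fails after the timeout.
  String acquire(String segmentName) {
    CompletableFuture<String> download = segments.computeIfAbsent(
        segmentName,
        name -> CompletableFuture.supplyAsync(() -> deepStoreFetcher.apply(name)));
    try {
      return download.get(downloadTimeout.toMillis(), TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // The download keeps running in the background; a later query can still pick it up.
      throw new IllegalStateException("Timed out waiting for segment " + segmentName, e);
    } catch (ExecutionException e) {
      // Forget the failed download so that a later query can retry it.
      segments.remove(segmentName, download);
      throw new IllegalStateException("Download failed for segment " + segmentName, e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for segment " + segmentName, e);
    }
  }

  // Drops a lazily loaded segment, e.g. once the configured retention time has passed.
  void evict(String segmentName) {
    segments.remove(segmentName);
  }
}
```

A query path would call `acquire("someSegment")` before executing against the segment, and a periodic task could call `evict` once the table's retention time has elapsed. The broker-side timeout would need to exceed `downloadTimeout`, which is the "larger timeout" mentioned above.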