mcvsubbu commented on issue #6187:
URL: 
https://github.com/apache/incubator-pinot/issues/6187#issuecomment-718862336


   > https://eng.uber.com/operating-apache-pinot/
   > 
   > Reading this blog:
   > 
   > > As the scale of data grew, we also experienced several issues caused by
   > > too many segments. Pinot leverages Apache Helix over Apache Zookeeper for
   > > cluster management. For example, when a server transitioned from offline to
   > > online, Pinot will propagate state transition messages via Helix to notify
   > > other instances. The number of such state transition messages are proportional
   > > to the number of the segments on the server. When a server hosts too many
   > > segments, there could be a spike of state transition messages on Helix,
   > > resulting in lots of zookeeper nodes. If the number of zookeeper nodes is
   > > beyond the buffer threshold, the Pinot server and controller will crash. To
   > > solve this issue, we added message throttling to Pinot controllers to flatten
   > > the state transition surge.
   > 
   > At a scale of data that requires this kind of lazy loading, you're
   > going to have _a lot_ of segments. Do we see this causing an issue with Helix
   > state management messages?
   
   If you have millions of segments per table, this can be a problem. We
already faced this issue at LinkedIn and took a few actions:
   1. Increased the network speed for ZooKeeper.
   2. Moved to allocating a limited number of instances to each table (i.e.,
clustering the segments of a table on as few hosts as possible).
   3. Tried enabling Helix batching, with mixed results. It seemed to help,
but other Helix bugs surfaced and prevented us from moving further on that.
Some of those bugs may have been fixed since, so we may be able to enable
batching again.
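   
   To make item 2 concrete, here is a minimal Python sketch of the idea of
confining a table's segments to a small, fixed pool of hosts. It is only an
illustration under stated assumptions, not Pinot's actual assignment code: the
function names `pick_instances` and `assign_segments`, the server and segment
names, and the pool size are all hypothetical.

```python
import zlib


def pick_instances(table, all_instances, pool_size):
    """Pick a deterministic pool of `pool_size` instances for a table.

    Hypothetical helper: hashes the table name to a starting offset and
    takes the next `pool_size` instances, wrapping around the list.
    """
    if pool_size > len(all_instances):
        raise ValueError("pool_size exceeds the number of instances")
    start = zlib.crc32(table.encode("utf-8")) % len(all_instances)
    return [all_instances[(start + i) % len(all_instances)]
            for i in range(pool_size)]


def assign_segments(segments, pool, replicas):
    """Spread segments round-robin over the pool, `replicas` hosts each."""
    if replicas > len(pool):
        raise ValueError("replicas exceeds the pool size")
    return {
        seg: [pool[(i + r) % len(pool)] for r in range(replicas)]
        for i, seg in enumerate(segments)
    }


# Usage: 20 servers in the cluster, but the table only ever touches 4 of them.
servers = [f"Server_{i}" for i in range(20)]
pool = pick_instances("myTable", servers, pool_size=4)
assignment = assign_segments(
    [f"myTable_seg_{i}" for i in range(1000)], pool, replicas=2)
hosts_used = {host for hosts in assignment.values() for host in hosts}
print(len(hosts_used))  # never more than pool_size
```

   The sketch only shows that the set of hosts involved with a table stays
bounded by the pool size no matter how many segments the table has. It does
not model Helix messages or ZooKeeper traffic.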
   
   I believe we also need to explore having segments in cold storage that DO
NOT make it into the IdealState. I don't have any specific ideas along these
lines, though :)




