vvivekiyer opened a new issue, #8618:
URL: https://github.com/apache/pinot/issues/8618

   To make our query processing pipeline resilient to failures (like server 
slowness, crashes, etc), we can make some improvements in the broker and server 
components. I am working on a doc to identify and expand these issues with 
design and implementation details. The identified improvements are called out 
below:
   
   ### Server Selection
   When a  broker receives a query, it identifies the list of servers for each 
segment associated with the query. Among these identified servers, one server 
per segment is chosen based on a round-robin approach. This approach is 
light-weight and works for most cases. However, we can add some intelligence to 
this layer to pick servers more intelligently. Some of the parameters that we 
could use are:
   1. Server Latency
   2. Number of outstanding queries to server
   3. Query cost
   4. Server Load
   5. Server capability (heterogeneous servers).
   
   ### Fairness in Query Scheduling (Resource isolation)
   Currently, there’s no scheduling of queries either at broker or server based 
on weight of the query or priority assigned to a query. We can introduce a fair 
scheduling scheme at the broker and server level. Note that we have a 
PriorityScheduler that schedules queries based on priority but hasn’t been 
tested. We could improve/test it before working on alternatives.
   
   For example: consider three classes of queries:
   C1: high cost queries.
   C2: medium cost queries
   C3: low cost queries
   
   **Broker**
   Let's say broker receives 3 C1 queries, 3 C2 queries and 3 C3 queries. Now, 
say there are three available servers that can process these 9 queries that we 
received. We should ideally be distributing them to the servers as follows:
   Server 1 = {1 C1, 1 C2, 1 C3}
   Server 2 = {1 C1, 1 C2, 1 C3}
   Server 3 = {1 C1, 1 C2, 1 C3}
   
   This heavily overlaps with the “SERVER SELECTION” Section (1) of the 
document. But adding it as a separate topic because this could be done at each 
"Server" as opposed to "SERVER SELECTION" which would be done at the broker.
   
   
   ### Query Pre-emption at Broker and Server
   Currently, the broker pre-empts a query when the timeout is reached.
   Our pre-emption logic can be extended to both broker and servers to pre-empt 
based on a number of factors including
   
   1. Timeout
    Broker currently has logic to return queries that have reached timeout. 
However, this can be improved at the server
   2. Memory consumption - to avoid out-of-memory errors
   This could be a long-pole item because java (AFAIK) does not provide a 
reliable way of measuring heap usage.
    
   ### Rate Limiting
   Today, queries are rate-limited only based on QPS. We should improve 
rate-limiting to be based on a number of factors like load on server, latency, 
etc. Rate-limiting (QoS queue) can be done either at broker or server or both.
   
   ### Server Circuit Breaker
   We should take servers out of rotation (and adaptively bring them back) when 
servers are failing (n/w partitioned, disk failure, connectivity failures). We 
currently have a framework built for this in 
[#8490](https://github.com/apache/pinot/issues/8490). But we could improve this 
by adding more trigger points for marking a server unhealthy.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to