jadami10 opened a new pull request, #17452:
URL: https://github.com/apache/pinot/pull/17452

   This is a potential bugfix addressing #16801.
   
   I know we've been saying to use log4j, but query logs need to be correlated 
since we log once when the query is received and once when it is completed. 
Also, this feature already exists, and log4j is tedious to configure/test. 
#15264 switched the received query logs to `.info` and stopped them from 
respecting rate limiting. Instead, "received" logs are always emitted but still 
consume rate-limit permits, which on high-QPS clusters means the majority of 
logs are "received" logs.
   
   In this PR, the rate-limiting decision is made up front, when the "received" 
log would be emitted. We choose whether to log "received", and that determines 
whether we log "completed" as well. This way, you should always get both logs 
for a given query rather than an arbitrary mix of the two.
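
   The decide-once flow described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: `QueryLogSampler`, `shouldLogQuery`, and `handleQuery` are hypothetical names, and the fixed one-second window is a simplifying assumption standing in for whatever rate limiter the broker actually uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-second limiter: admits at most maxQueriesPerSecond queries per window. */
final class QueryLogSampler {
  private final long maxQueriesPerSecond;
  private long windowStartMs = 0;
  private long admittedInWindow = 0;

  QueryLogSampler(long maxQueriesPerSecond) {
    this.maxQueriesPerSecond = maxQueriesPerSecond;
  }

  /** Called exactly once per query, at receive time. */
  synchronized boolean shouldLogQuery(long nowMs) {
    if (nowMs - windowStartMs >= 1000) {
      // Start a new one-second window (assumed fixed-window behavior).
      windowStartMs = nowMs;
      admittedInWindow = 0;
    }
    if (admittedInWindow < maxQueriesPerSecond) {
      admittedInWindow++;
      return true;
    }
    return false;
  }
}

final class QueryLogDemo {
  /** Returns the log lines a single query would produce (collected here instead of printed). */
  static List<String> handleQuery(QueryLogSampler sampler, long requestId, long nowMs) {
    List<String> lines = new ArrayList<>();
    // Decide up front; the same decision covers both log lines.
    boolean logThisQuery = sampler.shouldLogQuery(nowMs);
    if (logThisQuery) {
      lines.add("received requestId=" + requestId);
    }
    // ... query execution would happen here ...
    if (logThisQuery) {
      lines.add("completed requestId=" + requestId);
    }
    return lines;
  }

  public static void main(String[] args) {
    QueryLogSampler sampler = new QueryLogSampler(2);
    for (long id = 1; id <= 3; id++) {
      System.out.println(handleQuery(sampler, id, 0L));
    }
  }
}
```

   Because the completed path reuses the stored decision instead of asking the limiter again, a query in this sketch produces either both lines or neither.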
   
   This is slightly backwards incompatible. Clusters with default settings or 
low RPS won't notice a difference; they will continue to see all logs.
   
   For clusters with higher RPS, 
`CONFIG_OF_BROKER_QUERY_LOG_MAX_RATE_PER_SECOND` is semantically changing to 
control the number of queries logged per second rather than the number of logs 
per second. Clusters where rate limit == RPS may see 2x the logs, since this 
change effectively causes "completed" to show up for each received query. 
Clusters where RPS >> rate limit will see a reduction in logs, since the 
"received" query logs will now be rate limited; the trade-off is that they will 
consistently see received/completed pairs per query.
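
   To make the three cases concrete, here is a rough steady-state model of log lines per second. It is an idealization under stated assumptions, not a measurement: it assumes that before this change every "received" line is emitted and consumes one permit while "completed" lines only get leftover permits, and that after this change at most `limit` queries per second are logged with two lines each. `LogVolumeModel`, `linesBefore`, and `linesAfter` are hypothetical names.

```java
final class LogVolumeModel {
  /** Assumed old behavior: all "received" lines, plus "completed" lines from leftover permits. */
  static long linesBefore(long rps, long limit) {
    long completed = Math.max(0, Math.min(rps, limit - rps));
    return rps + completed;
  }

  /** New behavior: up to `limit` queries per second are logged, two lines per query. */
  static long linesAfter(long rps, long limit) {
    return 2 * Math.min(rps, limit);
  }

  public static void main(String[] args) {
    long limit = 1000; // illustrative value, not a Pinot default
    for (long rps : new long[] {100, 1000, 10000}) {
      System.out.println("rps=" + rps
          + " before=" + linesBefore(rps, limit)
          + " after=" + linesAfter(rps, limit));
    }
  }
}
```

   Under this model, low RPS is unchanged, RPS == rate limit doubles, and RPS >> rate limit drops to twice the limit.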
   
   I specifically tested this on an internal cluster that only saw intermittent 
rate limiting every hour.
   <img width="1603" height="415" alt="image" 
src="https://github.com/user-attachments/assets/22d6da32-5c67-44ad-934b-0abc8610c750" 
/>
   
   After my change, you can see the log volume doubled during rate limiting, 
but there's no longer a mismatch between the number of "received" vs 
"completed" logs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

