gortiz opened a new pull request, #15072:
URL: https://github.com/apache/pinot/pull/15072

   This simplification of #14994 is focused on adding MDC support on the query 
side. [MDC](https://www.baeldung.com/mdc-in-log4j-2-logback) is a common 
feature in logging frameworks that is usually build on top of thread local. The 
idea is that sometimes logs should include some contextual information we don't 
need on the rest of the code. A very clear example on Pinot is the request id. 
It may be useful to include that id on all the logs related to a query but we 
don't want to add the request id attribute/parameter everywhere just in case 
some part of the code will need it to create a log.
   
   One important part of using MDC is to actually be sure that the variables 
are cleaned up once the thread finished the request. Therefore I had to add 
some try-finally blocks in the code. I strongly recommend to review this PR 
with the ignore white changes option.
   
   This PR modifies the log4j2.xml file used on quickstarts to actually use 
these new MDC values.  Actual deployments will likely use their log4j2.xml 
files, and they will need to be modified to log request IDs and stage IDs. 
Please remember that this is optional. Logs are still as helpful as before, 
even if MDC properties are unused. Specifically, logs that already included the 
request ID or stage ID haven't been changed.
   
   It is important to know that in MSE we use more than one request id. 
Specifically, leaf stages change the request id used in the SSE part of the 
query. This is done in 
[ServerPlanRequestUtils](https://github.com/apache/pinot/blob/5f220b398cda1fde87a286492e05521d48923634/pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/plan/server/ServerPlanRequestUtils.java#L202)
 and it a controversial feature we may decide to change in the future. At 
logging level, it is not useful to have different request ids for the same 
query. This is why the request id set in MDC is shared between MSE and SSE 
parts. This will be easier to understand with some examples:
   
   Max rows limit in hash join. Here, we can see two new values between 
brackets: `qid` and `stg`. The first one is the query id (similar to request 
id) and the second is the stage id. The worker id is also included in MDC, but 
the proposed log4j2.xml is not using it.
   ```
   2025/02/16 13:45:23.864 ERROR [OpChainSchedulerService] 
[query-runner-on-45611-9-thread-6] [qid=884449079000000004,stg=2] 
(OpChain{884449079000000004_0_2}): Completed erroneously null {245=Cannot 
process join, reached number of rows limit: 1048576. Consider increasing the 
limit for the maximum number of rows in a join either via the query option 
'maxRowsInJoin' or the 'max_rows_in_join' hint in the 'joinOptions'. 
Alternatively, if partial results are acceptable, the join overflow mode can be 
set to 'BREAK' either via the query option 'joinOverflowMode' or the 
'join_overflow_mode' hint in the 'joinOptions'.
   ```
   
   A log in SSE code when being called by MSE. Here we can see we have two 
different request ids in the same log. The one called `qid` in square brackets 
(close to `stg`) is the one used in MSE, while the other (included in the 
normal message and called `requestId`) is the fake SSE id created on 
ServerPlanRequestUtils
   ```
   2025/02/16 13:43:54.724 INFO [ServerQueryLogger] 
[query-runner-on-45611-9-thread-2] [qid=884449079000000003,stg=2] Processed 
requestId=3584961748589019648,table=userAttributes_OFFLINE,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/invalid/limit/value)=2/1/1/-1/0/0/0/1/0,schedulerWaitMs=-1,reqDeserMs=-1,totalExecMs=55,resSerMs=-1,totalTimeMs=-1,minConsumingFreshnessMs=-1,broker=unknown,numDocsScanned=10,scanInFilter=0,scanPostFilter=40,sched=MultistageEngine,threadCpuTimeNs(total/thread/sysActivity/resSer)=4098950/4098950/0/0
   ```
   
   Same log, when called from SSE. Here the `qid` and `requestId` is the same 
both places and `stg` is not set
   ```
   2025/02/16 13:48:52.902 INFO [ServerQueryLogger] [pqr-0] 
[qid=884449079000000001] Processed 
requestId=884449079000000001,table=userAttributes_OFFLINE,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/invalid/limit/value)=2/1/1/-1/0/0/0/1/0,schedulerWaitMs=7,reqDeserMs=1,totalExecMs=3,resSerMs=3,totalTimeMs=14,minConsumingFreshnessMs=-1,broker=Broker_192.168.1.42_8000,numDocsScanned=10,scanInFilter=0,scanPostFilter=40,sched=FCFS,threadCpuTimeNs(total/thread/sysActivity/resSer)=947216/407270/539946/0
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to