gortiz opened a new pull request, #15072: URL: https://github.com/apache/pinot/pull/15072
This simplification of #14994 is focused on adding MDC support on the query side. [MDC](https://www.baeldung.com/mdc-in-log4j-2-logback) is a common feature in logging frameworks that is usually build on top of thread local. The idea is that sometimes logs should include some contextual information we don't need on the rest of the code. A very clear example on Pinot is the request id. It may be useful to include that id on all the logs related to a query but we don't want to add the request id attribute/parameter everywhere just in case some part of the code will need it to create a log. One important part of using MDC is to actually be sure that the variables are cleaned up once the thread finished the request. Therefore I had to add some try-finally blocks in the code. I strongly recommend to review this PR with the ignore white changes option. This PR modifies the log4j2.xml file used on quickstarts to actually use these new MDC values. Actual deployments will likely use their log4j2.xml files, and they will need to be modified to log request IDs and stage IDs. Please remember that this is optional. Logs are still as helpful as before, even if MDC properties are unused. Specifically, logs that already included the request ID or stage ID haven't been changed. It is important to know that in MSE we use more than one request id. Specifically, leaf stages change the request id used in the SSE part of the query. This is done in [ServerPlanRequestUtils](https://github.com/apache/pinot/blob/5f220b398cda1fde87a286492e05521d48923634/pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/plan/server/ServerPlanRequestUtils.java#L202) and it a controversial feature we may decide to change in the future. At logging level, it is not useful to have different request ids for the same query. This is why the request id set in MDC is shared between MSE and SSE parts. This will be easier to understand with some examples: Max rows limit in hash join. Here, we can see two new values between brackets: `qid` and `stg`. The first one is the query id (similar to request id) and the second is the stage id. The worker id is also included in MDC, but the proposed log4j2.xml is not using it. ``` 2025/02/16 13:45:23.864 ERROR [OpChainSchedulerService] [query-runner-on-45611-9-thread-6] [qid=884449079000000004,stg=2] (OpChain{884449079000000004_0_2}): Completed erroneously null {245=Cannot process join, reached number of rows limit: 1048576. Consider increasing the limit for the maximum number of rows in a join either via the query option 'maxRowsInJoin' or the 'max_rows_in_join' hint in the 'joinOptions'. Alternatively, if partial results are acceptable, the join overflow mode can be set to 'BREAK' either via the query option 'joinOverflowMode' or the 'join_overflow_mode' hint in the 'joinOptions'. ``` A log in SSE code when being called by MSE. Here we can see we have two different request ids in the same log. The one called `qid` in square brackets (close to `stg`) is the one used in MSE, while the other (included in the normal message and called `requestId`) is the fake SSE id created on ServerPlanRequestUtils ``` 2025/02/16 13:43:54.724 INFO [ServerQueryLogger] [query-runner-on-45611-9-thread-2] [qid=884449079000000003,stg=2] Processed requestId=3584961748589019648,table=userAttributes_OFFLINE,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/invalid/limit/value)=2/1/1/-1/0/0/0/1/0,schedulerWaitMs=-1,reqDeserMs=-1,totalExecMs=55,resSerMs=-1,totalTimeMs=-1,minConsumingFreshnessMs=-1,broker=unknown,numDocsScanned=10,scanInFilter=0,scanPostFilter=40,sched=MultistageEngine,threadCpuTimeNs(total/thread/sysActivity/resSer)=4098950/4098950/0/0 ``` Same log, when called from SSE. Here the `qid` and `requestId` is the same both places and `stg` is not set ``` 2025/02/16 13:48:52.902 INFO [ServerQueryLogger] [pqr-0] [qid=884449079000000001] Processed requestId=884449079000000001,table=userAttributes_OFFLINE,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/invalid/limit/value)=2/1/1/-1/0/0/0/1/0,schedulerWaitMs=7,reqDeserMs=1,totalExecMs=3,resSerMs=3,totalTimeMs=14,minConsumingFreshnessMs=-1,broker=Broker_192.168.1.42_8000,numDocsScanned=10,scanInFilter=0,scanPostFilter=40,sched=FCFS,threadCpuTimeNs(total/thread/sysActivity/resSer)=947216/407270/539946/0 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org