Re: [I] QueryLogger should fairly respect log throttling for all logs [pinot]

via GitHub Mon, 20 Oct 2025 02:38:54 -0700


gortiz commented on issue #16801:
URL: https://github.com/apache/pinot/issues/16801#issuecomment-3421250258


   I'm sorry for not getting back to you sooner. I think your concerns are 
valid. I firmly think the idea of QueryLogger is an antipattern by definition 
because it goes against the Log4j2 logging principles. Instead we should
   
   1. Classify logs correctly: Although recently we improved this by marking 
stable CID for queries, our classification is very poor. Basically, we only 
classify by CID and by logger class. Instead we should add other _classes_ like 
using markers for logs that may be spammy
   2. Do not log queries as part of the message. Include the SQL in the logger 
context instead.
   
   # Better classification using Slf4J markers and context
   
   We should never have something like:
   
   ```
   LOGGER.info("SQL query for request {}: {}", requestId, query);
   ```
   
   or
   
   ```
   public void log(QueryLogParams params) {
       ...
   
       // always log the query last - don't add this to the QueryLogEntry enum
       queryLogBuilder.append("query=")
           .append(StringUtils.substring(params._requestContext.getQuery(), 0, 
_maxQueryLengthToLog));
       _logger.info(queryLogBuilder.toString());
     }
   ```
   
   Instead we should log as follows:
   
   ```
       LoggerConstants.SQL_KEY.registerInMdc(params._requestContext.getQuery());
       try {
         LOGGER.info(QUERY_RECEIVED_MARKER, "SQL query started");
       } finally {
         LoggerConstants.SQL_KEY.unregisterFromMdc();
       }
   ```
   
   and
   
   ```  
       LoggerConstants.SQL_KEY.registerInMdc(params._requestContext.getQuery());
       try {
         // don't add the query to the message, obtain the actual SQL from 
QUERY_RECEIVED_MARKER if enabled
         //  or include in your pattern using %X{pinot.query.sql}
         _logger.info(QUERY_FINISHED_MARKER, queryLogBuilder.toString());
       } finally {
         LoggerConstants.SQL_KEY.unregisterFromMdc();
       }
     }
   ```
   
   Where `QUERY_RECEIVED_MARKER` and `QUERY_FINISHED_MARKER` are SLF4J markers 
with, for example, the `pinot.sql.received` and `pinot.sql.finished` values.
   
   # Decide what to log with Log4J
   
   Then each user can decide whether these logs should be printed and whether 
the query should be printed with them.
   
   For example, in log4j we could decide whether we want to print the initial 
SQL message with a filter like:
   
   ```
         <MarkerFilter marker="pinot.query.received" onMatch="ALLOW" 
onMismatch="NEUTRAL"/>
   ```
   
   Or disable that log with 
   
   ```
         <MarkerFilter marker="pinot.query.received" onMatch="DENY" 
onMismatch="NEUTRAL"/>
   ```
   
   We can also send the log to a pattern layout that includes 
`%X{pinot.query.sql}`, which will print the query, or to one that doesn't 
include that conversion, in which case the query won't be printed. We could 
even send it to a pattern layout like:
   
   ```
   [cid=%X{pinot.query.cid}] %m [query=%maxLen{%X{pinot.query.query}}{300}]
   ```
   
   So we would have a log like:
   ```
   [cid=123123123] SQL query started [query=select ...]
   ```
   
   Where the CID is unique for the SQL query, and the query will be limited to 
300 characters.
   
   Then the log printed when queries finish can be redirected to a pattern like
   
   ```
   [cid=%X{pinot.query.cid}] %m
   ```
   
   So the log will be:
   
   ```
   [cid=123123123] table=whatever, timeMs=whatever...
   ```
   
   But the query _won't_ be printed. Instead, we can pair that log with the 
previous one by CID. If instead we want to keep the same previous log, we can 
use a pattern like:
   
   ```
   [cid=%X{pinot.query.cid}] %m, query=%maxLen{%X{pinot.query.query}}{300}
   ```
   
   # Limit the log spam
   
   Instead of using the very opinionated QueryLogger, we can delegate to the 
standard BurstFilter. We can either send to that filter all the logs or all the 
logs related to queries. How can we do the latest? Either by sending them all 
the logs from QueryLoader (or the equivalent loggers if we end up removing this 
class)
   
   # Different Log4j2 sample configs
   
   Most users don't want to spend their time fine-tuning log4j2 config. 
Instead, we can provide several log4j2 sample files and modify the bash script, 
adding an argument to choose between them.
   
   # Cons
   
   The main problems I see with this approach are:
   - Keeping backward compatibility will be difficult. With that I mean to 
print the same logs.
   - Moving logic to log4j2 files means logging is more difficult to test
   
   # Advantages
   
   - Each customer can customize their logs as much as they want. They can even 
change the configuration without a restart (see 
https://logging.apache.org/log4j/2.x/manual/configuration.html#configuration-attribute-monitorInterval)
   - We can apply the same technique to spammy logs beyond the query.
   - We can go further and classify queries by a query option, so users could 
mark queries as important (and therefore always log them) or not important 
(which may use more constraint burst filter)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] QueryLogger should fairly respect log throttling for all logs [pinot]

Reply via email to