[
https://issues.apache.org/jira/browse/KAFKA-17254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870920#comment-17870920
]
Krishna V Reddy commented on KAFKA-17254:
-----------------------------------------
[~chia7712] I did a lot more testing and I think I found the possible root
cause. It turns out we add our own custom JARs to *kafka/libs* during the
docker build. I'm not sure why adding our own jars breaks logging *starting
from 3.7.1*, but that seems to be the root cause. When I delete our custom
JARs and run the Kafka CLI tool command, it just works!
Here's what our libs directory looks like after a docker build:
{code:java}
bash-4.4$ ls -la libs/
total 169988
drwxr--r--. 2 odosvc odosvc    12288 Aug  5 04:01 .
drwxr--r--. 7 odosvc odosvc     4096 Aug  5 04:01 ..
-rwxr--r--. 1 odosvc odosvc    69409 Aug  5 04:01 activation-1.1.1.jar
-rwxr--r--. 1 odosvc odosvc    27006 Aug  5 04:01 aopalliance-repackaged-2.6.1.jar
-rwxr--r--. 1 odosvc odosvc    90347 Aug  5 04:01 argparse4j-0.7.0.jar
-rwxr--r--. 1 odosvc odosvc    20891 Aug  5 04:01 audience-annotations-0.12.0.jar
-rwxr--r--. 1 odosvc odosvc   912143 Aug  5 04:01 caffeine-2.9.3.jar
...
-rwxr--r--. 1 odosvc odosvc 52605537 Aug  5 04:01 krishnav-custom.6.jar         # These seem to break logging starting from 3.7.1
-rwxr--r--. 1 odosvc odosvc    14802 Aug  5 04:01 krishanv-custom.6-sources.jar # These seem to break logging starting from 3.7.1
...
-rwxr--r--. 1 odosvc odosvc  6650280 Aug  5 04:01 zstd-jni-1.5.6-3.jar {code}
And then in our *Dockerfile* we copy our custom JARs into the libs directory
like so:
{code:java}
COPY ./target/krishnav-*.jar $KAFKA_HOME/libs/ {code}
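Since the COPY above drops these jars straight onto Kafka's classpath, one thing worth checking is whether they bundle a log4j/logback configuration file or an SLF4J binding, since any of those inside a jar on *libs* can take precedence over the file pointed at by *KAFKA_LOG4J_OPTS*. Here's a rough sketch I put together to scan a jar for such entries (the class name and the suspect list are just my guesses at what to look for, not part of our codebase):
{code:java}
import java.io.IOException;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class JarLoggingScan {
    // Entries whose presence in a jar on kafka/libs can change which logging
    // backend or configuration file wins at runtime. This list is illustrative,
    // not exhaustive.
    private static final Pattern SUSPECTS = Pattern.compile(
        "(^|/)(log4j2?\\.(properties|xml)|logback\\.xml)$"
        + "|org/slf4j/impl/StaticLoggerBinder\\.class"       // SLF4J 1.x binding
        + "|META-INF/services/org\\.slf4j\\.spi\\.SLF4JServiceProvider"); // SLF4J 2.x provider

    // Returns the names of all suspicious logging-related entries in the jar.
    public static List<String> suspects(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                      .map(JarEntry::getName)
                      .filter(n -> SUSPECTS.matcher(n).find())
                      .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + " -> " + suspects(path));
        }
    }
}
{code}
Running it against each custom jar should show whether one of them ships its own logging config or binding.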
Our custom JARs essentially implement a metrics reporter class, which we wire
into our configuration like so:
{code:java}
metric.reporters=krishnav.metrics.KrishnaVMetricsReporter
telemetry.project=krishnav
telemetry.fleet=krishnav{code}
And this is what the above class looks like:
{code:java}
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.MetricsReporter;

import lombok.NoArgsConstructor;
import lombok.extern.slf4j.Slf4j;

// ScheduledReporter and ScheduledReporterFactory come from our internal metrics module.
@Slf4j
@NoArgsConstructor
public class KrishnaVMetricsReporter implements MetricsReporter {

    private Map<String, ?> configs;
    protected ScheduledReporter reporter;
    protected volatile boolean isInitialized = false;

    // configure() will be called when an instance of KrishnaVMetricsReporter is
    // created, with the config properties originally passed to the Kafka client.
    @Override
    public void configure(Map<String, ?> configs) {
        this.configs = configs;
        log.info("----configure start----");
        for (Entry<String, ?> entry : configs.entrySet()) {
            log.info("{}:{}", entry.getKey(), entry.getValue());
        }
        log.info("----configure end----");
    }

    @Override
    public void init(List<KafkaMetric> metrics) {
        log.info("Initializing {}", KrishnaVMetricsReporter.class.getName());
        reporter = ensureSingletonReporter();
        isInitialized = true;
    }

    @Override
    public void metricChange(KafkaMetric metric) {
        ensureInitialized();
    }

    @Override
    public void metricRemoval(KafkaMetric metric) {
        ensureInitialized();
    }

    @Override
    public void close() {
        ensureInitialized();
        log.info("Closing {}", KrishnaVMetricsReporter.class.getName());
        ScheduledReporterFactory.stop();
    }

    // Using a protected method to create the reporter because unit tests need
    // to override it.
    protected ScheduledReporter ensureSingletonReporter() {
        try {
            return ScheduledReporterFactory.ensureReporter(configs);
        } catch (Exception e) {
            log.error("error creating singleton reporter", e);
            throw new RuntimeException(e);
        }
    }

    private void ensureInitialized() {
        if (!isInitialized) {
            log.error("Class not initialized via init(). Cannot call class method");
            throw new IllegalStateException("Class not initialized via init(). Cannot call class method");
        }
    }
}{code}
This class basically initializes a scheduled reporter that periodically emits
metrics. Let me know if including the code for the other classes would help,
but again, all of this code worked in 3.7.0 and earlier. Did something change
in 3.7.1 such that adding new jars to *${KAFKA_HOME}/libs/* breaks things?
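To narrow down which jar on the classpath is actually supplying the logging classes at runtime, here's a small diagnostic sketch I used (my own throwaway code, not part of our jars; the probe list is just illustrative). It prints the code source of each class, so a custom jar shadowing log4j or an SLF4J binding shows up immediately:
{code:java}
public class ClasspathProbe {
    // Returns where the given class was loaded from, or a marker if absent.
    static String sourceOf(String className) {
        try {
            Class<?> c = Class.forName(className);
            java.security.CodeSource cs = c.getProtectionDomain().getCodeSource();
            // Bootstrap/JDK classes have a null code source.
            return cs == null ? "bootstrap/JDK" : cs.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Classes whose provider determines which logging backend wins.
        String[] probes = {
            "org.slf4j.LoggerFactory",
            "org.apache.log4j.Logger",
            "org.slf4j.impl.StaticLoggerBinder",   // SLF4J 1.x binding
            "org.slf4j.spi.SLF4JServiceProvider",  // SLF4J 2.x provider SPI
            "ch.qos.logback.classic.LoggerContext"
        };
        for (String p : probes) {
            System.out.println(p + " -> " + sourceOf(p));
        }
    }
}
{code}
Running this with the same classpath as *kafka-run-class.sh* (e.g. via `java -cp "libs/*" ClasspathProbe`) shows whether our custom jars are the ones answering for any of these classes.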
Also I really appreciate all of your help [~chia7712] :)
> Kafka Server/CLI no longer respects log4j.properties/tools-log4j.properties file
> --------------------------------------------------------------------------------
>
> Key: KAFKA-17254
> URL: https://issues.apache.org/jira/browse/KAFKA-17254
> Project: Kafka
> Issue Type: Bug
> Components: admin, logging, tools
> Affects Versions: 3.8.0, 3.7.1
> Reporter: Krishna V Reddy
> Priority: Major
> Attachments: 螢幕快照 2024-08-05 10-55-02.png
>
>
> Kafka CLI tools, and the Kafka server for that matter, no longer seem to
> respect the *KAFKA_LOG4J_OPTS* environment variable starting from v3.7.1 in
> KRaft mode, and always log to the console. This regression only started
> happening in 3.7.1, as rolling back to 3.7.0 fixes the issue.
> Here's an example where we list Kafka topics using the kafka-topics CLI tool in v3.7.1:
> {code:java}
> bash-4.4$ bin/kafka-topics.sh --list --bootstrap-server localhost:21503 --command-config /tmp/kafka/config/kraft/client.properties
> 00:11:29.415 [main] INFO org.apache.kafka.clients.admin.AdminClientConfig -- AdminClientConfig values:
> auto.include.jmx.reporter = true
> bootstrap.controllers = []
> bootstrap.servers = [localhost:21503]
> ...
> hello-world # Actual topic output
> 00:11:29.960 [main] DEBUG org.apache.kafka.clients.admin.KafkaAdminClient -- [AdminClient clientId=adminclient-1] Initiating close operation.
> 00:11:29.960 [main] DEBUG org.apache.kafka.clients.admin.KafkaAdminClient -- [AdminClient clientId=adminclient-1] Waiting for the I/O thread to exit. Hard shutdown in 31536000000 ms.
> 00:11:29.961 [kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.common.utils.AppInfoParser -- App info kafka.admin.client for adminclient-1 unregistered
> 00:11:29.967 [kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.common.metrics.Metrics -- Metrics scheduler closed
> 00:11:29.967 [kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.common.metrics.Metrics -- Closing reporter org.apache.kafka.common.metrics.JmxReporter
> 00:11:29.967 [kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.common.metrics.Metrics -- Metrics reporters closed
> 00:11:29.967 [kafka-admin-client-thread | adminclient-1] DEBUG org.apache.kafka.clients.admin.KafkaAdminClient -- [AdminClient clientId=adminclient-1] Exiting AdminClientRunnable thread.
> 00:11:29.968 [main] DEBUG org.apache.kafka.clients.admin.KafkaAdminClient -- [AdminClient clientId=adminclient-1] Kafka admin client closed. {code}
> Running the same command in v3.7.0 gets us:
> {code:java}
> bash-4.4$ bin/kafka-topics.sh --list --bootstrap-server localhost:21503 --command-config /tmp/kafka/config/kraft/client.properties
> hello-world{code}
> I added logging in *kafka-run-class.sh* to make sure the correct log4j
> properties file is being selected. Sure enough, the default pre-packaged
> *config/tools-log4j.properties* is selected by this script, and the
> environment variable *KAFKA_LOG4J_OPTS* is set with this properties file:
> {code:java}
> # Log4j settings
> if [ -z "$KAFKA_LOG4J_OPTS" ]; then
>   # Log to console. This is a tool.
>   LOG4J_DIR="$base_dir/config/tools-log4j.properties"
>   # If Cygwin is detected, LOG4J_DIR is converted to Windows format.
>   (( WINDOWS_OS_FORMAT )) && LOG4J_DIR=$(cygpath --path --mixed "${LOG4J_DIR}")
>   KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:${LOG4J_DIR}"
> else
>   # create logs directory
>   if [ ! -d "$LOG_DIR" ]; then
>     mkdir -p "$LOG_DIR"
>   fi
> fi{code}
> The *config/tools-log4j.properties* file used in both 3.7.0 and 3.7.1 is the
> same and contains the following contents (Kafka-set defaults):
> {code:java}
> log4j.rootLogger=WARN, stderr
> log4j.appender.stderr=org.apache.log4j.ConsoleAppender
> log4j.appender.stderr.layout=org.apache.log4j.PatternLayout
> log4j.appender.stderr.layout.ConversionPattern=[%d] %p %m (%c)%n
> log4j.appender.stderr.Target=System.err
> # for connect-plugin-path
> log4j.logger.org.reflections=ERROR{code}
> This is a very bizarre bug since nothing changed on our end other than
> upgrading to 3.7.1. We also tried upgrading to 3.8.0 to see if the issue goes
> away, but it doesn't. What makes this worse is that the same issue also
> happens with the Kafka server: starting from 3.7.1, it only seems to log to
> the console, no longer respects the *kafka.logs.dir* system property, and
> does not seem to read the log4j configuration file that we set like so:
> {code:java}
> export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:${KAFKA_CONFIG_DIRECTORY}/log4j.properties"
> {code}
> Our log4j properties file for our server looks like so:
> {code:java}
> log4j.rootLogger=INFO, stdout, kafkaAppender
> log4j.appender.stdout=org.apache.log4j.ConsoleAppender
> log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
> log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n
> log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
> log4j.appender.kafkaAppender.MaxFileSize=1GB
> log4j.appender.kafkaAppender.MaxBackupIndex=10
> log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
> log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
> log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
> log4j.appender.stateChangeAppender=org.apache.log4j.RollingFileAppender
> log4j.appender.stateChangeAppender.MaxFileSize=1GB
> log4j.appender.stateChangeAppender.MaxBackupIndex=10
> log4j.appender.stateChangeAppender.File=${kafka.logs.dir}/state-change.log
> log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout
> log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
> log4j.appender.requestAppender=org.apache.log4j.RollingFileAppender
> log4j.appender.requestAppender.MaxFileSize=1GB
> log4j.appender.requestAppender.MaxBackupIndex=10
> log4j.appender.requestAppender.File=${kafka.logs.dir}/kafka-request.log
> log4j.appender.requestAppender.layout=org.apache.log4j.PatternLayout
> log4j.appender.requestAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
> log4j.appender.cleanerAppender=org.apache.log4j.RollingFileAppender
> log4j.appender.cleanerAppender.MaxFileSize=1GB
> log4j.appender.cleanerAppender.MaxBackupIndex=10
> log4j.appender.cleanerAppender.File=${kafka.logs.dir}/log-cleaner.log
> log4j.appender.cleanerAppender.layout=org.apache.log4j.PatternLayout
> log4j.appender.cleanerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
> log4j.appender.controllerAppender=org.apache.log4j.RollingFileAppender
> log4j.appender.controllerAppender.MaxFileSize=1GB
> log4j.appender.controllerAppender.MaxBackupIndex=10
> log4j.appender.controllerAppender.File=${kafka.logs.dir}/controller.log
> log4j.appender.controllerAppender.layout=org.apache.log4j.PatternLayout
> log4j.appender.controllerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
> log4j.appender.authorizerAppender=org.apache.log4j.RollingFileAppender
> log4j.appender.authorizerAppender.MaxFileSize=1GB
> log4j.appender.authorizerAppender.MaxBackupIndex=10
> log4j.appender.authorizerAppender.File=${kafka.logs.dir}/kafka-authorizer.log
> log4j.appender.authorizerAppender.layout=org.apache.log4j.PatternLayout
> log4j.appender.authorizerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
> # Change the line below to adjust ZK client logging
> log4j.logger.org.apache.zookeeper=INFO
> # Change the two lines below to adjust the general broker logging level (output to server.log and stdout)
> log4j.logger.kafka=INFO
> log4j.logger.org.apache.kafka=INFO
> # Change to DEBUG or TRACE to enable request logging
> log4j.logger.kafka.request.logger=OFF, requestAppender
> log4j.additivity.kafka.request.logger=false
> # Uncomment the lines below and change log4j.logger.kafka.network.RequestChannel$ to TRACE
> # for additional output related to the handling of requests
> #log4j.logger.kafka.network.Processor=TRACE, requestAppender
> #log4j.logger.kafka.server.KafkaApis=TRACE, requestAppender
> #log4j.additivity.kafka.server.KafkaApis=false
> log4j.logger.kafka.network.RequestChannel$=WARN, requestAppender
> log4j.additivity.kafka.network.RequestChannel$=false
> # Change the line below to adjust KRaft mode controller logging
> log4j.logger.org.apache.kafka.controller=INFO, controllerAppender
> log4j.additivity.org.apache.kafka.controller=false
> # Change the line below to adjust ZK mode controller logging
> log4j.logger.kafka.controller=TRACE, controllerAppender
> log4j.additivity.kafka.controller=false
> log4j.logger.kafka.log.LogCleaner=INFO, cleanerAppender
> log4j.additivity.kafka.log.LogCleaner=false
> log4j.logger.state.change.logger=TRACE, stateChangeAppender
> log4j.additivity.state.change.logger=false
> # Access denials are logged at INFO level, change to DEBUG to also log allowed accesses
> log4j.logger.kafka.authorizer.logger=INFO, authorizerAppender
> log4j.additivity.kafka.authorizer.logger=false {code}
> And here's how we initialize the appropriate log4j environment variables in
> our Kafka startup script:
> {code:java}
> export KAFKA_CONFIG_DIRECTORY="/tmp/kafka/kraft/config"
> export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:${KAFKA_CONFIG_DIRECTORY}/log4j.properties"
> export LOG_DIR="/logs/kafka/"
> ...
> bin/kafka-server-start.sh ${KAFKA_CONFIG_DIRECTORY}/kraft/broker.properties{code}
> Again, nothing changed here. It's the same code we have used since v3.5.0.
> Any ideas what's going wrong? Did the 3.7.1 upgrade introduce any regressions
> or changes to Kafka logging?
> Here's an example of what the Kafka broker logs (emitted to the console) look
> like after upgrading to 3.7.1+:
> {code:java}
> 2024-08-04_00:35:13.89138 00:35:13.891 [kafka-2-raft-io-thread] DEBUG org.apache.kafka.raft.KafkaRaftClient -- [RaftManager id=2] Follower high watermark updated to 455399
> 2024-08-04_00:35:13.89159 00:35:13.891 [kafka-2-raft-io-thread] DEBUG org.apache.kafka.raft.KafkaRaftClient -- [RaftManager id=2] Notifying listener org.apache.kafka.image.loader.MetadataLoader@670337647 of batch for baseOffset 455398 and lastOffset OptionalLong.empty
> 2024-08-04_00:35:13.89169 00:35:13.891 [kafka-2-raft-outbound-request-thread] DEBUG org.apache.kafka.clients.NetworkClient -- [RaftManager id=2] Sending FETCH request with header RequestHeader(apiKey=FETCH, apiVersion=16, clientId=raft-client-2, correlationId=4602, headerVersion=2) and timeout 2000 to node 3: FetchRequestData(clusterId='_EMgkMHpTcKhrGEMNo01zA', replicaId=-1, replicaState=ReplicaState(replicaId=2, replicaEpoch=-1), maxWaitMs=500, minBytes=0, maxBytes=8388608, isolationLevel=0, sessionId=0, sessionEpoch=-1, topics=[FetchTopic(topic='__cluster_metadata', topicId=AAAAAAAAAAAAAAAAAAAAAQ, partitions=[FetchPartition(partition=0, currentLeaderEpoch=82, fetchOffset=455400, lastFetchedEpoch=82, logStartOffset=-1, partitionMaxBytes=0)])], forgottenTopicsData=[], rackId='')
> 2024-08-04_00:35:13.89177 00:35:13.891 [kafka-2-metadata-loader-event-handler] DEBUG org.apache.kafka.image.loader.MetadataBatchLoader -- [MetadataLoader id=2] handleCommit: Generated a metadata delta between 455397 and 455398 from 1 batch(es) in 0 us.
> 2024-08-04_00:35:13.89182 00:35:13.891 [kafka-2-metadata-loader-event-handler] DEBUG org.apache.kafka.image.loader.MetadataLoader -- [MetadataLoader id=2] handleCommit: publishing new image with provenance MetadataProvenance(lastContainedOffset=455398, lastContainedEpoch=82, lastContainedLogTimeMs=1722731713365).{code}
> P.S. Our Kafka jars are downloaded directly from
> [https://kafka.apache.org/downloads], with no modifications to the binaries
--
This message was sent by Atlassian Jira
(v8.20.10#820010)