[jira] [Commented] (SOLR-11934) Visit Solr logging, it's too noisy.

Erick Erickson (Jira) Wed, 06 May 2020 08:25:36 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17100889#comment-17100889
 ]


Erick Erickson commented on SOLR-11934:
---------------------------------------

OK, getting back to this finally. I've analyzed one set of log files, with all 
the caveats that there may be special circumstances here. This is Solr 7x. I 
still have another set to go but wanted to start the discussion. Here's the 
summary of the classes that account for over 95% of the log messages in this 
sample.

I categorized these mostly by taking fragments of text from the logging calls, 
but for some I had to do something else, for instance HttpSolrCall just logs 
the call, there's no surrounding text so I bucketed on path instead.

It looks like this cluster was having some issues, so I'm not worried too much 
about SolrClientNodeStateProvider or RecoveryStrategy. But some of the others 
are...interesting.

I propose all the ones I've marked with an asterisk be moved to DEBUG level. 
The alternative is to set the entire class to default to WARN level in the 
logging config files.

Ones I'd like some comment on are marked with a ?. for instance, the ones in 
HttpSolrCall may be coming from outside? [~ab]  do you have any 
off-the-top-of-your-head comments about the admin/metrics calls?
{code:java}
 # calls     %                    Level                   class
     17,504 (pct: 0.010756) occurrences of WARN: 
o.a.s.c.s.i.SolrClientNodeStateProvider
     22,528 (pct: 0.013843) occurrences of INFO: 
o.a.s.c.s.i.SolrClientNodeStateProvider
     22,827 (pct: 0.014027) occurrences of INFO: o.a.s.c.c.ZkStateReader
     24,330 (pct: 0.014950) occurrences of INFO: o.a.s.c.RecoveryStrategy
     25,227 (pct: 0.015501) occurrences of INFO: o.a.s.c.S.Request
     74,128 (pct: 0.045549) occurrences of INFO: o.a.s.s.HttpSolrCall
    139,203 (pct: 0.085536) occurrences of INFO: o.a.s.u.SolrIndexWriter
    140,589 (pct: 0.086388) occurrences of INFO: o.a.s.s.SolrIndexSearcher
    142,723 (pct: 0.087699) occurrences of INFO: o.a.s.c.SolrCore
    281,150 (pct: 0.172758) occurrences of INFO: o.a.s.c.QuerySenderListener
    284,356 (pct: 0.174728) occurrences of INFO: o.a.s.u.DirectUpdateHandler2
    381,468 (pct: 0.234401) occurrences of INFO: 
o.a.s.u.p.LogUpdateProcessorFactory


Classes Of Interest (95% of log messages come from these classes)
   Lines in SolrCore
    2, msg: ERROR
    3, msg: core reload 
    4, msg: CLOSING SolrCore
    15, msg: Updating index properties...
*   1,047, msg: config update listener called for core
*   1,078, msg: Opening new SolrCore at
*  140,576, msg: Registered new searcher 
   Lines in LogUpdateProcessorFactory
*   381,468, msg: There's only one line
   Lines in SolrIndexWriter
*   139,203, msg: Calling setCommitData with IW:
? All of theseLines in HttpSolrCall
    1, msg: path=/admin/configs
    2, msg: path=/admin/zookeeper
    5, msg: path=/admin/info/logging
    9, msg: Unable to write response, client closed connection or we are 
shutting down
    20, msg: path=/admin/info/system
    63, msg: path=/cluster/autoscaling/suggestions
    78, msg: path=/admin/collections
    90, msg: ERROR
?   2,122, msg: path=/admin/cores
?   71,828, msg: path=/admin/metrics
  Lines in ZkStateReader
*   35, msg: Updated live nodes from ZooKeeper...
*   22,792, msg: A cluster state change: 
  Lines in RecoveryStrategy
    1, msg: seconds before trying to recover again
    4, msg: Replaying buffered documents.
    287, msg: ERROR
    289, msg: Updating version bucket highest from index after successful 
recovery.
    290, msg: PeerSync Recovery was not successful - trying replication.
    291, msg: Starting Replication Recovery.
    560, msg: currentVersions size=
    1,428, msg: startupVersions size=
    1,672, msg: Failed to connect leader 
    1,814, msg: Replaying updates buffered during PeerSync.
    2,101, msg: No replay needed.
    2,105, msg: Starting recovery process. recoveringAfterStartup=
    2,106, msg: Sending prep recovery command to
  Lines in QuerySenderListener
*   140,575, msg: QuerySenderListener sending requests to 
  Lines in DirectUpdateHandler2
    3, msg: WARN
*   1,983, msg: No uncommitted changes. Skipping IW.commit.
*   141,186, msg: start (note, probably mostly commit flush, but may be 
multiple)
*   141,187, msg: end_commit_flush
  Lines in SolrIndexSearcher
*    140,589, msg: Opening 
  Lines in SolrClientNodeStateProvider
    17,504, msg: WARN
    22,528, msg: Error on getting remote info, trying again:

{code}

> Visit Solr logging, it's too noisy.
> -----------------------------------
>
>                 Key: SOLR-11934
>                 URL: https://issues.apache.org/jira/browse/SOLR-11934
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>
> I think we have way too much INFO level logging. Or, perhaps more correctly, 
> Solr logging needs to be examined and messages logged at an appropriate level.
> We log every update at an INFO level for instance. But I think we log LIR at 
> INFO as well. As a sysadmin I don't care to have my logs polluted with a 
> message for every update, but if I'm trying to keep my system healthy I want 
> to see LIR messages and try to understand why.
> Plus, in large installations logging at INFO level is creating a _LOT_ of 
> files.
> What I want to discuss on this JIRA is
> 1> What kinds of messages do we want log at WARN, INFO, DEBUG, and TRACE 
> levels?
> 2> Who's the audience at each level? For a running system that's functioning, 
> sysops folks would really like WARN messages that mean something need 
> attention for instance. If I'm troubleshooting should I turn on INFO? DEBUG? 
> TRACE?
> So let's say we get some kind of agreement as to the above. Then I propose 
> three things
> 1> Someone (and probably me but all help gratefully accepted) needs to go 
> through our logging and assign appropriate levels. This will take quite a 
> while, I intend to work on it in small chunks.
> 2> Actually answer whether unnecessary objects are created when something 
> like log.info("whatever {}", someObjectOrMethodCall); is invoked. Is this 
> independent on the logging implementation used? The SLF4J and log4j seem a 
> bit contradictory.
> 3> Maybe regularize log, logger, LOG as variable names, but that's a nit.
> As a tactical approach, I suggest we tag each LoggerFactory.getLogger in 
> files we work on with //SOLR-(whatever number is assigned when I create 
> this). We can remove them all later, but since I expect to approach this 
> piecemeal it'd be nice to keep track of which files have been done already.
> Finally, I really really really don't want to do this all at once. There are 
> 5-6 thousand log messages. Even at 1,000 a week that's 6 weeks, even starting 
> now it would probably span the 7.3 release.
> This will probably be an umbrella issue so we can keep all the commits 
> straight and people can volunteer to "fix the files in core" as a separate 
> piece of work (hint).
> There are several existing JIRAs about logging in general, let's link them in 
> here as well.
> Let the discussion begin!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (SOLR-11934) Visit Solr logging, it's too noisy.

Reply via email to