[ https://issues.apache.org/jira/browse/SOLR-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172705#comment-17172705 ]
Chris M. Hostetter commented on SOLR-14718:
-------------------------------------------

# "{{add\{,id=(null)}}}" is what you get from an {{AddUpdateCommand.toString()}} if either:
** the document it contains has no uniqueKey (ie: not in use by the schema, not yet filled in by some custom processor, etc...) ... this situation is pretty rare in practice
** the document itself is null
# {{JavabinLoader.parseAndLoadDocs}} takes a "re-use" approach with {{AddUpdateCommand}} (see the first sketch after this list) ...
** it initializes a single {{AddUpdateCommand addCmd}} for the whole request
** it calls {{addCmd.solrDoc = document; ... processor.processAdd(addCmd); addCmd.clear();}} for each document
# {{DistributedUpdateProcessor}} uses {{SolrCmdDistributor}} uses {{StreamingSolrClients}} to create & asynchronously process a {{Req}} for each of the individual _documents_
** but along the way it keeps a reference to the (Add) {{UpdateCommand}} that document came from, evidently in order to log info about it during error handling
** which is useless once {{JavabinLoader}} has nulled out the details of the {{AddUpdateCommand}}
*** IIRC there's code to "clone" the {{SolrInputDocument}} for local processing so we don't accidentally modify it in update processors while it's 'in flight' for async remote updates, but in this case it's the {{UpdateCommand}} that's getting modified, and i guess nothing clones that?
# BUT! ... even if we "fix" this {{AddUpdateCommand}} re-use (or clone the entire {{UpdateCommand}}, not just the SolrInputDocument) there appears to be another problem (which i think already affects things like the XML/JSON loaders when indexing multiple documents per request?) (see the second sketch after this list)
** the way the {{Req}} (and thus the {{UpdateCommand}}) is tracked for use if/when there is an error is as a member variable on the {{ErrorReportingConcurrentUpdateSolrClient}} that {{StreamingSolrClients.getSolrClient(Req)}} initializes
** Except...
** {{StreamingSolrClients}} maintains a {{Map<String, ConcurrentUpdateHttp2SolrClient> solrClients}} "cache" of solr clients keyed off of the {{Req}} object's {{req.node.getUrl()}}
*** So if a single {{SolrQueryRequest}} includes a batch of multiple documents destined for the same node (shard? leader?) then (AFAICT) any document which has a failure is going to be reported as the _first_ document in that batch
*** ie: instead of "{{add\{,id=(null)}}}" you might get a failure for "{{add\{,id=xxx}}}" even though 'xxx' may have been indexed just fine, but doc 'yyy' (which lives in the same shard as 'xxx') may have been the "add" command that really failed
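To make the first mechanism concrete, here's a minimal, self-contained sketch (simplified stand-in classes, _not_ the real Solr types) of what happens when a single re-used command is clear()'ed by the loader while the async error path still holds a reference to it: whatever gets logged later can only ever say "{{add\{,id=(null)}}}".

{code:java}
// Hypothetical stand-in for the AddUpdateCommand re-use pattern, for illustration only.
import java.util.LinkedHashMap;
import java.util.Map;

public class ReusedCommandSketch {

  // Stand-in for AddUpdateCommand: toString() reports the doc's id, or (null) once cleared.
  static class AddCmd {
    Map<String, Object> solrDoc;
    void clear() { solrDoc = null; }
    @Override public String toString() {
      Object id = (solrDoc == null) ? null : solrDoc.get("id");
      return "add{,id=" + (id == null ? "(null)" : id) + "}";
    }
  }

  public static void main(String[] args) {
    AddCmd addCmd = new AddCmd();        // one command re-used for the whole request
    AddCmd keptForErrorReporting = null; // the reference the async error path hangs on to

    for (String id : new String[] {"xxx", "yyy"}) {
      addCmd.solrDoc = new LinkedHashMap<>();
      addCmd.solrDoc.put("id", id);
      keptForErrorReporting = addCmd;    // a reference is kept, not a copy
      // ... processAdd(addCmd) happens here; the remote leg of it is asynchronous ...
      addCmd.clear();                    // the loader clears the command before any async failure arrives
    }

    // Later, an async failure is reported using the kept (now cleared) reference:
    System.out.println("Error when calling cmd=" + keptForErrorReporting);
    // prints: Error when calling cmd=add{,id=(null)}
  }
}
{code}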
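And here's a similarly hedged sketch of the second mechanism (again hypothetical stand-ins, not the actual Solr classes, and made-up URLs/ids): because the error-reporting client is cached per node URL and only remembers the {{Req}} it was created with, a failure on a later doc in the same batch gets blamed on the first doc sent to that node.

{code:java}
// Hypothetical stand-ins for StreamingSolrClients / ErrorReportingConcurrentUpdateSolrClient,
// for illustration only.
import java.util.HashMap;
import java.util.Map;

public class CachedClientSketch {

  static class Req {
    final String nodeUrl;
    final String cmd;
    Req(String nodeUrl, String cmd) { this.nodeUrl = nodeUrl; this.cmd = cmd; }
  }

  // Stand-in for the error-reporting client: it only knows the Req it was constructed with.
  static class ErrorReportingClient {
    final Req req;
    ErrorReportingClient(Req req) { this.req = req; }
    void handleError(Exception ex) {
      System.out.println("Error when calling cmd=" + req.cmd + " to " + req.nodeUrl + ": " + ex.getMessage());
    }
  }

  // Stand-in for StreamingSolrClients.getSolrClient(Req): one cached client per node URL.
  static final Map<String, ErrorReportingClient> solrClients = new HashMap<>();

  static ErrorReportingClient getSolrClient(Req req) {
    return solrClients.computeIfAbsent(req.nodeUrl, url -> new ErrorReportingClient(req));
  }

  public static void main(String[] args) {
    String leaderUrl = "http://127.0.0.1:40376/solr/collection_shard1_replica_n5/"; // made-up example URL
    Req first  = new Req(leaderUrl, "add{,id=xxx}"); // indexes just fine
    Req second = new Req(leaderUrl, "add{,id=yyy}"); // this is the doc that actually fails

    getSolrClient(first);                                 // creates + caches the client, capturing 'first'
    ErrorReportingClient client = getSolrClient(second);  // same cached client; 'second' is never recorded

    client.handleError(new RuntimeException("schema conflict on yyy"));
    // prints an error blaming add{,id=xxx} even though yyy was the failing document
  }
}
{code}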
> Multiple flaws in tracking which UpdateCommand is associated with a given failure logged by ErrorReportingConcurrentUpdateSolrClient: "cmd=add{,id=(null)}"
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-14718
>                 URL: https://issues.apache.org/jira/browse/SOLR-14718
>             Project: Solr
>          Issue Type: Bug
>   Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> Here's an example, taken from SOLR-13486, of an ERROR logged by {{ErrorReportingConcurrentUpdateSolrClient}} when a distributed update failure occurred...
> {noformat}
>    [junit4]   2> 1704143 ERROR (updateExecutor-6525-thread-1-processing-x:outOfSyncReplicasCannotBecomeLeader-false_shard1_replica_n1 r:core_node2 null n:127.0.0.1:34940_solr c:outOfSyncReplicasCannotBecomeLeader-false s:shard1) [n:127.0.0.1:34940_solr c:outOfSyncReplicasCannotBecomeLeader-false s:shard1 r:core_node2 x:outOfSyncReplicasCannotBecomeLeader-false_shard1_replica_n1] o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling SolrCmdDistributor$Req: cmd=add{,id=(null)}; node=StdNode: http://127.0.0.1:40376/solr/outOfSyncReplicasCannotBecomeLeader-false_shard1_replica_n5/ to http://127.0.0.1:40376/solr/outOfSyncReplicasCannotBecomeLeader-false_shard1_replica_n5/
>    [junit4]   2>           => java.io.IOException: java.net.ConnectException: Connection refused
> {noformat}
> In this case the underlying cause was a ConnectException - but the same ERROR msg format is used regardless of the underlying Exception that was thrown - and it's the result of these two bits of code...
> {code:java}
> // ErrorReportingConcurrentUpdateSolrClient.handleError
> log.error("Error when calling {} to {}", req, req.node.getUrl(), ex);
>
> // Req.toString()...
> public String toString() {
>   StringBuilder sb = new StringBuilder();
>   sb.append("SolrCmdDistributor$Req: cmd=").append(cmd.toString());
>   sb.append("; node=").append(String.valueOf(node));
>   return sb.toString();
> }
> {code}
> I was recently asked why the {{UpdateCommand cmd}} reported by the {{Req.toString()}} was *ALWAYS* showing up as {{add\{,id=(null)}}} (ie: an "empty" {{AddUpdateCommand}}) instead of correctly identifying which document was failing.
> In the above case of a ConnectException this may not matter, but the same problem exists if an individual document has a problem, perhaps due to schema conflicts detected by the leader when some other node forwards TOLEADER.
> Based on an audit of the code, there appear to be at least two different bugs in Solr that can cause the "cmd" reported in these error situations to be wrong:
> * UpdateCommand re-use in JavabinLoader
> * ErrorReportingConcurrentUpdateSolrClient in StreamingSolrClients
> ...full notes to follow in comment.