[ https://issues.apache.org/jira/browse/SOLR-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172705#comment-17172705 ]
Chris M. Hostetter commented on SOLR-14718:
-------------------------------------------

# "{{add\{,id=(null)}}}" is what you get from an {{AddUpdateCommand.toString()}} if either:
** the document it contains has no uniqueKey (ie: not in use by the schema, not yet filled in by some custom processor, etc...) ... this situation is pretty rare in practice
** the document itself is null
# {{JavabinLoader.parseAndLoadDocs}} takes a "re-use" approach with {{AddUpdateCommand}} (see the first sketch after this list) ...
** it initializes a single {{AddUpdateCommand addCmd}} for the whole request
** it calls {{addCmd.solrDoc = document; ... processor.processAdd(addCmd); addCmd.clear();}} for each document
# {{DistributedUpdateProcessor}} uses {{SolrCmdDistributor}} uses {{StreamingSolrClients}} to create & asynchronously process a {{Req}} for each of the individual _documents_
** but along the way it keeps a reference to the (Add) {{UpdateCommand}} that document came from, evidently in order to log info about it during error handling
** which is useless once {{JavabinLoader}} has nulled out the details of the {{AddUpdateCommand}}
*** IIRC there's code to "clone" the {{SolrInputDocument}} for local processing so we don't accidentally modify it in update processors while it's 'in flight' for async remote updates, but in this case it's the {{UpdateCommand}} that's getting modified, and i guess nothing clones that?
# BUT! ... even if we "fix" this {{AddUpdateCommand}} re-use (or clone the entire {{UpdateCommand}}, not just the SolrInputDocument) there appears to be another problem (which i think already affects things like the XML/JSON loaders when indexing multiple documents per request?) (see the second sketch after this list)
** the way the {{Req}} (and thus the {{UpdateCommand}}) is tracked for use if/when there is an error is as a member variable on the {{ErrorReportingConcurrentUpdateSolrClient}} that {{StreamingSolrClients.getSolrClient(Req)}} initializes
** Except...
** {{StreamingSolrClients}} maintains a {{Map<String, ConcurrentUpdateHttp2SolrClient> solrClients}} "cache" of solr clients keyed off of the {{Req}} object's {{req.node.getUrl()}}
*** So if a single {{SolrQueryRequest}} includes a batch of multiple documents destined for the same node (shard? leader?) then (AFAICT) any document which has a failure is going to be reported as the _first_ document in that batch
*** ie: instead of "{{add\{,id=(null)}}}" you might get a failure for "{{add\{,id=xxx}}}" even though 'xxx' may have been indexed just fine, but doc 'yyy' (which lives in the same shard as 'xxx') may have been the "add" command that really failed
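To make the first mechanism concrete, here's a minimal, self-contained sketch (simplified stand-in classes, _not_ the real Solr types) of what happens when a single re-used command is clear()'ed by the loader while the async error path still holds a reference to it: whatever gets logged later can only ever say "{{add\{,id=(null)}}}".

{code:java}
// Hypothetical stand-in for the AddUpdateCommand re-use pattern, for illustration only.
import java.util.LinkedHashMap;
import java.util.Map;

public class ReusedCommandSketch {

  // Stand-in for AddUpdateCommand: toString() reports the doc's id, or (null) once cleared.
  static class AddCmd {
    Map<String, Object> solrDoc;
    void clear() { solrDoc = null; }
    @Override public String toString() {
      Object id = (solrDoc == null) ? null : solrDoc.get("id");
      return "add{,id=" + (id == null ? "(null)" : id) + "}";
    }
  }

  public static void main(String[] args) {
    AddCmd addCmd = new AddCmd();        // one command re-used for the whole request
    AddCmd keptForErrorReporting = null; // the reference the async error path hangs on to

    for (String id : new String[] {"xxx", "yyy"}) {
      addCmd.solrDoc = new LinkedHashMap<>();
      addCmd.solrDoc.put("id", id);
      keptForErrorReporting = addCmd;    // a reference is kept, not a copy
      // ... processAdd(addCmd) happens here; the remote leg of it is asynchronous ...
      addCmd.clear();                    // the loader clears the command before any async failure arrives
    }

    // Later, an async failure is reported using the kept (now cleared) reference:
    System.out.println("Error when calling cmd=" + keptForErrorReporting);
    // prints: Error when calling cmd=add{,id=(null)}
  }
}
{code}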
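And here's a similarly hedged sketch of the second mechanism (again hypothetical stand-ins, not the actual Solr classes, and made-up URLs/ids): because the error-reporting client is cached per node URL and only remembers the {{Req}} it was created with, a failure on a later doc in the same batch gets blamed on the first doc sent to that node.

{code:java}
// Hypothetical stand-ins for StreamingSolrClients / ErrorReportingConcurrentUpdateSolrClient,
// for illustration only.
import java.util.HashMap;
import java.util.Map;

public class CachedClientSketch {

  static class Req {
    final String nodeUrl;
    final String cmd;
    Req(String nodeUrl, String cmd) { this.nodeUrl = nodeUrl; this.cmd = cmd; }
  }

  // Stand-in for the error-reporting client: it only knows the Req it was constructed with.
  static class ErrorReportingClient {
    final Req req;
    ErrorReportingClient(Req req) { this.req = req; }
    void handleError(Exception ex) {
      System.out.println("Error when calling cmd=" + req.cmd + " to " + req.nodeUrl + ": " + ex.getMessage());
    }
  }

  // Stand-in for StreamingSolrClients.getSolrClient(Req): one cached client per node URL.
  static final Map<String, ErrorReportingClient> solrClients = new HashMap<>();

  static ErrorReportingClient getSolrClient(Req req) {
    return solrClients.computeIfAbsent(req.nodeUrl, url -> new ErrorReportingClient(req));
  }

  public static void main(String[] args) {
    String leaderUrl = "http://127.0.0.1:40376/solr/collection_shard1_replica_n5/"; // made-up example URL
    Req first  = new Req(leaderUrl, "add{,id=xxx}"); // indexes just fine
    Req second = new Req(leaderUrl, "add{,id=yyy}"); // this is the doc that actually fails

    getSolrClient(first);                                 // creates + caches the client, capturing 'first'
    ErrorReportingClient client = getSolrClient(second);  // same cached client; 'second' is never recorded

    client.handleError(new RuntimeException("schema conflict on yyy"));
    // prints an error blaming add{,id=xxx} even though yyy was the failing document
  }
}
{code}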
> Multiple flaws in tracking which UpdateCommand is associated with a given failure logged by ErrorReportingConcurrentUpdateSolrClient: "cmd=add{,id=(null)}"
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-14718
>                 URL: https://issues.apache.org/jira/browse/SOLR-14718
>             Project: Solr
>          Issue Type: Bug
>   Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> Here's an example, taken from SOLR-13486, of an ERROR logged by {{ErrorReportingConcurrentUpdateSolrClient}} when a distributed update failure occurred...
> {noformat}
>    [junit4]   2> 1704143 ERROR (updateExecutor-6525-thread-1-processing-x:outOfSyncReplicasCannotBecomeLeader-false_shard1_replica_n1 r:core_node2 null n:127.0.0.1:34940_solr c:outOfSyncReplicasCannotBecomeLeader-false s:shard1) [n:127.0.0.1:34940_solr c:outOfSyncReplicasCannotBecomeLeader-false s:shard1 r:core_node2 x:outOfSyncReplicasCannotBecomeLeader-false_shard1_replica_n1] o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling SolrCmdDistributor$Req: cmd=add{,id=(null)}; node=StdNode: http://127.0.0.1:40376/solr/outOfSyncReplicasCannotBecomeLeader-false_shard1_replica_n5/ to http://127.0.0.1:40376/solr/outOfSyncReplicasCannotBecomeLeader-false_shard1_replica_n5/
>    [junit4]   2>           => java.io.IOException: java.net.ConnectException: Connection refused
> {noformat}
> In this case the underlying cause was a ConnectException - but the same ERROR msg format is used regardless of the underlying Exception that was thrown - and it's the result of these two bits of code...
> {code:java}
> // ErrorReportingConcurrentUpdateSolrClient.handleError
> log.error("Error when calling {} to {}", req, req.node.getUrl(), ex);
>
> // Req.toString()...
> public String toString() {
>   StringBuilder sb = new StringBuilder();
>   sb.append("SolrCmdDistributor$Req: cmd=").append(cmd.toString());
>   sb.append("; node=").append(String.valueOf(node));
>   return sb.toString();
> }
> {code}
> I was recently asked why the {{UpdateCommand cmd}} reported by the {{Req.toString()}} was *ALWAYS* showing up as {{add\{,id=(null)}}} (ie: an "empty" {{AddUpdateCommand}}) instead of correctly identifying which document was failing.
> In the above case of a ConnectException this may not matter, but the same problem exists if an individual document has a problem, perhaps due to schema conflicts detected by the leader when some other node forwards TOLEADER.
> Based on an audit of the code, there appear to be at least two different bugs in Solr that can cause the "cmd" reported in these error situations to be wrong:
> * UpdateCommand re-use in JavabinLoader
> * ErrorReportingConcurrentUpdateSolrClient in StreamingSolrClients
> ...full notes to follow in comment.