[ https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254315#comment-17254315 ]
David Smiley commented on SOLR-14923:
-------------------------------------

Today I made further modifications to NestedShardedAtomicUpdateTest so that it doesn't always call /get to make assertions on the results; I guarded this with a random boolean. I'm aware that /get can _sometimes_ trigger a new realtime searcher, so I wanted to test that this side effect wasn't masking bugs.

It did find a bug :-(. I tracked it down: it's limited to in-place updates of a child doc, because these updates are added to the update log keyed by the ID of the child document itself, not by the ID of the root document it is part of. Consequently, a later /get of the root ID asking for child docs (fl=*,[child]) can slip past the updateLog (because the ulog has no updates for this root ID), and the _existing_ realtimeSearcher will be used, returning a stale value of the updated child document. This happens despite {{mustUseRealtimeSearcher==true}}, because that flag only matters if a doc is found in the updateLog.

RTG goes to great lengths (with the added complexity that entails) to avoid calling {{ulog.openRealtimeSearcher}}. I've tried some ideas and I'll think more on a fix for this one.

> Indexing performance is unacceptable when child documents are involved
> ----------------------------------------------------------------------
>
>                 Key: SOLR-14923
>                 URL: https://issues.apache.org/jira/browse/SOLR-14923
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: update, UpdateRequestProcessors
>    Affects Versions: 8.3, 8.4, 8.5, 8.6, 8.7, master (9.0)
>            Reporter: Thomas Wöckinger
>            Priority: Critical
>              Labels: performance, pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Parallel indexing does not make sense at the moment when child documents are used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the
> end of the method doVersionAdd if Ulog caches should be refreshed.
> This check will return true if any child document is included in the
> AddUpdateCommand.
> If so, ulog.openRealtimeSearcher(); is called. This call is very expensive
> and is executed in a synchronized block of the UpdateLog instance, so all
> other operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) runs in a
> synchronized block, almost every operation is blocked.
> This reduces multi-threaded index updates to single-threaded behavior.
> The described behavior does not depend on any option of the UpdateRequest,
> so it makes no difference whether 'waitFlush', 'waitSearcher' or
> 'softCommit' is true or false.
> The described behavior makes ChildDocuments unusable, because the
> performance is unacceptable.
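
The stale read described in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class and every method in it (StaleChildRtgSketch, index, inPlaceUpdateChild, realtimeGetChild, reopenSearcher) are hypothetical stand-ins and not Solr APIs, and the maps merely mimic an update log keyed by document ID and the view of an already-open realtime searcher.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the stale child read; none of these names are Solr APIs. */
public class StaleChildRtgSketch {
    // Stand-in for the update log: pending updates keyed by document ID.
    private final Map<String, Integer> ulog = new HashMap<>();
    // Stand-in for the already-open realtime searcher: child values it can see.
    private final Map<String, Integer> searcherView = new HashMap<>();
    // Root ID -> child ID (one child per root keeps the sketch short).
    private final Map<String, String> childOfRoot = new HashMap<>();

    /** Indexes a root with one child; the value is visible to the open searcher. */
    public void index(String rootId, String childId, int value) {
        childOfRoot.put(rootId, childId);
        searcherView.put(childId, value);
    }

    /** In-place update of a child: logged under the child's own ID, not the root ID. */
    public void inPlaceUpdateChild(String childId, int newValue) {
        ulog.put(childId, newValue);
    }

    /** Realtime get of a root asking for its child's value. */
    public int realtimeGetChild(String rootId) {
        // The log is consulted by root ID only, so the pending child update is missed
        // and the existing searcher view is reused without reopening.
        if (ulog.containsKey(rootId)) {
            reopenSearcher();
        }
        return searcherView.get(childOfRoot.get(rootId));
    }

    /** Makes all pending updates visible, like opening a new realtime searcher. */
    private void reopenSearcher() {
        searcherView.putAll(ulog);
        ulog.clear();
    }

    public static void main(String[] args) {
        StaleChildRtgSketch s = new StaleChildRtgSketch();
        s.index("root1", "child1", 1);
        s.inPlaceUpdateChild("child1", 2);
        // Prints 1: the child was updated to 2, but the lookup by root ID misses it.
        System.out.println(s.realtimeGetChild("root1"));
    }
}
```

The sketch only reproduces the symptom; it does not propose a fix, which the comment leaves open.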