[ https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253144#comment-17253144 ]
David Smiley commented on SOLR-14923:
-------------------------------------

Finally, here's the PR: https://github.com/apache/lucene-solr/pull/2159

It includes some refactorings that made it easier for me to follow the logic. It was very difficult work overall! At least it was a learning experience in when and why realtime searchers are opened, and how they relate to the update log.

The hardest problem for me to solve was a repeatable assertion failure I trapped on or near the last line of {{org.apache.solr.cloud.NestedShardedAtomicUpdateTest#doRootShardRoutingTest}}, in which inplace_updatable_int had the value "5" instead of "1" on the first iteration because _somehow_ data from doNestedInplaceUpdateTest() was lingering. Since this test uses a control client and indexes to multiple places, debugging it was a nightmare (*which* core did this breakpoint hit?), so I converted the test to a more normal SolrCloudTestCase, after which the bug didn't take me long to find.

Today was mostly cleanup/refactoring/commit-notes of everything.

Commit notes:

SOLR-14923: Nested docs indexing performance.
* UpdateLog.openRealtimeSearcher() was being called for every incoming document when the schema had _nest_path_. It is now more limited: it is called for an in-place update of a child doc and in /get when a child doc is given, both of which are uncommon.
* Atomic/partial updates to nested documents should be faster. In-place updates of the same might be slower (needs to call openRealtimeSearcher).

Refactoring & minor bugs/improvements:
* Simplified AddUpdateCommand.getLuceneDoc & getLuceneDocsIfNested relationship, and DirectUpdateHandler2 which calls them.
* AddUpdateCommand.getIndexedId needed to be updated when _route_ was specified.
* NestedShardedAtomicUpdateTest no longer extends AbstractFullDistribZkTestBase because it wasn't really leveraging the "control client" checking, and it added too much complexity to debug failures.
* AtomicUpdateDocumentMerger: simplified merge; possibly now supports updates to anonymous children
* No longer need RTG.Resolution.DOC_WITH_CHILDREN

> Indexing performance is unacceptable when child documents are involved
> ----------------------------------------------------------------------
>
>                 Key: SOLR-14923
>                 URL: https://issues.apache.org/jira/browse/SOLR-14923
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: update, UpdateRequestProcessors
>    Affects Versions: 8.3, 8.4, 8.5, 8.6, 8.7, master (9.0)
>            Reporter: Thomas Wöckinger
>            Priority: Critical
>              Labels: performance, pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Parallel indexing does not make sense at moment when child documents are used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the
> end of the method doVersionAdd if Ulog caches should be refreshed.
> This check will return true if any child document is included in the
> AddUpdateCommand.
> If so ulog.openRealtimeSearcher(); is called, this call is very expensive,
> and executed in a synchronized block of the UpdateLog instance, therefore all
> other operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) is done using a
> synchronized block almost each operation is blocked.
> This reduces multi threaded index update to a single thread behavior.
> The described behavior is not depending on any option of the UpdateRequest,
> so it does not make any difference if 'waitFlush', 'waitSearcher' or
> 'softCommit' is true or false.
> The described behavior makes the usage of ChildDocuments useless, because the
> performance is unacceptable.
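
For readers who want to see the contention from the quoted description in isolation, here is a minimal toy sketch. It is not Solr code: ToyUpdateLog, its add() and openRealtimeSearcher() methods, the 2 ms sleep, and the run() helper are all hypothetical stand-ins, assuming only what the description states (one shared monitor, one expensive call made per document). It counts how many threads are ever inside the toy log at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of the contention described above; none of this is Solr code. */
class UlogContentionSketch {

    /** Hypothetical stand-in for an update log whose methods share one monitor. */
    static class ToyUpdateLog {
        final AtomicInteger inside = new AtomicInteger();    // threads currently in a synchronized method
        final AtomicInteger maxInside = new AtomicInteger(); // highest value 'inside' ever reached
        private int adds;          // guarded by 'this'
        private int searcherOpens; // guarded by 'this'

        synchronized void add(String id) {
            enter();
            adds++;
            exit();
        }

        /** Simulates an expensive reopen by sleeping while holding the monitor. */
        synchronized void openRealtimeSearcher() {
            enter();
            try {
                Thread.sleep(2); // assumed cost, purely illustrative
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                searcherOpens++;
                exit();
            }
        }

        synchronized int adds() { return adds; }

        synchronized int searcherOpens() { return searcherOpens; }

        private void enter() {
            int now = inside.incrementAndGet();
            maxInside.accumulateAndGet(now, Math::max);
        }

        private void exit() {
            inside.decrementAndGet();
        }
    }

    /** Indexes from several threads; returns {adds, searcherOpens, maxInside, elapsedMillis}. */
    static long[] run(int threads, int docsPerThread, boolean reopenPerDoc) {
        ToyUpdateLog ulog = new ToyUpdateLog();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int threadNo = t;
            pool.execute(() -> {
                for (int d = 0; d < docsPerThread; d++) {
                    ulog.add("doc-" + threadNo + "-" + d);
                    if (reopenPerDoc) {
                        ulog.openRealtimeSearcher(); // every doc pays for the reopen
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing threads did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new long[] {ulog.adds(), ulog.searcherOpens(), ulog.maxInside.get(), elapsedMillis};
    }

    public static void main(String[] args) {
        long[] perDoc = run(4, 25, true);
        long[] rare = run(4, 25, false);
        System.out.println("reopen per doc: " + perDoc[3] + " ms, max threads inside = " + perDoc[2]);
        System.out.println("no reopen:      " + rare[3] + " ms, max threads inside = " + rare[2]);
    }
}
```

In this toy, at most one thread is ever inside the log regardless of pool size, so when the slow call runs per document the other threads simply queue behind it. The timings it prints depend on the machine and say nothing about real Solr numbers.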