[ https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259437#comment-17259437 ]
David Smiley commented on SOLR-14923:
-------------------------------------

To be clear, the PR:
* Atomic/partial updates to nested docs no longer require that the _root_ field have stored/docValues.
* Atomic/partial updates to nested docs no longer require the _nest_path_ field.
* Performance improvements for nested docs in normal indexing (as reported by Thomas), and there should be improvements for atomic/partial updates as well (not measured). The "reopen" of a realtime searcher should happen much more rarely.
* Some general improvements and refactorings, especially in RealTimeGetComponent.

The bad: when sending an atomic/partial update for a child doc, you are now required to specify the root doc ID instead of letting Solr try to figure it out. Ideally you pass _root_ in the doc. At least you'll typically get an exception, so you can adjust.

I also added a fallback that looks at the _route_ parameter, if specified, with the intention of putting that only in 8x. I added it because I suspect current users of this feature may be using _route_ with the same value as the root ID, but that is of course not necessarily a valid assumption. Many of the internal tests treat them as equivalent, which was a motivating factor for adding this hack. The ref guide doesn't say they are the same; it mostly steers users toward routed keys instead (exclamation syntax). It's quite possible someone today is using this feature with a _route_ param that is _not_ equivalent to the root ID (perhaps in an implicitly routed collection?). I also think this feature is a little exotic, so maybe it's nobody. Shrug; I have no strong opinion here, and I could happily remove the hack from the PR. Either way, if you mess up, you'll typically get an exception, so there are no surprises.

I created SOLR-15064 (originally worded incorrectly, but now corrected).
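The new requirement for child-doc atomic updates can be sketched as a JSON update request. This is a minimal sketch, assuming Solr's JSON update format posted to a collection's /update handler; the helpers child_atomic_update and update_url, the collection name, and the field names are hypothetical. The body carries the root doc ID in _root_, as recommended above, and update_url can optionally add the _route_ request parameter that the proposed 8x-only fallback would consult.

```python
import json
from urllib.parse import urlencode


def child_atomic_update(child_id, root_id, field, value):
    """Build a JSON atomic-update body that "set"s one field on a child doc.

    Hypothetical helper. root_id is the ID of the top-level (root) document;
    it is placed in _root_ so Solr does not have to work it out.
    """
    if not root_id:
        # Client-side guard (hypothetical): fail early instead of waiting
        # for Solr to reject an update that lacks the root doc ID.
        raise ValueError("atomic updates to child docs need the root doc ID")
    doc = {
        "id": child_id,
        "_root_": root_id,
        field: {"set": value},  # atomic-update "set" operation
    }
    return json.dumps([doc])


def update_url(base_url, collection, route=None):
    """URL of the collection's /update handler (hypothetical helper).

    If route is given, it is sent as the _route_ request parameter, which
    the proposed 8x-only fallback would look at.
    """
    url = f"{base_url}/{collection}/update"
    if route is not None:
        url += "?" + urlencode({"_route_": route})
    return url


if __name__ == "__main__":
    print(update_url("http://localhost:8983/solr", "products"))
    print(child_atomic_update("sku-1", "product-1", "price", 42))
```

Passing _root_ in the document is the preferred route; relying on _route_ only works if its value really equals the root ID, which, as noted above, is not guaranteed.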
> Indexing performance is unacceptable when child documents are involved
> ----------------------------------------------------------------------
>
> Key: SOLR-14923
> URL: https://issues.apache.org/jira/browse/SOLR-14923
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: update, UpdateRequestProcessors
> Affects Versions: 8.3, 8.4, 8.5, 8.6, 8.7, master (9.0)
> Reporter: Thomas Wöckinger
> Priority: Critical
> Labels: performance, pull-request-available
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Parallel indexing does not make sense at the moment when child documents are used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the end of the method doVersionAdd whether the ulog caches should be refreshed.
> This check returns true if any child document is included in the AddUpdateCommand.
> If so, ulog.openRealtimeSearcher() is called. This call is very expensive and is executed in a synchronized block on the UpdateLog instance, so all other operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) runs inside a synchronized block, almost every operation is blocked.
> This reduces multi-threaded index updates to single-threaded behavior.
> The described behavior does not depend on any option of the UpdateRequest, so it makes no difference whether 'waitFlush', 'waitSearcher' or 'softCommit' is true or false.
> The described behavior makes the use of child documents useless, because the performance is unacceptable.
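The serialization described in the quoted report can be illustrated with a toy model. This is a sketch, not Solr code: ToyUpdateLog, REOPEN_COST_S and index_in_parallel are hypothetical, a threading.Lock stands in for the synchronized blocks on the UpdateLog instance, and time.sleep stands in for the expensive openRealtimeSearcher() call. Because the reopen runs while the lock is held, N child-doc adds take at least N times the reopen cost, however many worker threads submit them.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed cost of one realtime-searcher reopen; illustrative only.
REOPEN_COST_S = 0.02


class ToyUpdateLog:
    """Hypothetical stand-in for UpdateLog: one instance-wide lock, held by
    every mutating method, like the synchronized blocks in the report."""

    def __init__(self):
        self._lock = threading.Lock()
        self.adds = 0
        self.reopens = 0

    def add(self, doc_id, has_children):
        with self._lock:
            self.adds += 1
            if has_children:
                # Pre-fix behaviour from the report: a command carrying child
                # docs triggers a reopen, and it runs while the lock is held,
                # so every other add/delete waits for it.
                self._open_realtime_searcher()

    def _open_realtime_searcher(self):
        self.reopens += 1
        time.sleep(REOPEN_COST_S)  # stands in for the expensive reopen


def index_in_parallel(ulog, n_docs, has_children, threads=8):
    """Submit n_docs adds from a thread pool; return wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() drains the iterator so worker exceptions are re-raised here.
        list(pool.map(lambda i: ulog.add(f"doc{i}", has_children), range(n_docs)))
    return time.perf_counter() - start


if __name__ == "__main__":
    ulog = ToyUpdateLog()
    elapsed = index_in_parallel(ulog, 16, has_children=True)
    # Despite 8 threads, elapsed is at least 16 * REOPEN_COST_S.
    print(f"{ulog.reopens} reopens, {elapsed:.2f}s")
```

Without the lock, the sleeps on different threads would overlap; with it, the 8 workers behave like one. In this toy, making the reopen rarer (as the PR aims to) is what restores the parallelism.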