[ 
https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259437#comment-17259437
 ] 

David Smiley commented on SOLR-14923:
-------------------------------------

To be clear, with the PR:
 * Atomic/partial updates to nested docs no longer require that the _root_ 
field be stored or have docValues.
 * Atomic/partial updates to nested docs no longer require the _nest_path_ 
field.
 * Performance improves for nested docs, both for normal indexing (as 
reported by Thomas) and, presumably, for atomic/partial updates as well (not 
measured).  The "reopen" of a realtime searcher should happen much more rarely.
 * There are some general improvements / refactorings, especially in 
RealTimeGetComponent.

The bad:

When sending an atomic/partial update for a child doc, it is now required to 
specify the root doc ID, instead of letting Solr try to figure it out.  Ideally 
you pass _root_ in the doc.  If you don't, you'll typically at least get an 
exception, so you can adjust.  I also added a fallback that looks at the 
_route_ parameter, if specified, with the intention of putting that only in 
8x.  I added it because I suspect current users of this feature may be passing 
the root ID as _route_, but that is of course not necessarily a valid 
assumption.  Many of the internal tests treat the two as equivalent, which was 
a motivating factor for me to add this hack.  The ref guide doesn't say they 
are the same; it mostly steers users toward routed keys (the exclamation 
syntax) instead.  It's quite possible someone today is using this feature with 
a _route_ param that is _not_ equivalent to the root ID (perhaps in an 
implicitly routed collection?).  I also think this feature is a little exotic, 
so maybe nobody is.  Shrug; I have no strong opinion here, and I could happily 
remove the hack from the PR.  Either way, if you mess up, you'll typically get 
an exception, so there are no surprises.  I added SOLR-15064 (originally worded 
wrong, but now corrected).
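
For illustration, here is a minimal sketch of what such an update body might 
look like when the root doc ID is passed explicitly.  It only builds the JSON; 
it does not talk to Solr.  The IDs ("child-7", "root-1"), the field name 
(title_s) and the helper name are hypothetical, and the exact request shape the 
PR accepts is an assumption based on the usual JSON atomic-update syntax.

```python
import json


def child_atomic_update(child_id, root_id, field, value):
    """Build a JSON atomic-update body for a single child document.

    The root document's ID is supplied explicitly in the _root_ field
    instead of leaving Solr to figure it out.
    """
    doc = {
        "id": child_id,         # ID of the child doc being updated
        "_root_": root_id,      # explicit root doc ID (assumed request shape)
        field: {"set": value},  # atomic "set" modifier on one field
    }
    # Solr's JSON update format accepts a list of documents.
    return json.dumps([doc])


# Hypothetical IDs and field name, for illustration only.
body = child_atomic_update("child-7", "root-1", "title_s", "new title")
print(body)
```

A body like this would normally be POSTed to the collection's /update handler 
with a JSON content type.  If _root_ is left out, the expectation from the 
above is an exception rather than a silent misapplied update.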

> Indexing performance is unacceptable when child documents are involved
> ----------------------------------------------------------------------
>
>                 Key: SOLR-14923
>                 URL: https://issues.apache.org/jira/browse/SOLR-14923
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: update, UpdateRequestProcessors
>    Affects Versions: 8.3, 8.4, 8.5, 8.6, 8.7, master (9.0)
>            Reporter: Thomas Wöckinger
>            Priority: Critical
>              Labels: performance, pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Parallel indexing does not make sense at the moment when child documents are 
> used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the 
> end of the method doVersionAdd whether the ulog caches should be refreshed.
> This check returns true if any child document is included in the 
> AddUpdateCommand.
> If so, ulog.openRealtimeSearcher() is called.  This call is very expensive 
> and is executed in a block synchronized on the UpdateLog instance, so all 
> other operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) runs in a 
> synchronized block, almost every operation is blocked.
> This reduces multi-threaded index updates to single-threaded behavior.
> The described behavior does not depend on any option of the UpdateRequest, 
> so it makes no difference whether 'waitFlush', 'waitSearcher' or 
> 'softCommit' is true or false.
> The described behavior makes child documents unusable, because the 
> performance is unacceptable.



