[jira] [Commented] (SOLR-14923) Indexing performance is unacceptable when child documents are involved

David Smiley (Jira) Mon, 21 Dec 2020 14:06:05 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253144#comment-17253144
 ]


David Smiley commented on SOLR-14923:
-------------------------------------

Finally, here's the PR: https://github.com/apache/lucene-solr/pull/2159
It includes some refactorings that made it easier for me to follow the logic.  
It was very difficult work overall!  At least it was a learning experience in 
when and why realtime searchers are opened, and how that relates to the 
update log.  The hardest problem for me to solve was a repeatable assertion 
failure at or near the last line of 
{{org.apache.solr.cloud.NestedShardedAtomicUpdateTest#doRootShardRoutingTest}} 
in which inplace_updatable_int had the value "5" instead of "1" on the 
first iteration because _somehow_ data from doNestedInplaceUpdateTest() was 
lingering.  Since this test uses a control client and indexes to 
multiple places, debugging it was a nightmare (*which* core did this 
breakpoint hit?) so I converted the test to a more normal SolrCloudTestCase, 
after which the bug didn't take me long to find.  Today was mostly 
cleanup, refactoring, and commit notes for everything.

Commit notes:
SOLR-14923: Nested docs indexing performance.

* UpdateLog.openRealtimeSearcher() was being called for every incoming document 
when the schema had _nest_path_.  It is now limited to in-place updates of 
a child doc and to /get when a child doc is given, both of which are uncommon.
* Atomic/partial updates to nested documents should be faster. In-place updates 
of nested documents might be slower (they need to call openRealtimeSearcher).
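
To make the first note concrete, here is a toy model of the before/after condition. It is only a sketch based on the wording of the commit notes, not code from the PR: the class, the Op enum and both method names are hypothetical, and the real checks in Solr involve more state than two booleans.

```java
public class RealtimeSearcherPolicySketch {

    /** Hypothetical request kinds; these are not Solr types. */
    enum Op { ADD, ATOMIC_UPDATE, IN_PLACE_UPDATE, REALTIME_GET }

    /** Before, per the notes: every incoming document, if the schema had _nest_path_. */
    static boolean reopenedBeforeFix(boolean schemaHasNestPath) {
        return schemaHasNestPath;
    }

    /** After, per the notes: only an in-place update of a child doc, or /get on a child doc. */
    static boolean reopensAfterFix(Op op, boolean targetsChildDoc) {
        return targetsChildDoc && (op == Op.IN_PLACE_UPDATE || op == Op.REALTIME_GET);
    }

    /** Counts how many of the given indexing requests would trigger a reopen after the fix. */
    static int countReopensAfterFix(Op[] ops, boolean[] targetsChildDoc) {
        int count = 0;
        for (int i = 0; i < ops.length; i++) {
            if (reopensAfterFix(ops[i], targetsChildDoc[i])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // A made-up batch: mostly plain adds, one in-place update of a child doc.
        Op[] ops = { Op.ADD, Op.ADD, Op.ATOMIC_UPDATE, Op.IN_PLACE_UPDATE, Op.ADD };
        boolean[] child = { false, false, true, true, false };

        int before = 0;
        for (int i = 0; i < ops.length; i++) {
            if (reopenedBeforeFix(true)) {
                before++;
            }
        }
        System.out.println("reopens before: " + before);
        System.out.println("reopens after:  " + countReopensAfterFix(ops, child));
    }
}
```

In this model a bulk load of plain adds triggers no reopen at all after the change, which matches the claim that the remaining cases are uncommon.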

Refactoring & minor bugs/improvements:
* Simplified the relationship between AddUpdateCommand.getLuceneDoc and 
getLuceneDocsIfNested, and simplified DirectUpdateHandler2, which calls them.
* AddUpdateCommand.getIndexedId needed to be updated when _route_ was specified.
* NestedShardedAtomicUpdateTest no longer extends AbstractFullDistribZkTestBase 
because it wasn't really leveraging the "control client" checking, and it added 
too much complexity to debug failures.
* AtomicUpdateDocumentMerger: simplified merge; possibly now supports updates 
to anonymous children.
* RTG.Resolution.DOC_WITH_CHILDREN is no longer needed.

> Indexing performance is unacceptable when child documents are involved
> ----------------------------------------------------------------------
>
>                 Key: SOLR-14923
>                 URL: https://issues.apache.org/jira/browse/SOLR-14923
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: update, UpdateRequestProcessors
>    Affects Versions: 8.3, 8.4, 8.5, 8.6, 8.7, master (9.0)
>            Reporter: Thomas Wöckinger
>            Priority: Critical
>              Labels: performance, pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Parallel indexing does not make sense at the moment when child documents are 
> used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the 
> end of the method doVersionAdd whether the Ulog caches should be refreshed.
> This check returns true if any child document is included in the 
> AddUpdateCommand.
> If so, ulog.openRealtimeSearcher() is called. This call is very expensive 
> and is executed in a synchronized block on the UpdateLog instance, so all 
> other operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) runs in a 
> synchronized block, almost every operation is blocked.
> This reduces multi-threaded index updates to single-threaded behavior.
> The described behavior does not depend on any option of the UpdateRequest, 
> so it makes no difference whether 'waitFlush', 'waitSearcher' or 
> 'softCommit' is true or false.
> The described behavior makes ChildDocuments unusable, because the 
> performance is unacceptable.
>  
>  
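
The blocking described in the issue text above can be reproduced in miniature. The following is a minimal sketch, not Solr code: ToyUpdateLog is a hypothetical stand-in for an update log whose add method does its work inside a synchronized block, and the short sleep is an assumed placeholder for an expensive searcher reopen. It records how many threads are ever inside the critical section at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class UpdateLogContentionSketch {

    /** Toy stand-in for an update log; NOT Solr's UpdateLog. */
    static class ToyUpdateLog {
        private final AtomicInteger inside = new AtomicInteger();
        private final AtomicInteger maxInside = new AtomicInteger();
        private int adds = 0;

        void add(boolean reopenSearcher) {
            synchronized (this) {
                // Track how many threads are in the critical section right now.
                int now = inside.incrementAndGet();
                maxInside.accumulateAndGet(now, Math::max);
                adds++;
                if (reopenSearcher) {
                    try {
                        Thread.sleep(2); // assumed placeholder for an expensive reopen
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
                inside.decrementAndGet();
            }
        }

        int maxConcurrentInside() {
            return maxInside.get();
        }

        synchronized int adds() {
            return adds;
        }
    }

    /** Runs the given number of indexing threads against one shared toy log. */
    static ToyUpdateLog run(int threads, int docsPerThread, boolean reopen) {
        ToyUpdateLog log = new ToyUpdateLog();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    log.add(reopen);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing threads did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        ToyUpdateLog log = run(4, 25, true);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("adds: " + log.adds());
        System.out.println("max threads inside lock: " + log.maxConcurrentInside());
        System.out.println("elapsed ms: " + elapsedMs);
    }
}
```

However many threads are submitted, at most one is ever inside the lock, so the total time grows with the number of documents times the cost of the slow call. That is the single-threaded behavior the reporter describes.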



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

