[ https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254118#comment-17254118 ]
Thomas Wöckinger commented on SOLR-14923:
-----------------------------------------
I ran the indexing tests five times; each run took about 60 min (±2 min).
Compared to the first version this is about 3-4 min faster, roughly 5%.
Lock contention on UpdateLog is also slightly lower.
Another very interesting behavior showed up: within 5 minutes about 12,000
java.io.FileNotFoundException and about 2,000 java.nio.file.NoSuchFileException
are thrown. The first is thrown from RAMDirectory.fileLength, the second from
FSDirectory.fileLength; both are used by NRTCachingDirectory.
The implementation differs between master and 8.x, but *createTempOutput*
still uses the method *slowFileExists*, which relies on exception handling to
detect whether a file exists. In the existing Directory implementations this
could be avoided most of the time. I'm not sure whether Solr runs with
_-XX:-StackTraceInThrowable_, but if not, avoiding these exceptions could make
these calls about a hundred times faster. This seems to be Lucene-related,
though. Should I open a separate issue for it?
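A minimal sketch of the pattern, written without Lucene so it is self-contained. The Dir interface, InMemoryDir, listingFileExists and exceptionsForProbes are hypothetical stand-ins, not Lucene API; slowFileExists here only mirrors the exception-driven idea described above (probe fileLength, treat NoSuchFileException/FileNotFoundException as "absent"), and the listing-based variant shows one way to keep the common "free temp name" case exception-free. It counts exceptions rather than timing them.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.NoSuchFileException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class FileExistsSketch {
    // Hypothetical stand-in for a Lucene Directory: only the calls this sketch needs.
    interface Dir {
        long fileLength(String name) throws IOException;
        Set<String> listAll() throws IOException;
    }

    // Hypothetical in-memory directory that counts how many exceptions it throws.
    static final class InMemoryDir implements Dir {
        final Map<String, Long> files = new HashMap<>();
        int exceptionsThrown = 0;

        @Override
        public long fileLength(String name) throws IOException {
            Long len = files.get(name);
            if (len == null) {
                exceptionsThrown++;
                throw new NoSuchFileException(name); // constructing this captures a stack trace
            }
            return len;
        }

        @Override
        public Set<String> listAll() {
            return files.keySet();
        }
    }

    // Exception-driven existence check: every miss costs one thrown exception.
    static boolean slowFileExists(Dir dir, String name) throws IOException {
        try {
            dir.fileLength(name);
            return true;
        } catch (NoSuchFileException | FileNotFoundException e) {
            return false;
        }
    }

    // Listing-based existence check: the "file is absent" path throws nothing.
    static boolean listingFileExists(Dir dir, String name) throws IOException {
        return dir.listAll().contains(name);
    }

    // Probes `probes` free temp names and returns how many exceptions were thrown.
    static int exceptionsForProbes(boolean useListing, int probes) {
        InMemoryDir dir = new InMemoryDir();
        dir.files.put("_0.cfs", 42L);
        try {
            for (int i = 0; i < probes; i++) {
                String candidate = "_0_tmp_" + i + ".tmp";
                boolean exists = useListing
                        ? listingFileExists(dir, candidate)
                        : slowFileExists(dir, candidate);
                if (exists) {
                    throw new IllegalStateException("unexpected file: " + candidate);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dir.exceptionsThrown;
    }

    public static void main(String[] args) {
        System.out.println("exception-based: " + exceptionsForProbes(false, 1000) + " exceptions");
        System.out.println("listing-based:   " + exceptionsForProbes(true, 1000) + " exceptions");
    }
}
```

Running main reports 1000 exceptions for the exception-based probe and 0 for the listing-based one. Whether a listing is actually cheaper depends on the Directory (listing a large on-disk directory is not free). Where the exception cannot be avoided, a Throwable created with writableStackTrace = false (a protected Throwable constructor since Java 7) skips capturing the stack trace, which is roughly what -XX:-StackTraceInThrowable does JVM-wide.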
[~dsmiley] So from my side this looks very good!
> Indexing performance is unacceptable when child documents are involved
> ----------------------------------------------------------------------
>
> Key: SOLR-14923
> URL: https://issues.apache.org/jira/browse/SOLR-14923
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: update, UpdateRequestProcessors
> Affects Versions: 8.3, 8.4, 8.5, 8.6, 8.7, master (9.0)
> Reporter: Thomas Wöckinger
> Priority: Critical
> Labels: performance, pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Parallel indexing does not make sense at the moment when child documents are
> used.
> At the end of the method doVersionAdd,
> org.apache.solr.update.processor.DistributedUpdateProcessor checks whether the
> ulog caches should be refreshed.
> This check returns true if any child document is included in the
> AddUpdateCommand.
> If so, ulog.openRealtimeSearcher() is called. This call is very expensive and
> is executed in a synchronized block on the UpdateLog instance, so all other
> operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) uses a
> synchronized block, almost every operation is blocked.
> This reduces multi-threaded index updates to single-threaded behavior.
> The described behavior does not depend on any option of the UpdateRequest,
> so it makes no difference whether 'waitFlush', 'waitSearcher' or
> 'softCommit' is true or false.
> The described behavior makes the use of child documents impractical, because
> the performance is unacceptable.
>
>
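To make the contention described above concrete, here is a small Java model. CoarseLockedLog, runMillis and the 20 ms Thread.sleep standing in for openRealtimeSearcher() are hypothetical; this is not Solr's UpdateLog, only an illustration of what happens when every method synchronizes on one instance and an expensive call runs inside that lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class UpdateLogContentionSketch {
    // Hypothetical, heavily simplified stand-in for UpdateLog: every method
    // synchronizes on the same instance, as the issue description says.
    static final class CoarseLockedLog {
        private final AtomicInteger inside = new AtomicInteger();
        private final AtomicInteger maxInside = new AtomicInteger();
        private final long refreshMillis;

        CoarseLockedLog(long refreshMillis) {
            this.refreshMillis = refreshMillis;
        }

        synchronized void add(String docId) {
            enter();
            inside.decrementAndGet(); // cheap bookkeeping would go here
        }

        // Stand-in for the expensive ulog.openRealtimeSearcher() call.
        synchronized void openRealtimeSearcher() throws InterruptedException {
            enter();
            try {
                Thread.sleep(refreshMillis); // expensive work done while holding the monitor
            } finally {
                inside.decrementAndGet();
            }
        }

        // Records the highest number of threads seen inside a synchronized method.
        private void enter() {
            maxInside.accumulateAndGet(inside.incrementAndGet(), Math::max);
        }

        int maxConcurrentCallers() {
            return maxInside.get();
        }
    }

    // Each worker adds documents that have children, so every add is followed
    // by a refresh. Returns the wall-clock time in milliseconds.
    static long runMillis(CoarseLockedLog log, int threads, int docsPerThread) {
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread w = new Thread(() -> {
                try {
                    for (int d = 0; d < docsPerThread; d++) {
                        log.add("doc-" + id + "-" + d);
                        log.openRealtimeSearcher();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        CoarseLockedLog log = new CoarseLockedLog(20);
        // 4 threads x 5 docs x 20 ms of refresh, all under one lock
        long elapsed = runMillis(log, 4, 5);
        System.out.println("elapsed: " + elapsed + " ms, max concurrent callers: "
                + log.maxConcurrentCallers());
    }
}
```

Because add and the simulated refresh share one monitor, the maximum number of concurrent callers stays at 1 and the run takes roughly threads x docs x refresh time (here about 400 ms) instead of about 100 ms, so adding threads does not help. The general remedy is to avoid expensive work while holding the shared lock, or to refresh less often; the sketch does not show how the SOLR-14923 patch itself does it.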
--
This message was sent by Atlassian Jira
(v8.3.4#803005)