[ https://issues.apache.org/jira/browse/SOLR-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223501#comment-17223501 ]
Andreas Hubold commented on SOLR-14969:
---------------------------------------

For reference, I've attached my workaround (tested with Solr 8.6.3) in the form of a custom CoreAdminHandler subclass: [^CmCoreAdminHandler.java]

It works similarly to Erick's fix, but it can't fix the problem for async CREATE requests.

If anybody wants to use it, please change the package name and register it in your solr.xml with {{<str name="adminHandler">...</str>}}.

> Prevent creating multiple cores with the same name which leads to instabilities (race condition)
> ------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-14969
>                 URL: https://issues.apache.org/jira/browse/SOLR-14969
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: multicore
>    Affects Versions: 8.6, 8.6.3
>            Reporter: Andreas Hubold
>            Assignee: Erick Erickson
>            Priority: Major
>         Attachments: CmCoreAdminHandler.java
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> CoreContainer#create does not correctly handle concurrent requests to create
> the same core. There's a race condition (see also the existing TODO comment in
> the code), and CoreContainer#createFromDescriptor may be called subsequently
> for the same core name.
> The _second call_ then fails to create an IndexWriter, and exception handling
> causes an inconsistent CoreContainer state.
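One way to read the race is as a check-then-act problem. Two CREATE requests can both proceed for the same name before either has registered the core. A common remedy is to reserve the name atomically before doing any work, so the second request fails fast instead of contending for the index write.lock.

The Java sketch below shows that idea using a set backed by `ConcurrentHashMap`. `CoreNameGuard` and its methods are hypothetical. They are not Solr's API and not the contents of the attached CmCoreAdminHandler.java. A complete fix would presumably also check, while holding the reservation, whether a core with that name is already registered.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of Solr): reserves a core name atomically so that
 * at most one creation for a given name is in flight at any time.
 */
final class CoreNameGuard {
    // Set view backed by a ConcurrentHashMap; add() is atomic, so exactly one
    // concurrent caller wins for a given name.
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    /** Returns true if this caller won the reservation for the name. */
    boolean tryReserve(String coreName) {
        return inFlight.add(coreName);
    }

    /** Releases the reservation after creation finished, successfully or not. */
    void release(String coreName) {
        inFlight.remove(coreName);
    }

    /**
     * Runs the creation callback only if no other creation for the same name is
     * in progress; otherwise fails fast instead of racing on the index lock.
     */
    void createExclusively(String coreName, Runnable create) {
        if (!tryReserve(coreName)) {
            throw new IllegalStateException("Core '" + coreName + "' is already being created");
        }
        try {
            create.run();
        } finally {
            release(coreName); // always free the name, even if creation throws
        }
    }
}
```

Failing fast is one possible design. A handler could instead block until the in-flight creation finishes and then report that the core already exists.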
> {noformat}
> 2020-10-27 00:29:25.350 ERROR (qtp2029754983-24) [ ] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error CREATEing SolrCore 'blueprint_acgqqafsogyc_comments': Unable to create core [blueprint_acgqqafsogyc_comments] Caused by: Lock held by this virtual machine: /var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock
>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1312)
>     at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:95)
>     at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
>     ...
> Caused by: org.apache.solr.common.SolrException: Unable to create core [blueprint_acgqqafsogyc_comments]
>     at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1408)
>     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1273)
>     ... 47 more
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1071)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:906)
>     at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1387)
>     ... 48 more
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
>     at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2184)
>     at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2308)
>     at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1130)
>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1012)
>     ... 50 more
> Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine: /var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock
>     at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:139)
>     at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41)
>     at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45)
>     at org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:105)
>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:785)
>     at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:126)
>     at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:100)
>     at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:261)
>     at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:135)
>     at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2145)
> {noformat}
> CoreContainer#createFromDescriptor removes the CoreDescriptor when handling
> this exception. The SolrCore created for the first successful call is still
> registered in SolrCores.cores, but now there's no corresponding
> CoreDescriptor for that name anymore.
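The inconsistent state arises because the failing call's cleanup removes a registration that belongs to the successful call. The Java sketch below illustrates ownership-aware cleanup. It assumes a hypothetical `DescriptorRegistry` and `Descriptor`, which are not Solr's `SolrCores` or `CoreDescriptor` classes.

Registration uses `putIfAbsent`, so only one attempt owns the entry for a name. Cleanup uses `ConcurrentMap.remove(key, value)`, which removes the entry only if it still maps to that attempt's own descriptor.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical stand-in for a core descriptor. It deliberately does NOT override
 * equals(), so remove(key, value) below compares by identity: two attempts with
 * the same name and path are still distinct owners.
 */
final class Descriptor {
    final String name;
    final String instancePath;

    Descriptor(String name, String instancePath) {
        this.name = name;
        this.instancePath = instancePath;
    }
}

/** Hypothetical registry: a failed attempt only cleans up what it registered itself. */
final class DescriptorRegistry {
    private final ConcurrentMap<String, Descriptor> descriptors = new ConcurrentHashMap<>();

    /** Registers d unless the name is already taken; returns true if this call owns the entry. */
    boolean register(Descriptor d) {
        return descriptors.putIfAbsent(d.name, d) == null;
    }

    /** Cleanup after a failed create: removes the entry only if it still maps to d. */
    boolean unregisterIfOwned(Descriptor d) {
        return descriptors.remove(d.name, d);
    }

    Descriptor get(String name) {
        return descriptors.get(name);
    }
}
```

In this model, the second CREATE from the log would fail to register its descriptor. Its cleanup would then be a no-op, leaving the descriptor of the core created by the first call in place.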
>
> This inconsistency leads to subsequent NullPointerExceptions, for example
> when using CoreAdmin STATUS with the core name:
> CoreAdminOperation#getCoreStatus first gets the non-null SolrCore
> (cores.getCore(cname)) but core.getInstancePath() throws an NPE, because the
> CoreDescriptor is not registered anymore:
> {noformat}
> 2020-10-27 00:29:25.353 INFO  (qtp2029754983-19) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={core=blueprint_acgqqafsogyc_comments&action=STATUS&indexInfo=false&wt=javabin&version=2} status=500 QTime=0
> 2020-10-27 00:29:25.353 ERROR (qtp2029754983-19) [ ] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error handling 'STATUS' action
>     at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:372)
>     at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397)
>     at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
>     ...
> Caused by: java.lang.NullPointerException
>     at org.apache.solr.core.SolrCore.getInstancePath(SolrCore.java:333)
>     at org.apache.solr.handler.admin.CoreAdminOperation.getCoreStatus(CoreAdminOperation.java:329)
>     at org.apache.solr.handler.admin.StatusOp.execute(StatusOp.java:54)
>     at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
> {noformat}
> STATUS keeps failing until Solr is restarted.
> The NPE for CoreAdmin STATUS is a regression in 8.6. It seems to be caused by
> https://github.com/apache/lucene-solr/commit/17ae79b0905b2bf8635c1b260b30807cae2f5463#diff-9652fe8353b7eff59cd6f128bb2699d88361e670b840ee5ca1018b1bc45584d1R324

--
This message was sent by Atlassian Jira
(v8.3.4#803005)