[ https://issues.apache.org/jira/browse/GEODE-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15991583#comment-15991583 ]
xiaojian zhou commented on GEODE-2848:
--------------------------------------

I don't think it is worth introducing the complexity of a new message or rearranging the message-processing sequence.

But the regionToDispatchedKeysMap will be cleared and temp will be lost, so the secondary at the remote site will not receive the ParallelQueueRemovalMessage.

There's a conservative, simple fix:
In getAllRecipients(), we need to detect that the region is gone and return an empty set. When recipients.isEmpty() is true, call regionToDispatchedKeysMap.putAll(temp).

> While destroying a LuceneIndex, the AsyncEventQueue region is destroyed in remote members before stopping the AsyncEventQueue
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-2848
>                 URL: https://issues.apache.org/jira/browse/GEODE-2848
>             Project: Geode
>          Issue Type: Bug
>          Components: lucene
>            Reporter: Barry Oglesby
>
> This causes a NullPointerException in BatchRemovalThread getAllRecipients like:
> {noformat}
> [fine 2017/04/24 14:27:29.163 PDT gemfire4_r02-s28_3222 <BatchRemovalThread> tid=0x6b] BatchRemovalThread: ignoring exception
> java.lang.NullPointerException
>   at org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.getAllRecipients(ParallelGatewaySenderQueue.java:1776)
>   at org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.run(ParallelGatewaySenderQueue.java:1722)
> {noformat}
> This message is currently only logged at fine level and doesn't cause any real issues.
> The simple fix is to check for null in getAllRecipients like:
> {noformat}
> PartitionedRegion pReg = ((PartitionedRegion) (cache.getRegion((String) pr)));
> if (pReg != null) {
>   recipients.addAll(pReg.getRegionAdvisor().adviseDataStore());
> }
> {noformat}
> Another more complex fix is to change the destroyIndex sequence.
> The current destroyIndex sequence is:
> # stops and destroys the AEQ in the initiator (including the underlying PR)
> # closes the repository manager in the initiator
> # stops and destroys the AEQ in remote members (not including the underlying PR)
> # closes the repository manager in the remote members
> # destroys the fileAndChunk region in the initiator
> Between steps 1 and 3, the region will be null in the remote members, so the NPE can occur.
> A better sequence would be:
> # stops the AEQ in the initiator
> # stops the AEQ in remote members
> # closes the repository manager in the initiator
> # closes the repository manager in the remote members
> # destroys the AEQ in the initiator (including the underlying PR)
> # destroys the AEQ in the remote members (not including the underlying PR)
> # destroys the fileAndChunk region in the initiator
> That would be 3 messages between the members.
> I think that can be combined into one remote message like:
> # stops the AEQ in the initiator
> # closes the repository manager in the initiator
> # stops the AEQ in remote members
> # closes the repository manager in the remote members
> # destroys the AEQ in the remote members (not including the underlying PR)
> # destroys the AEQ in the initiator (including the underlying PR)
> # destroys the fileAndChunk region in the initiator
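
For illustration, the conservative fix proposed in the comment (return an empty recipient set when the region is gone, then put temp back into regionToDispatchedKeysMap) might look roughly like the sketch below. It is a minimal stand-alone model, not Geode code: the class BatchRemovalSketch, the liveRegions map, the sent list and the runOnce() method are all hypothetical stand-ins, and only the names getAllRecipients, regionToDispatchedKeysMap and temp come from the discussion above.

```java
import java.util.*;

// Hypothetical stand-in for the batch-removal bookkeeping; these are NOT Geode's real classes.
class BatchRemovalSketch {
  // region path -> (bucket id -> dispatched keys); loosely modeled on regionToDispatchedKeysMap
  final Map<String, Map<Integer, List<Long>>> regionToDispatchedKeysMap = new HashMap<>();

  // region path -> members hosting it; a missing entry models cache.getRegion(...) returning null
  final Map<String, Set<String>> liveRegions = new HashMap<>();

  // records each batch that would have been sent as a removal message (assumption, for testing)
  final List<Map<String, Map<Integer, List<Long>>>> sent = new ArrayList<>();

  // Null-safe recipient lookup: a destroyed region contributes no recipients instead of an NPE.
  Set<String> getAllRecipients(Set<String> regionPaths) {
    Set<String> recipients = new HashSet<>();
    for (String path : regionPaths) {
      Set<String> members = liveRegions.get(path);
      if (members != null) {
        recipients.addAll(members);
      }
    }
    return recipients;
  }

  // One pass of the removal loop; returns true if a removal message would have been sent.
  boolean runOnce() {
    if (regionToDispatchedKeysMap.isEmpty()) {
      return false;
    }
    // Snapshot the dispatched keys and clear the shared map.
    Map<String, Map<Integer, List<Long>>> temp = new HashMap<>(regionToDispatchedKeysMap);
    regionToDispatchedKeysMap.clear();

    Set<String> recipients = getAllRecipients(temp.keySet());
    if (recipients.isEmpty()) {
      // Nobody to notify: restore the keys so they are not lost.
      regionToDispatchedKeysMap.putAll(temp);
      return false;
    }
    sent.add(temp);
    return true;
  }
}
```

The sketch is single-threaded and ignores keys that might be added to the map between clear() and putAll(); whether that matters in the real BatchRemovalThread would need to be checked against the actual code.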