[jira] [Commented] (GEODE-2848) While destroying a LuceneIndex, the AsyncEventQueue region is destroyed in remote members before stopping the AsyncEventQueue

xiaojian zhou (JIRA) Mon, 01 May 2017 14:26:24 -0700

[ https://issues.apache.org/jira/browse/GEODE-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15991583#comment-15991583 ]


xiaojian zhou commented on GEODE-2848:
--------------------------------------

I don't think it is worth introducing the complexity of a new message or rearranging
the message-processing sequence.

But regionToDispatchedKeysMap will be cleared and temp will be lost, so the
secondary at the remote site will not receive the ParallelQueueRemovalMessage.

There's a simple, conservative fix:
in getAllRecipients(), detect that the region is gone and return an empty set.
When recipients.isEmpty(), call regionToDispatchedKeysMap.putAll(temp) so the
dispatched keys are not lost.
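A minimal sketch of that fix, assuming a simplified, single-threaded model. The class BatchRemovalSketch, the dataStoresByRegion map (standing in for cache.getRegion() plus adviseDataStore()), the sentTo list and the runOnce() method are hypothetical illustrations, not Geode's actual API. It shows getAllRecipients() returning an empty set when a region is missing, and the caller restoring temp into regionToDispatchedKeysMap instead of dropping it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, single-threaded model of the proposed fix. The real
// ParallelGatewaySenderQueue.BatchRemovalThread uses PartitionedRegion,
// RegionAdvisor.adviseDataStore() and ParallelQueueRemovalMessage instead.
class BatchRemovalSketch {
  // Stand-in for cache.getRegion(path) + adviseDataStore(); a missing key means the region is gone.
  final Map<String, Set<String>> dataStoresByRegion = new HashMap<>();
  // Keys dispatched by the primary, per region, that the secondaries must remove.
  final Map<String, List<Long>> regionToDispatchedKeysMap = new HashMap<>();
  // Recipients of each removal message "sent" so far.
  final List<Set<String>> sentTo = new ArrayList<>();

  Set<String> getAllRecipients(Set<String> regionPaths) {
    Set<String> recipients = new HashSet<>();
    for (String path : regionPaths) {
      Set<String> dataStores = dataStoresByRegion.get(path);
      if (dataStores == null) {
        // Region has been destroyed: return an empty set instead of hitting an NPE.
        return Collections.emptySet();
      }
      recipients.addAll(dataStores);
    }
    return recipients;
  }

  // One iteration of the removal thread's loop.
  void runOnce() {
    // Snapshot the pending keys and clear the shared map.
    Map<String, List<Long>> temp = new HashMap<>(regionToDispatchedKeysMap);
    regionToDispatchedKeysMap.clear();
    if (temp.isEmpty()) {
      return;
    }
    Set<String> recipients = getAllRecipients(temp.keySet());
    if (recipients.isEmpty()) {
      // No one to notify: restore the keys so they are not lost.
      regionToDispatchedKeysMap.putAll(temp);
      return;
    }
    sentTo.add(recipients); // stands in for sending ParallelQueueRemovalMessage(temp)
  }
}
```

With this change, the keys stay in regionToDispatchedKeysMap and are retried on a later pass rather than being silently dropped.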

> While destroying a LuceneIndex, the AsyncEventQueue region is destroyed in remote members before stopping the AsyncEventQueue
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-2848
>                 URL: https://issues.apache.org/jira/browse/GEODE-2848
>             Project: Geode
>          Issue Type: Bug
>          Components: lucene
>            Reporter: Barry Oglesby
>
> This causes a NullPointerException in BatchRemovalThread.getAllRecipients, like:
> {noformat}
> [fine 2017/04/24 14:27:29.163 PDT gemfire4_r02-s28_3222 <BatchRemovalThread> tid=0x6b] BatchRemovalThread: ignoring exception
> java.lang.NullPointerException
>   at org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.getAllRecipients(ParallelGatewaySenderQueue.java:1776)
>   at org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue$BatchRemovalThread.run(ParallelGatewaySenderQueue.java:1722)
> {noformat}
> This message is currently only logged at fine level and doesn't cause any real issues.
> The simple fix is to check for null in getAllRecipients, like:
> {noformat}
> // The AEQ region may already have been destroyed by the destroyIndex initiator.
> PartitionedRegion pReg = (PartitionedRegion) cache.getRegion((String) pr);
> if (pReg != null) {
>   recipients.addAll(pReg.getRegionAdvisor().adviseDataStore());
> }
> {noformat}
> Another, more complex fix is to change the destroyIndex sequence.
> The current destroyIndex sequence is:
> # stops and destroys the AEQ in the initiator (including the underlying PR)
> # closes the repository manager in the initiator
> # stops and destroys the AEQ in remote members (not including the underlying PR)
> # closes the repository manager in the remote members
> # destroys the fileAndChunk region in the initiator
> Between steps 1 and 3, the AEQ region will be null in the remote members while their AEQs are still running, so the NPE can occur.
> A better sequence would be:
> # stops the AEQ in the initiator
> # stops the AEQ in remote members
> # closes the repository manager in the initiator
> # closes the repository manager in the remote members
> # destroys the AEQ in the initiator (including the underlying PR) 
> # destroys the AEQ in the remote members (not including the underlying PR)
> # destroys the fileAndChunk region in the initiator
> That would require three messages between the members.
> I think they can be combined into one remote message, like this:
> # stops the AEQ in the initiator
> # closes the repository manager in the initiator
> # stops the AEQ in remote members
> # closes the repository manager in the remote members
> # destroys the AEQ in the remote members (not including the underlying PR)
> # destroys the AEQ in the initiator (including the underlying PR) 
> # destroys the fileAndChunk region in the initiator
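The ordering argument above can be checked with a small model. This sketch is hypothetical: DestroyIndexOrdering, unsafeSteps() and the CURRENT/BETTER/COMBINED step lists are illustrative names, not Geode code. It tracks only whether each member's AEQ is running and whether the shared AEQ partitioned region exists, and flags any step after which an AEQ is still running with its region gone (the window in which getAllRecipients can hit the NPE). Repository-manager and fileAndChunk steps are left out because they do not change that state, and "remote" stands for all remote members.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model of the destroyIndex orderings (AEQ steps only).
class DestroyIndexOrdering {
  // Whether each member's AEQ (and its BatchRemovalThread) is running.
  final Map<String, Boolean> aeqRunning = new LinkedHashMap<>();
  // Whether the AEQ's partitioned region still exists; destroying it affects every member.
  boolean regionExists = true;

  DestroyIndexOrdering() {
    aeqRunning.put("initiator", true);
    aeqRunning.put("remote", true);
  }

  void stop(String member) {
    aeqRunning.put(member, false);
  }

  // Destroying an AEQ also stops it; only the initiator destroys the underlying PR.
  void destroy(String member, boolean includingRegion) {
    aeqRunning.put(member, false);
    if (includingRegion) {
      regionExists = false;
    }
  }

  // The NPE window: some AEQ is still running but its region is gone.
  boolean npeWindowOpen() {
    return !regionExists && aeqRunning.containsValue(true);
  }

  // Applies the steps in order; returns the 1-based positions after which the window is open.
  static List<Integer> unsafeSteps(List<Consumer<DestroyIndexOrdering>> steps) {
    DestroyIndexOrdering state = new DestroyIndexOrdering();
    List<Integer> unsafe = new ArrayList<>();
    for (int i = 0; i < steps.size(); i++) {
      steps.get(i).accept(state);
      if (state.npeWindowOpen()) {
        unsafe.add(i + 1);
      }
    }
    return unsafe;
  }

  // Current order: stop+destroy in the initiator (with PR), then stop+destroy in remote members.
  static final List<Consumer<DestroyIndexOrdering>> CURRENT = List.of(
      s -> s.destroy("initiator", true),
      s -> s.destroy("remote", false));

  // Three-message order: stop everywhere first, then destroy (initiator with PR, then remote).
  static final List<Consumer<DestroyIndexOrdering>> BETTER = List.of(
      s -> s.stop("initiator"),
      s -> s.stop("remote"),
      s -> s.destroy("initiator", true),
      s -> s.destroy("remote", false));

  // Combined one-message order: stop everywhere, destroy remote, then initiator (with PR).
  static final List<Consumer<DestroyIndexOrdering>> COMBINED = List.of(
      s -> s.stop("initiator"),
      s -> s.stop("remote"),
      s -> s.destroy("remote", false),
      s -> s.destroy("initiator", true));
}
```

In this model, unsafeSteps(CURRENT) returns [1]: once the initiator destroys its AEQ and the underlying PR, the remote AEQ is still running. Both reordered sequences return an empty list because every AEQ is stopped before the PR is destroyed.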



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
