[ 
https://issues.apache.org/jira/browse/HADOOP-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HADOOP-12189:
-------------------------------
    Description: 
Improve CallQueueManager#swapQueue to make dropping queue elements nearly 
impossible. This is a trade-off between performance and functionality: in a 
very rare situation we may still drop one element, but that is not the end of 
the world, since the client can still recover through a timeout.
CallQueueManager may sometimes drop elements from the queue when 
{{swapQueue}} is called. 
The following test failure from TestCallQueueManager shows that some elements 
in the queue were dropped.
https://builds.apache.org/job/PreCommit-HADOOP-Build/7150/testReport/org.apache.hadoop.ipc/TestCallQueueManager/testSwapUnderContention/
{code}
java.lang.AssertionError: expected:<27241> but was:<27245>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at org.junit.Assert.assertEquals(Assert.java:542)
        at org.apache.hadoop.ipc.TestCallQueueManager.testSwapUnderContention(TestCallQueueManager.java:220)
{code}
It looks like the elements are dropped because of 
{{CallQueueManager#swapQueue}}.
Looking at the implementation of {{CallQueueManager#swapQueue}}, it is 
possible for elements in the queue to be dropped. If the queue is full, the 
thread calling {{CallQueueManager#put}} can be blocked for a long time. It may 
then put its element into the old queue after swapQueue has changed the queue 
in {{takeRef}}, in which case that element stays in the old queue and is dropped.
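
The interleaving described above can be replayed deterministically in a single 
thread. This is a minimal sketch, not the Hadoop implementation: the class name 
SwapRaceSketch, the putRef field, and the swap ordering (redirect puts, drain 
the old queue, redirect takes) are assumptions made for illustration. Only 
takeRef is named in this report. Non-blocking offer/poll stand in for the 
blocking put/take so the replay needs no extra threads.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

public class SwapRaceSketch {
    // Hypothetical stand-ins for the manager's queue references.
    static final AtomicReference<BlockingQueue<String>> putRef = new AtomicReference<>();
    static final AtomicReference<BlockingQueue<String>> takeRef = new AtomicReference<>();

    /** Replays the race step by step; returns how many elements are stranded in the old queue. */
    static int replayRace() {
        BlockingQueue<String> oldQ = new LinkedBlockingQueue<>(1);
        BlockingQueue<String> newQ = new LinkedBlockingQueue<>(1);
        putRef.set(oldQ);
        takeRef.set(oldQ);
        oldQ.offer("call-1"); // the old queue is now full

        // Step 1: a producer reads the put reference. In the real scenario it
        // would now block inside put() because the queue is full.
        BlockingQueue<String> seenByProducer = putRef.get();

        // Step 2: the swap runs (assumed ordering): redirect puts, let the
        // consumer drain the old queue, then redirect takes.
        putRef.set(newQ);
        takeRef.get().poll(); // the consumer removes "call-1"; the old queue looks empty
        takeRef.set(newQ);

        // Step 3: the producer wakes up and completes its put on the stale reference.
        seenByProducer.offer("call-2");

        // "call-2" now sits in the old queue, which no consumer reads any more.
        return oldQ.size();
    }

    public static void main(String[] args) {
        int stranded = replayRace();
        System.out.println("stranded=" + stranded + ", visible=" + takeRef.get().size());
    }
}
```

In this replay the element offered in step 3 never becomes visible through 
takeRef, which matches the dropped-element symptom described above.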


  was:
CallQueueManager may sometimes drop elements from the queue when 
{{swapQueue}} is called. 
The following test failure from TestCallQueueManager shows that some elements 
in the queue were dropped.
https://builds.apache.org/job/PreCommit-HADOOP-Build/7150/testReport/org.apache.hadoop.ipc/TestCallQueueManager/testSwapUnderContention/
{code}
java.lang.AssertionError: expected:<27241> but was:<27245>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at org.junit.Assert.assertEquals(Assert.java:542)
        at org.apache.hadoop.ipc.TestCallQueueManager.testSwapUnderContention(TestCallQueueManager.java:220)
{code}
It looks like the elements are dropped because of 
{{CallQueueManager#swapQueue}}.
Looking at the implementation of {{CallQueueManager#swapQueue}}, it is 
possible for elements in the queue to be dropped. If the queue is full, the 
thread calling {{CallQueueManager#put}} can be blocked for a long time. It may 
then put its element into the old queue after swapQueue has changed the queue 
in {{takeRef}}, in which case that element stays in the old queue and is dropped.

        Summary: Improve CallQueueManager#swapQueue to make queue elements drop 
nearly impossible.  (was: improve CallQueueManager#swapQueue to make queue 
elements drop nearly impossible.)

> Improve CallQueueManager#swapQueue to make queue elements drop nearly 
> impossible.
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-12189
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12189
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc, test
>    Affects Versions: 2.7.1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: HADOOP-12189.000.patch, HADOOP-12189.001.patch, 
> HADOOP-12189.none_guarantee.000.patch, HADOOP-12189.none_guarantee.001.patch, 
> HADOOP-12189.none_guarantee.002.patch
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
