[ 
https://issues.apache.org/jira/browse/HADOOP-11957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550941#comment-14550941
 ] 

Colin Patrick McCabe commented on HADOOP-11957:
-----------------------------------------------

Thanks for looking at this, [~anu].  The patch is not correct because right now 
it unconditionally breaks out of the loop without waiting for the socket 
reference count to drop to 0.  This is very dangerous because it means that we 
may call close on a file descriptor while another thread is still using it.  If 
this happens, a third thread may open another file, which will get the same 
file descriptor number as the one which was just closed.  And then the second 
thread is doing some arbitrary operation on the wrong file descriptor.

To be honest, I do not see any way of handling this better than we currently 
do.  If {{shutdown}} fails, we simply have to block until the socket is no 
longer in use.  There is no possible shortcut that I can see.

Did you encounter this problem in production?  If so, what was the {{shutdown}} 
error?

> if an IOException error is thrown in DomainSocket.close we go into infinite 
> loop.
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-11957
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11957
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: net
>    Affects Versions: 2.7.1
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
>         Attachments: HADOOP-11957.001.patch
>
>
> if an IOException error is thrown in DomainSocket.close we go into infinite 
> loop.
> Issue : If the shutdown0(fd) call throws an IOException we break out of the 
> if shutdown call but will continue to loop in the while loop infinitely since 
> we have no way of decrementing the counter. Please scroll down and see the 
> comment marked with Bug Bug to see where the issue is.
> {code:title=DomainSocket.java}
>   @Override
>   public void close() throws IOException {
>     // Set the closed bit on this DomainSocket
>     int count = 0;
>     try {
>       count = refCount.setClosed();
>     } catch (ClosedChannelException e) {
>       // Someone else already closed the DomainSocket.
>       return;
>     }
>     // Wait for all references to go away
>     boolean didShutdown = false;
>     boolean interrupted = false;
>     while (count > 0) {
>       if (!didShutdown) {
>         try {
>           // Calling shutdown on the socket will interrupt blocking system
>           // calls like accept, write, and read that are going on in a
>           // different thread.
>           shutdown0(fd);
>         } catch (IOException e) {
>           LOG.error("shutdown error: ", e);
>         }
>         didShutdown = true; 
>         // *BUG BUG* <-- Here the code will never exit the loop
>         // if the count is greater then 0. we need to break out
>         // of the while loop in case of IOException Error
>       }
>       try {
>         Thread.sleep(10);
>       } catch (InterruptedException e) {
>         interrupted = true;
>       }
>       count = refCount.getReferenceCount();
>     }
>     // At this point, nobody has a reference to the file descriptor, 
>     // and nobody will be able to get one in the future either.
>     // We now call close(2) on the file descriptor.
>     // After this point, the file descriptor number will be reused by 
>     // something else.  Although this DomainSocket object continues to hold 
>     // the old file descriptor number (it's a final field), we never use it 
>     // again because this DomainSocket is closed.
>     close0(fd);
>     if (interrupted) {
>       Thread.currentThread().interrupt();
>     }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to