[
https://issues.apache.org/jira/browse/HADOOP-11957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550941#comment-14550941
]
Colin Patrick McCabe commented on HADOOP-11957:
-----------------------------------------------
Thanks for looking at this, [~anu]. The patch is not correct because right now
it unconditionally breaks out of the loop without waiting for the socket
reference count to drop to 0. This is very dangerous because it means that we
may call close on a file descriptor while another thread is still using it. If
this happens, a third thread may open another file, which will get the same
file descriptor number as the one which was just closed. And then the second
thread is doing some arbitrary operation on the wrong file descriptor.
To be honest, I do not see any way of handling this better than we currently
do. If {{shutdown}} fails, we simply have to block until the socket is no
longer in use. There is no possible shortcut that I can see.
Did you encounter this problem in production? If so, what was the {{shutdown}}
error?
> if an IOException error is thrown in DomainSocket.close we go into infinite
> loop.
> ---------------------------------------------------------------------------------
>
> Key: HADOOP-11957
> URL: https://issues.apache.org/jira/browse/HADOOP-11957
> Project: Hadoop Common
> Issue Type: Bug
> Components: net
> Affects Versions: 2.7.1
> Reporter: Anu Engineer
> Assignee: Anu Engineer
> Attachments: HADOOP-11957.001.patch
>
>
> if an IOException error is thrown in DomainSocket.close we go into infinite
> loop.
> Issue : If the shutdown0(fd) call throws an IOException we break out of the
> if shutdown call but will continue to loop in the while loop infinitely since
> we have no way of decrementing the counter. Please scroll down and see the
> comment marked with Bug Bug to see where the issue is.
> {code:title=DomainSocket.java}
> @Override
> public void close() throws IOException {
> // Set the closed bit on this DomainSocket
> int count = 0;
> try {
> count = refCount.setClosed();
> } catch (ClosedChannelException e) {
> // Someone else already closed the DomainSocket.
> return;
> }
> // Wait for all references to go away
> boolean didShutdown = false;
> boolean interrupted = false;
> while (count > 0) {
> if (!didShutdown) {
> try {
> // Calling shutdown on the socket will interrupt blocking system
> // calls like accept, write, and read that are going on in a
> // different thread.
> shutdown0(fd);
> } catch (IOException e) {
> LOG.error("shutdown error: ", e);
> }
> didShutdown = true;
> // *BUG BUG* <-- Here the code will never exit the loop
> // if the count is greater then 0. we need to break out
> // of the while loop in case of IOException Error
> }
> try {
> Thread.sleep(10);
> } catch (InterruptedException e) {
> interrupted = true;
> }
> count = refCount.getReferenceCount();
> }
> // At this point, nobody has a reference to the file descriptor,
> // and nobody will be able to get one in the future either.
> // We now call close(2) on the file descriptor.
> // After this point, the file descriptor number will be reused by
> // something else. Although this DomainSocket object continues to hold
> // the old file descriptor number (it's a final field), we never use it
> // again because this DomainSocket is closed.
> close0(fd);
> if (interrupted) {
> Thread.currentThread().interrupt();
> }
> }
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)