[ 
https://issues.apache.org/jira/browse/HADOOP-16378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

K S updated HADOOP-16378:
-------------------------
    Affects Version/s: 3.3.0
          Environment: Ubuntu 18.04, Hadoop 2.7.3 (Though this problem exists 
on later versions of Hadoop as well), Java 8 ( + Java 11)
          Description: 
Bug occurs when Hadoop creates temporary ".nfs*" files as part of file moves 
and accesses. If this file is deleted very quickly after being created, a 
RuntimeException is thrown. The root cause is in the loadPermissionInfo method 
in org.apache.hadoop.fs.RawLocalFileSystem. To get the permission info, it 
first does

 
{code:java}
ls -ld{code}
 and then attempts to get permissions info about each file. If a file 
disappears between these two steps, an exception is thrown.

*Reproduction Steps:*

An isolated way to reproduce the bug is to call FileInputFormat.listStatus 
repeatedly on the same directory in which those temp files are being created. 
On Ubuntu or any other Linux-based system, this should fail intermittently.
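
The race itself can be illustrated without Hadoop. The following is only a sketch under stated assumptions: it uses plain JDK calls rather than FileInputFormat.listStatus, and the class name LsRaceRepro, the method names, and the ".nfs-sketch" file name are hypothetical. One thread creates and deletes a short-lived file while the main thread lists the directory and then runs "ls -ld" on each entry, counting the lookups where the file had already disappeared.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical JDK-only sketch of the list-then-stat race; not Hadoop code.
final class LsRaceRepro {

    /** Runs "ls -ld -- <path>" and returns its exit code (0 means the path still existed). */
    static int lsExitCode(String path) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("ls", "-ld", "--", path)
                .redirectErrorStream(true)
                .start();
        // Drain the output so the child process cannot block on a full pipe.
        try (InputStream in = p.getInputStream()) {
            while (in.read() != -1) {
                // discard
            }
        }
        return p.waitFor();
    }

    /** Returns how many lookups, out of at most {@code attempts}, hit a vanished file. */
    static int countVanished(int attempts) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("ls-race");
        Path tmp = dir.resolve(".nfs-sketch");
        AtomicBoolean stop = new AtomicBoolean(false);
        // Churn thread: create and immediately delete a short-lived file.
        Thread churn = new Thread(() -> {
            while (!stop.get()) {
                try {
                    Files.write(tmp, new byte[0]);
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // keep churning
                }
            }
        });
        churn.start();
        int vanished = 0;
        try {
            for (int i = 0; i < attempts; i++) {
                // Step 1: list the directory. Step 2: look up each entry with ls -ld.
                File[] entries = dir.toFile().listFiles();
                if (entries == null) {
                    continue;
                }
                for (File entry : entries) {
                    if (lsExitCode(entry.getPath()) != 0) {
                        vanished++; // the file disappeared between the two steps
                    }
                }
            }
        } finally {
            stop.set(true);
            churn.join();
            Files.deleteIfExists(tmp);
            Files.deleteIfExists(dir);
        }
        return vanished;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("vanished lookups: " + countVanished(200));
    }
}
```

The count is timing dependent and may be zero on a given run; a non-zero count corresponds to the window in which the lookup described above fails.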

*Fix:*

One way we managed to fix this was to ignore the exception thrown in 
loadPermissionInfo() if the exit code is 1 or 2. Alternatively, turning 
"useDeprecatedFileStatus" off in RawLocalFileSystem might fix this issue, 
though we never tested this, and the flag was implemented to fix HADOOP-9652.
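
As a rough illustration of the first approach, here is a minimal JDK-only sketch, not the actual RawLocalFileSystem patch. The class TolerantLsSketch and its permissions method are hypothetical names. It assumes that an exit code of 1 or 2 from "ls -ld" means the file vanished, and reports that case as an empty result instead of throwing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical sketch of "ignore exit code 1 or 2"; not the Hadoop implementation.
final class TolerantLsSketch {

    /**
     * Returns the permission string printed by "ls -ld" (for example "-rw-r--r--"),
     * or Optional.empty() if ls exits with 1 or 2, which this sketch treats as
     * "the file disappeared before it could be inspected".
     */
    static Optional<String> permissions(String path) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("ls", "-ld", "--", path)
                .redirectErrorStream(true)
                .start();
        String firstLine;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            firstLine = reader.readLine();
            // Drain any remaining output so the child process can exit.
            while (reader.readLine() != null) {
                // discard
            }
        }
        int exit = p.waitFor();
        if (exit == 1 || exit == 2) {
            // Assumption: these codes mean the path no longer exists.
            return Optional.empty();
        }
        if (exit != 0 || firstLine == null) {
            throw new IOException("ls -ld exited with code " + exit + " for " + path);
        }
        // The permission string is the first whitespace-separated field.
        return Optional.of(firstLine.trim().split("\\s+")[0]);
    }
}
```

A caller would skip entries for which the result is empty. Note that ls can also return these exit codes for other errors, so a real fix may want to confirm that the file is actually gone before ignoring the failure.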

  was:
Bug occurs when Hadoop creates temporary ".nfs*" files as part of file moves 
and accesses. If this file is deleted very quickly after being created, a 
RuntimeException is thrown. The root cause is in the loadPermissionInfo method 
in org.apache.hadoop.fs.RawLocalFileSystem. To get the permission info, it 
first does

 
{code:java}
ls -ld{code}
 and then attempts to get permissions info about each file. If a file 
disappears between these two steps, an exception is thrown.

*Reproduction Steps:*

An isolated way to reproduce the bug is to run FileInputFormat.listStatus over 
and over on the same dir that we’re creating those temp files in. On Ubuntu or 
any other Linux-based system, this should fail intermittently. On MacOS (due to 
differences in how `ls` returns status codes) this should not fail. 

*Fix:*

One way we managed to fix this was to ignore the exception thrown in 
loadPermissionInfo() if the exit code is 1 or 2.


> RawLocalFileStatus throws exception if a file is created and deleted quickly
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-16378
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16378
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 3.3.0
>         Environment: Ubuntu 18.04, Hadoop 2.7.3 (Though this problem exists 
> on later versions of Hadoop as well), Java 8 ( + Java 11)
>            Reporter: K S
>            Priority: Major
>
> Bug occurs when Hadoop creates temporary ".nfs*" files as part of file moves 
> and accesses. If this file is deleted very quickly after being created, a 
> RuntimeException is thrown. The root cause is in the loadPermissionInfo 
> method in org.apache.hadoop.fs.RawLocalFileSystem. To get the permission 
> info, it first does
>  
> {code:java}
> ls -ld{code}
>  and then attempts to get permissions info about each file. If a file 
> disappears between these two steps, an exception is thrown.
> *Reproduction Steps:*
> An isolated way to reproduce the bug is to call FileInputFormat.listStatus 
> repeatedly on the same directory in which those temp files are being 
> created. On Ubuntu or any other Linux-based system, this should fail 
> intermittently.
> *Fix:*
> One way we managed to fix this was to ignore the exception thrown in 
> loadPermissionInfo() if the exit code is 1 or 2. Alternatively, turning 
> "useDeprecatedFileStatus" off in RawLocalFileSystem might fix this issue, 
> though we never tested this, and the flag was implemented to fix 
> HADOOP-9652.


