[
https://issues.apache.org/jira/browse/HADOOP-12780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
madhumita chakraborty updated HADOOP-12780:
-------------------------------------------
Description:
During preparation for an atomic folder rename, we record the proposed change
in a metadata file (-renamePending.json).
Say we are renaming parent/folderToRename to parent/renamedFolder.
folderToRename has an inner folder innerFolder, and innerFolder has a file
innerFile.
The content of the -renamePending.json file will be
{ OldFolderName: "parent/folderToRename", NewFolderName:
"parent/renamedFolder", FileList: [ "innerFolder", "innerFolder/innerFile" ] }
We first rename all files within the source directory, and then rename the
source directory itself as the last step.
The steps are:
1. First, rename innerFolder.
2. Then rename innerFolder/innerFile.
3. Then rename the source directory folderToRename.
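
The ordering above can be sketched in a few lines of Java. This is only an
illustration of the sequence described here, not the fs/azure code: the class
RenamePlan and its plan method are hypothetical names, and the renames are
represented as plain strings rather than real blob operations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: lists renames in execution order.
class RenamePlan {
    /** Returns each rename as "source -> destination", in the order it runs. */
    static List<String> plan(String oldFolder, String newFolder,
                             List<String> fileList) {
        List<String> ops = new ArrayList<>();
        // Entries of FileList are relative to the folder being renamed.
        for (String relative : fileList) {
            ops.add(oldFolder + "/" + relative + " -> "
                    + newFolder + "/" + relative);
        }
        // The source directory itself is renamed as the last step.
        ops.add(oldFolder + " -> " + newFolder);
        return ops;
    }
}
```

Calling plan with the values from the example yields three entries: innerFolder
first, innerFolder/innerFile second, and folderToRename last.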
Say the process crashes after step 1, so innerFolder has been renamed.
Note that Azure storage does not natively support folders. So when a directory
is created by the mkdir command, we create an empty placeholder blob with
metadata for the directory.
So after step 1, the empty blob corresponding to the directory innerFolder has
been renamed.
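
The state after step 1 can be modelled with a small in-memory sketch. It is an
assumption-laden toy, not the Azure storage client or the fs/azure code: the
class BlobStoreModel and its methods are hypothetical, blobs are just keys in a
sorted set, and exists() is written to treat a path as present when any blob
lies under it, which is the behaviour this report describes.

```java
import java.util.TreeSet;

// Hypothetical in-memory model of a flat blob namespace (not a real API).
class BlobStoreModel {
    final TreeSet<String> blobs = new TreeSet<>();

    /** True only if a blob with exactly this key is stored. */
    boolean blobExists(String key) {
        return blobs.contains(key);
    }

    /** True if the blob exists, or if any blob is stored under path + "/". */
    boolean exists(String path) {
        if (blobs.contains(path)) {
            return true;
        }
        String prefix = path + "/";
        // Smallest key >= prefix; if any key has the prefix, this one does.
        String next = blobs.ceiling(prefix);
        return next != null && next.startsWith(prefix);
    }

    /** Moves one blob; fails if the source key is missing. */
    void renameBlob(String src, String dst) {
        if (!blobs.remove(src)) {
            throw new IllegalStateException("source blob does not exist: " + src);
        }
        blobs.add(dst);
    }
}
```

With the placeholder for innerFolder moved and innerFile still in place,
exists() on the old innerFolder path returns true while blobExists() on the
same path returns false.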
When the process comes back up, the redo path goes through the
-renamePending.json file and tries to redo the renames.
For each file in the file list of the renamePending file, it checks whether the
source file exists; if it does, it renames the file. When it gets to
innerFolder, it calls filesystem.exists(innerFolder). Now
filesystem.exists(innerFolder) returns true, because a file under that folder
exists even though the empty blob corresponding to that folder does not.
So it tries to rename this folder, and because the empty blob has already been
deleted, the rename fails with the exception "source blob does not exist".
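
The failure in the redo path can be reproduced with a toy model. Everything in
this sketch is hypothetical (the class RedoSketch, both redo methods, and the
in-memory blob set); it is not the fs/azure implementation and redoGuarded is
not the actual patch for this issue. It only illustrates why trusting exists()
fails and how checking for the blob itself would avoid the exception.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of the redo path; not the real fs/azure code.
class RedoSketch {
    final TreeSet<String> blobs = new TreeSet<>();

    /** Folder-aware check: true if the blob or anything under it exists. */
    boolean exists(String path) {
        if (blobs.contains(path)) {
            return true;
        }
        String prefix = path + "/";
        String next = blobs.ceiling(prefix);
        return next != null && next.startsWith(prefix);
    }

    /** Moves one blob; fails if the source key is missing. */
    void renameBlob(String src, String dst) {
        if (!blobs.remove(src)) {
            throw new IllegalStateException("source blob does not exist: " + src);
        }
        blobs.add(dst);
    }

    /** Redo as described in this report: relies on exists() alone. */
    void redoNaive(String oldFolder, String newFolder, List<String> fileList) {
        for (String relative : fileList) {
            String src = oldFolder + "/" + relative;
            if (exists(src)) {
                renameBlob(src, newFolder + "/" + relative);
            }
        }
    }

    /** Assumed guard: rename only when the source blob itself is present. */
    void redoGuarded(String oldFolder, String newFolder, List<String> fileList) {
        for (String relative : fileList) {
            String src = oldFolder + "/" + relative;
            if (blobs.contains(src)) {
                renameBlob(src, newFolder + "/" + relative);
            }
        }
    }
}
```

Starting from the post-crash state, redoNaive throws on innerFolder, while
redoGuarded skips the already-moved placeholder and goes on to move innerFile.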
was:
During preparation for an atomic folder rename, we record the proposed change
in a metadata file (-renamePending.json).
Say we are renaming parent/folderToRename to parent/renamedFolder.
folderToRename has an inner folder innerFolder, and innerFolder has a file
innerFile.
The content of the -renamePending.json file will be
{ OldFolderName: "/parent/folderToRename", NewFolderName:
"parent/renamedFolder", FileList: [ "innerFolder", "innerFolder/innerFile" ] }
We first rename all files within the source directory, and then rename the
source directory itself as the last step.
The steps are:
1. First, rename innerFolder.
2. Then rename innerFolder/innerFile.
3. Then rename the source directory folderToRename.
Say the process crashes after step 1, so innerFolder has been renamed.
Note that Azure storage does not natively support folders. So when a directory
is created by the mkdir command, we create an empty placeholder blob with
metadata for the directory.
So after step 1, the empty blob corresponding to the directory innerFolder has
been renamed.
When the process comes back up, the redo path goes through the
-renamePending.json file and tries to redo the renames.
For each file in the file list of the renamePending file, it checks whether the
source file exists; if it does, it renames the file. When it gets to
innerFolder, it calls filesystem.exists(innerFolder). Now
filesystem.exists(innerFolder) returns true, because a file under that folder
exists even though the empty blob corresponding to that folder does not.
So it tries to rename this folder, and because the empty blob has already been
deleted, the rename fails with the exception "source blob does not exist".
> During atomic rename handle crash when one directory has been renamed but not
> file under it.
> --------------------------------------------------------------------------------------------
>
> Key: HADOOP-12780
> URL: https://issues.apache.org/jira/browse/HADOOP-12780
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/azure
> Affects Versions: 2.8.0
> Reporter: madhumita chakraborty
> Assignee: madhumita chakraborty
> Priority: Critical