[ https://issues.apache.org/jira/browse/HADOOP-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592298#comment-15592298 ]
Kihwal Lee commented on HADOOP-13738:
-------------------------------------
The existing implementation is mainly for detecting a read-only file system
(mkdir fails with EROFS) and unmounted storage (mkdir fails with EPERM).

We have seen cases where written data was lost after close because delayed
block allocation failed in the kernel. Since this failure is asynchronous to
the file write/close, no user process received an error. I think enabling
{{syncOnClose}} will make such writes fail with {{EIO}}. The write-sync test
is more likely to detect this kind of condition, so I think this approach has
merit.
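
For illustration, a write-sync probe along these lines could look roughly like
the sketch below. This is only a minimal sketch and not the code in the
attached patch: the class name {{DiskWriteSyncCheck}}, the method
{{checkWriteSync}} and the probe file naming are made up for this example. The
only point is that {{FileDescriptor.sync()}} forces the data to the device
before close, so an I/O failure should surface as an {{IOException}} in the
calling thread instead of being lost.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.UUID;

// Hypothetical helper for illustration only; not taken from HADOOP-13738.01.patch.
final class DiskWriteSyncCheck {

  private DiskWriteSyncCheck() {
  }

  /**
   * Writes one byte to a temporary probe file in dir and syncs it to the
   * device before closing. Any failure to create, write or sync the file is
   * reported to the caller as an IOException.
   */
  static void checkWriteSync(File dir) throws IOException {
    // Assumed naming scheme: a random suffix avoids clashes between checks.
    File probe = new File(dir, "disk-check-" + UUID.randomUUID());
    try (FileOutputStream out = new FileOutputStream(probe)) {
      out.write(1);
      // Ask the OS to flush the file to the storage device (fsync-like).
      // Throws SyncFailedException, an IOException, if that cannot be done.
      out.getFD().sync();
    } finally {
      // Best-effort cleanup; the stream is already closed at this point.
      probe.delete();
    }
  }
}
```

A caller would treat any {{IOException}} from {{checkWriteSync(dir)}} as a
failed check for the volume that contains {{dir}}.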

Another common disk failure mode involves read errors. Writes go through fine,
but reading the data back can cause an unrecoverable error or hang. Unless the
affected sector is used for file system metadata, no action will be taken at
the file system level. This is partly dealt with by adding the affected block
to the volume scanner queue. The write-sync check will still catch many bad
disks.

Is there any particular reason why it retries on FNFE
({{FileNotFoundException}})? When do you think that will happen?
> DiskChecker should perform some disk IO
> ---------------------------------------
>
> Key: HADOOP-13738
> URL: https://issues.apache.org/jira/browse/HADOOP-13738
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Arpit Agarwal
> Assignee: Arpit Agarwal
> Attachments: HADOOP-13738.01.patch
>
>
> DiskChecker can fail to detect total disk/controller failures indefinitely.
> We have seen this in real clusters. DiskChecker performs simple
> permissions-based checks on directories which do not guarantee that any disk
> IO will be attempted.
> A simple improvement is to write some data and flush it to the disk.