Hi Tom

It might be worth restarting the DataNode process? I didn't think you could disable the DataNode Web UI as such, but I could be wrong on this point. Out of interest, what does hdfs-site.xml say with regards to dfs.datanode.http.address/dfs.datanode.https.address?
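For reference, a sketch of what those entries typically look like in hdfs-site.xml — the values shown are my understanding of the Hadoop 3 / CDH 6 defaults (0.0.0.0:9864 for HTTP, 0.0.0.0:9865 for HTTPS); your cluster may well override them, which is exactly what would be worth checking:

```xml
<!-- Assumed Hadoop 3 defaults shown; verify against your cluster's actual config -->
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:9864</value>
</property>
<property>
  <name>dfs.datanode.https.address</name>
  <value>0.0.0.0:9865</value>
</property>
```

If the addresses have been set to something non-default (or to a port that's firewalled), that would explain the endpoints not responding.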

Regarding the logs, a quick look on GitHub suggests there may be a couple of useful log messages:

https://github.com/apache/hadoop/blob/88a9f42f320e7c16cf0b0b424283f8e4486ef286/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockScanner.java

For example, LOG.warn("Periodic block scanner is not running") or LOG.info("Initialized block scanner with targetBytesPerSec {}"). Of course, you'd need to make sure those LOG statements are present in the Hadoop version included with CDH 6.3. Git blame suggests the LOG statements were added 6 years ago, so chances are you have them...

Thanks

Austin

> On 22 Oct 2020, at 14:44, TomK <[email protected]> wrote:
> 
> Thanks Austin. However none of these are open on a standard Cloudera 6.3 build.
> 
> # netstat -pnltu | grep -Ei "9866|1004|9864|9865|1006|9867"
> #
> 
> Would there be anything in the logs to indicate whether or not the block / volume scanner is running?
> 
> Thx,
> TK
> 
> On 10/22/2020 3:09 AM, Austin Hackett wrote:
>> Hi Tom
>>
>> I'm not too familiar with the CDH distribution, but this page has the default ports used by the DataNode:
>>
>> https://docs.cloudera.com/documentation/enterprise/latest/topics/cdh_ports.html
>>
>> I believe it's the settings for dfs.datanode.http.address/dfs.datanode.https.address that you're interested in (9864/9865).
>>
>> Since the data block scanner related config parameters are not set, the defaults of 3 weeks and 1MB should be applied.
>>
>> Thanks
>>
>> Austin
>>
>>> On 22 Oct 2020, at 06:35, TomK <[email protected]> wrote:
>>>
>>> Hey Austin, Sanjeev,
>>>
>>> Thanks once more! Took some time to review the pages. That was certainly very helpful. Appreciated!
>>>
>>> However, I tried to access https://dn01/blockScannerReport on a test Cloudera 6.3 cluster.
>>> Didn't work. Tried the following as well:
>>>
>>> http://dn01:50075/blockscannerreport?listblocks
>>>
>>> https://dn01:50075/blockscannerreport
>>>
>>> https://dn01:10006/blockscannerreport
>>>
>>> Checked that port 50075 is up ( netstat -pnltu ). There's no service on that port on the workers. Checked the pages:
>>>
>>> https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_ig_ports_cdh5.html
>>>
>>> It is defined on the pages. Checked if the following is set:
>>>
>>> The following 2 configurations in hdfs-site.xml are the most used for block scanners:
>>>
>>> dfs.block.scanner.volume.bytes.per.second to throttle the scan bandwidth to configurable bytes per second. Default value is 1M. Setting this to 0 will disable the block scanner.
>>> dfs.datanode.scan.period.hours to configure the scan period, which defines how often a whole scan is performed. This should be set to a long enough interval to really take effect, for the reasons explained above. Default value is 3 weeks (504 hours). Setting this to 0 will use the default value. Setting this to a negative value will disable the block scanner.
>>>
>>> These are NOT explicitly set. Checked hdfs-site.xml. Nothing defined there. Checked the Configuration tab in the cluster. It's not defined either.
>>>
>>> Does this mean that the defaults are applied, OR does it mean that the block / volume scanner is disabled? I see the pages detail what values for these settings mean, but I didn't see any notes pertaining to the situation where both values are not explicitly set.
>>>
>>> Thx,
>>> TK
>>>
>>> On 10/21/2020 1:34 PM, संजीव (Sanjeev Tripurari) wrote:
>>>> Yes Austin,
>>>>
>>>> You are right: every datanode will do its block verification, which is sent as a health check report to the namenode.
>>>>
>>>> Regards
>>>> -Sanjeev
>>>>
>>>> On Wed, 21 Oct 2020 at 21:53, Austin Hackett <[email protected]> wrote:
>>>> Hi Tom
>>>>
>>>> It is my understanding that in addition to block verification on client reads, each data node runs a DataBlockScanner in a background thread that periodically verifies all the blocks stored on the data node. The dfs.datanode.scan.period.hours property controls how often this verification occurs.
>>>>
>>>> I think the reports are available via the data node /blockScannerReport HTTP endpoint, although I'm not sure I ever actually looked at one (add ?listblocks to get the verification status of each block).
>>>>
>>>> More info here:
>>>> https://blog.cloudera.com/hdfs-datanode-scanners-and-disk-checker-explained/
>>>>
>>>> Thanks
>>>>
>>>> Austin
>>>>
>>>>> On 21 Oct 2020, at 16:47, TomK <[email protected]> wrote:
>>>>>
>>>>> Hey Sanjeev,
>>>>>
>>>>> Alright. Thank you once more. This is clear.
>>>>>
>>>>> However, this poses an issue then. If during the two years disk drives develop bad blocks, but do not necessarily fail to the point that they cannot be mounted, the checksum would have changed since those filesystem blocks can no longer be read. However, from an HDFS perspective, since no checks are done regularly, that is not known. So HDFS still reports that the file is fine, in other words, no missing blocks. For example, if a disk is going bad, but those files are not read for two years, the system won't know that there is a problem.
>>>>> Even when removing a data node temporarily and re-adding the datanode, HDFS isn't checking, because that HDFS file isn't read.
>>>>>
>>>>> So let's assume this scenario. Data nodes dn01 to dn10 exist. Each data node has 10 x 10TB drives. And let's assume that there is one large file on those drives and it's replicated with a factor of 3.
>>>>>
>>>>> If during the two years the file isn't read, and 10 of those drives develop bad blocks or other underlying hardware issues, then it is possible that HDFS will still report everything fine, even with a replication factor of 3. Because with 10 disks failing, it's possible a block or sector has failed under each of the 3 copies of the data. But HDFS would NOT know, since nothing triggered a read of that HDFS file. Based on everything below, corruption is very much possible even with a replication factor of 3. At this point the file is unreadable, but HDFS still reports no missing blocks.
>>>>>
>>>>> Similarly, if once I take a data node out I adjust one of the files on the data disks, HDFS will not know and will still report everything fine. That is, until someone reads the file.
>>>>>
>>>>> Sounds like this is a very real possibility.
>>>>>
>>>>> Thx,
>>>>> TK
>>>>>
>>>>> On 10/21/2020 10:26 AM, संजीव (Sanjeev Tripurari) wrote:
>>>>>> Hi Tom
>>>>>>
>>>>>> Therefore, if I write a file to HDFS but access it two years later, then the checksum will be computed only twice, at the beginning of the two years and again at the end when a client connects? Correct? As long as no process ever accesses the file between now and two years from now, the checksum is never redone and compared to the two year old checksum in the fsimage?
>>>>>>
>>>>>> Yes, exactly: unless data is read, the checksum is not verified
>>>>>> (when data is written and when the data is read).
>>>>>> If the checksum is mismatched, there is no way to correct it; you will have to re-write that file.
>>>>>>
>>>>>> When the datanode is added back in, there is no real read operation on the files themselves. The datanode just reports the blocks but doesn't really read the blocks that are there to re-verify the files and ensure consistency?
>>>>>>
>>>>>> Yes, exactly: the datanode maintains the list of files and their blocks, which it reports, along with total disk size and used size.
>>>>>> The namenode only has the list of blocks; unless the datanode is connected, it won't know where the blocks are stored.
>>>>>>
>>>>>> Regards
>>>>>> -Sanjeev
>>>>>>
>>>>>> On Wed, 21 Oct 2020 at 18:31, TomK <[email protected]> wrote:
>>>>>> Hey Sanjeev,
>>>>>>
>>>>>> Thank you very much again. This confirms my suspicion.
>>>>>>
>>>>>> Therefore, if I write a file to HDFS but access it two years later, then the checksum will be computed only twice, at the beginning of the two years and again at the end when a client connects? Correct? As long as no process ever accesses the file between now and two years from now, the checksum is never redone and compared to the two year old checksum in the fsimage?
>>>>>>
>>>>>> When the datanode is added back in, there is no real read operation on the files themselves. The datanode just reports the blocks but doesn't really read the blocks that are there to re-verify the files and ensure consistency?
>>>>>>
>>>>>> Thx,
>>>>>> TK
>>>>>>
>>>>>> On 10/21/2020 12:38 AM, संजीव (Sanjeev Tripurari) wrote:
>>>>>>> Hi Tom,
>>>>>>>
>>>>>>> Every datanode sends a heartbeat to the namenode with the list of blocks it has.
>>>>>>>
>>>>>>> When a datanode that has been disconnected for a while reconnects, it will send a heartbeat to the namenode with the list of blocks it has (until then, the namenode will show under-replicated blocks).
>>>>>>> As soon as the datanode is connected to the namenode, it will clear the under-replicated blocks.
>>>>>>>
>>>>>>> When a client connects to read or write a file, it will run a checksum to validate the file.
>>>>>>>
>>>>>>> There is no independent process running to do checksums, as it would be a heavy process on each node.
>>>>>>>
>>>>>>> Regards
>>>>>>> -Sanjeev
>>>>>>>
>>>>>>> On Wed, 21 Oct 2020 at 00:18, Tom <[email protected]> wrote:
>>>>>>> Thank you. That part I understand and am OK with it.
>>>>>>>
>>>>>>> What I would like to know next is when the CRC32C checksum is run again and checked against the fsimage to confirm that the block file has not changed or become corrupted?
>>>>>>>
>>>>>>> For example, if I take a datanode out, and within 15 minutes plug it back in, does HDFS rerun the CRC32C on all data disks on that node to make sure the blocks are OK?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> TK
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>>> On Oct 20, 2020, at 1:39 PM, संजीव (Sanjeev Tripurari) <[email protected]> wrote:
>>>>>>>>
>>>>>>>> It's done as soon as a file is stored on disk.
>>>>>>>>
>>>>>>>> Sanjeev
>>>>>>>>
>>>>>>>> On Tuesday, 20 October 2020, TomK <[email protected]> wrote:
>>>>>>>> Thanks again.
>>>>>>>>
>>>>>>>> At what points is the checksum validated (checked) after that? For example, is it done on a daily basis or is it done only when the file is accessed?
>>>>>>>>
>>>>>>>> Thx,
>>>>>>>> TK
>>>>>>>>
>>>>>>>> On 10/20/2020 10:18 AM, संजीव (Sanjeev Tripurari) wrote:
>>>>>>>>> As soon as the file is written the first time, the checksum is calculated and updated in the fsimage (first in the edit logs), and the same is replicated to the other replicas.
>>>>>>>>>
>>>>>>>>> On Tue, 20 Oct 2020 at 19:15, TomK <[email protected]> wrote:
>>>>>>>>> Hi Sanjeev,
>>>>>>>>>
>>>>>>>>> Thank you. It does help.
>>>>>>>>>
>>>>>>>>> At what points is the checksum calculated?
>>>>>>>>>
>>>>>>>>> Thx,
>>>>>>>>> TK
>>>>>>>>>
>>>>>>>>> On 10/20/2020 3:03 AM, संजीव (Sanjeev Tripurari) wrote:
>>>>>>>>>> For missing blocks and corrupted blocks, do check that all the datanode services are up, that all of the disks where HDFS data is stored are accessible and have no issues, and that the hosts are reachable from the namenode.
>>>>>>>>>>
>>>>>>>>>> If you are able to re-generate the data and write it, great; otherwise Hadoop cannot correct itself.
>>>>>>>>> Could you please elaborate on this? Does it mean I have to continuously access a file for HDFS to be able to detect corrupt blocks and correct itself?
>>>>>>>>>
>>>>>>>>>> "Does HDFS check that the data node is up, data disk is mounted, path to the file exists and file can be read?"
>>>>>>>>>> -- Yes, only after it fails will it say missing blocks.
>>>>>>>>>>
>>>>>>>>>> "Or does it also do a filesystem check on that data disk as well as perhaps a checksum to ensure block integrity?"
>>>>>>>>>> -- Yes, every file checksum is maintained and cross-checked; if it fails, it will say corrupted blocks.
>>>>>>>>>>
>>>>>>>>>> Hope this helps.
>>>>>>>>>>
>>>>>>>>>> -Sanjeev
>>>>>>>>>>
>>>>>>>>>> On Tue, 20 Oct 2020 at 09:52, TomK <[email protected]> wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> HDFS Missing Blocks / Corrupt Blocks Logic: What are the specific checks done to determine a block is bad and needs to be replicated?
>>>>>>>>>>
>>>>>>>>>> Does HDFS check that the data node is up, data disk is mounted, path to the file exists and file can be read?
>>>>>>>>>>
>>>>>>>>>> Or does it also do a filesystem check on that data disk as well as perhaps a checksum to ensure block integrity?
>>>>>>>>>>
>>>>>>>>>> I've googled this quite a bit. I don't see the exact answer I'm looking for. I would like to know exactly what happens during file integrity verification that then constitutes missing blocks or corrupt blocks in the reports.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Thank You,
>>>>>>>>>> TK.
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>>> For additional commands, e-mail: [email protected]
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thx,
>>>>>>>> TK.
>>>>>>
>>>>>> --
>>>>>> Thx,
>>>>>> TK.
>>>>>
>>>>> --
>>>>> Thx,
>>>>> TK.
>>>
>>> --
>>> Thx,
>>> TK.
>
> --
> Thx,
> TK.
