Hi Tom

I'm not too familiar with the CDH distribution, but this page has the default ports used by the DataNode:

https://docs.cloudera.com/documentation/enterprise/latest/topics/cdh_ports.html

I believe it's the dfs.datanode.http.address / dfs.datanode.https.address settings that you're interested in (9864/9865).

Since the block scanner related config parameters are not set, the defaults of 3 weeks and 1 MB/s should apply.

Thanks

Austin

> On 22 Oct 2020, at 06:35, TomK <[email protected]> wrote:
> 
> Hey Austin, Sanjeev,
> 
> Thanks once more! Took some time to review the pages. That was certainly very helpful. Appreciated!
> 
> However, I tried to access https://dn01/blockScannerReport on a test Cloudera 6.3 cluster. It didn't work. I tried the following as well:
> 
> http://dn01:50075/blockscannerreport?listblocks
> https://dn01:50075/blockscannerreport
> https://dn01:10006/blockscannerreport
> 
> I checked whether port 50075 is up (netstat -pnltu). There's no service on that port on the workers. I checked this page:
> 
> https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_ig_ports_cdh5.html
> 
> The port is defined there. I then checked whether the following are set. These two configurations in hdfs-site.xml are the ones most used for block scanners:
> 
> dfs.block.scanner.volume.bytes.per.second throttles the scan bandwidth to a configurable number of bytes per second. The default value is 1M. Setting this to 0 disables the block scanner.
> 
> dfs.datanode.scan.period.hours configures the scan period, which defines how often a whole scan is performed. This should be set to a long enough interval to really take effect, for the reasons explained above. The default value is 3 weeks (504 hours). Setting this to 0 will use the default value. Setting it to a negative value disables the block scanner.
> 
> These are NOT explicitly set. I checked hdfs-site.xml; nothing is defined there. I checked the Configuration tab in the cluster; it's not defined there either.
> 
> Does this mean that the defaults are applied, OR does it mean that the block / volume scanner is disabled?
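One way to answer that from the cluster itself is to query the resolved configuration; a minimal sketch, where the dn01 hostname comes from this thread and the ports are the Hadoop 3 defaults (worth verifying on the cluster in question):

```shell
#!/bin/sh
# Sketch: confirming the effective block scanner settings on a CDH 6.x node.
# The dn01 hostname is this thread's example; adjust for your cluster.

if command -v hdfs >/dev/null 2>&1; then
    # `hdfs getconf` resolves the value the client would use, so it prints
    # the built-in default when nothing is set in hdfs-site.xml:
    hdfs getconf -confKey dfs.datanode.scan.period.hours            # expect 504
    hdfs getconf -confKey dfs.block.scanner.volume.bytes.per.second # expect 1048576
fi

# On Hadoop 3 / CDH 6 the DataNode web endpoints moved off 50075 to 9864
# (HTTP) and 9865 (HTTPS), which is why the old URLs answer nothing:
#   curl -k "https://dn01:9865/blockScannerReport?listblocks"

# Sanity arithmetic on the defaults: 504 hours is 3 weeks, and the scan
# throttle of 1048576 bytes/s is 1 MiB/s.
echo "scan period: $((504 / 24 / 7)) weeks, throttle: $((1048576 / 1024 / 1024)) MiB/s"
```

If getconf prints 504 and 1048576, the defaults are in effect and the scanner is enabled; per the settings quoted above, a 0 bytes-per-second value or a negative period would mean it is disabled.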
> I see the pages detail what the values for these settings mean, but I didn't see any notes covering the situation where both values are not explicitly set.
> 
> Thx,
> TK
> 
> On 10/21/2020 1:34 PM, संजीव (Sanjeev Tripurari) wrote:
>> Yes Austin,
>> 
>> You are right: every datanode will do its own block verification, which is sent as a health check report to the namenode.
>> 
>> Regards
>> -Sanjeev
>> 
>> On Wed, 21 Oct 2020 at 21:53, Austin Hackett <[email protected]> wrote:
>>> Hi Tom
>>> 
>>> It is my understanding that in addition to block verification on client reads, each data node runs a DataBlockScanner in a background thread that periodically verifies all the blocks stored on the data node. The dfs.datanode.scan.period.hours property controls how often this verification occurs.
>>> 
>>> I think the reports are available via the data node's /blockScannerReport HTTP endpoint, although I'm not sure I ever actually looked at one (add ?listblocks to get the verification status of each block).
>>> 
>>> More info here:
>>> https://blog.cloudera.com/hdfs-datanode-scanners-and-disk-checker-explained/
>>> 
>>> Thanks
>>> 
>>> Austin
>>> 
>>>> On 21 Oct 2020, at 16:47, TomK <[email protected]> wrote:
>>>> 
>>>> Hey Sanjeev,
>>>> 
>>>> All right. Thank you once more. This is clear.
>>>> 
>>>> However, this poses an issue. If during the two years disk drives develop bad blocks but do not necessarily fail to the point that they cannot be mounted, the checksum would have changed, since those filesystem blocks can no longer be read. However, from an HDFS perspective, since no checks are done regularly, that is not known, so HDFS still reports that the file is fine; in other words, no missing blocks. For example, if a disk is going bad but those files are not read for two years, the system won't know that there is a problem.
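The failure mode described here (replicas rotting unread) can at least be probed manually; a hedged sketch, with made-up file and block paths, of the ways to force a data-level check:

```shell
#!/bin/sh
# Sketch: forcing HDFS to actually read block data instead of waiting for
# a client. File and block paths below are hypothetical placeholders.

if command -v hdfs >/dev/null 2>&1; then
    # Any full read re-verifies each chunk against its stored CRC32C;
    # a corrupt replica is reported to the NameNode and re-replicated.
    hdfs dfs -cat /user/tk/bigfile > /dev/null

    # On a DataNode, check one replica's .meta checksums against the block
    # file in place, with no client traffic at all:
    hdfs debug verifyMeta \
        -block /data/1/dfs/dn/current/BP-1/current/finalized/subdir0/subdir0/blk_1073741825 \
        -meta /data/1/dfs/dn/current/BP-1/current/finalized/subdir0/subdir0/blk_1073741825_1001.meta
fi

# How much one volume scanner pass can cover at the default 1 MiB/s
# throttle over the default 504-hour period:
mib=$((504 * 3600))            # seconds in the period x 1 MiB/s
echo "max scanned per volume per period: $((mib / 1024)) GiB"
```

The closing arithmetic is just a reminder that the default 1 MiB/s throttle covers roughly 1.7 TiB per volume per 504-hour period, which is worth comparing against 10 TB drives.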
>>>> Even when removing a data node temporarily and re-adding it, HDFS isn't checking, because that HDFS file isn't read.
>>>> 
>>>> So let's assume this scenario: data nodes dn01 to dn10 exist, and each data node has 10 x 10TB drives. Let's also assume there is one large file on those drives, replicated with a factor of 3.
>>>> 
>>>> If during the two years the file isn't read, and 10 of those drives develop bad blocks or other underlying hardware issues, then it is possible that HDFS will still report everything fine, even with a replication factor of 3, because with 10 disks failing it's possible a block or sector has failed under each of the 3 copies of the data. HDFS would NOT know, since nothing triggered a read of that HDFS file. Based on everything below, corruption is very much possible even with a replication factor of 3. At this point the file is unreadable, but HDFS still reports no missing blocks.
>>>> 
>>>> Similarly, if, once I take a data node out, I adjust one of the files on the data disks, HDFS will not know and will still report everything fine. That is, until someone reads the file.
>>>> 
>>>> Sounds like this is a very real possibility.
>>>> 
>>>> Thx,
>>>> TK
>>>> 
>>>> On 10/21/2020 10:26 AM, संजीव (Sanjeev Tripurari) wrote:
>>>>> Hi Tom
>>>>> 
>>>>> Therefore, if I write a file to HDFS but access it two years later, then the checksum will be computed only twice, at the beginning of the two years and again at the end when a client connects? Correct? As long as no process ever accesses the file between now and two years from now, the checksum is never redone and compared to the two-year-old checksum in the fsimage?
>>>>> 
>>>>> Yes, exactly: unless data is read, the checksum is not verified
>>>>> (when data is written and when the data is read).
>>>>> If the checksum is mismatched, there is no way to correct it; you will have to re-write that file.
>>>>> 
>>>>> When the datanode is added back in, there is no real read operation on the files themselves. The datanode just reports the blocks but doesn't really read the blocks that are there to re-verify the files and ensure consistency?
>>>>> 
>>>>> Yes, exactly. The datanode maintains a list of files and their blocks, which it reports, along with total disk size and used size. The namenode only has the list of blocks; unless a datanode is connected, it won't know where the blocks are stored.
>>>>> 
>>>>> Regards
>>>>> -Sanjeev
>>>>> 
>>>>> On Wed, 21 Oct 2020 at 18:31, TomK <[email protected]> wrote:
>>>>>> Hey Sanjeev,
>>>>>> 
>>>>>> Thank you very much again. This confirms my suspicion.
>>>>>> 
>>>>>> Therefore, if I write a file to HDFS but access it two years later, then the checksum will be computed only twice, at the beginning of the two years and again at the end when a client connects? Correct? As long as no process ever accesses the file between now and two years from now, the checksum is never redone and compared to the two-year-old checksum in the fsimage?
>>>>>> 
>>>>>> When the datanode is added back in, there is no real read operation on the files themselves. The datanode just reports the blocks but doesn't really read the blocks that are there to re-verify the files and ensure consistency?
>>>>>> 
>>>>>> Thx,
>>>>>> TK
>>>>>> 
>>>>>> On 10/21/2020 12:38 AM, संजीव (Sanjeev Tripurari) wrote:
>>>>>>> Hi Tom,
>>>>>>> 
>>>>>>> Every datanode sends a heartbeat to the namenode with the list of blocks it has.
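The point that block reports carry only metadata, not data, also explains what hdfs fsck can and cannot see; a sketch, where the target path is hypothetical and the per-entry size is an assumption for illustration:

```shell
#!/bin/sh
# Sketch: fsck works from NameNode metadata plus DataNode block reports.
# It does NOT read block contents, so it only shows corruption that a
# client read or a scanner pass has already flagged.

if command -v hdfs >/dev/null 2>&1; then
    hdfs fsck / -list-corruptfileblocks                   # corruption known so far
    hdfs fsck /user/tk/bigfile -files -blocks -locations  # where the replicas live
fi

# Why block reports are cheap to send on every reconnect: each replica
# entry is roughly a block ID, generation stamp and length (about 24
# bytes -- an assumed figure for illustration), so even a million
# replicas is on the order of 24 MB, versus re-reading terabytes of
# block data to re-verify checksums.
entry_bytes=24
echo "approx report size for 1M replicas: $((1000000 * entry_bytes / 1000000)) MB"
```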
>>>>>>> 
>>>>>>> A datanode that has been disconnected for a while will, after reconnecting, send a heartbeat to the namenode with the list of blocks it has (until then the namenode will show under-replicated blocks). As soon as the datanode reconnects to the namenode, the under-replicated blocks are cleared.
>>>>>>> 
>>>>>>> When a client connects to read or write a file, it will run a checksum to validate the file.
>>>>>>> 
>>>>>>> There is no independent process running to do checksums, as it would be a heavy process on each node.
>>>>>>> 
>>>>>>> Regards
>>>>>>> -Sanjeev
>>>>>>> 
>>>>>>> On Wed, 21 Oct 2020 at 00:18, Tom <[email protected]> wrote:
>>>>>>>> Thank you. That part I understand and am OK with it.
>>>>>>>> 
>>>>>>>> What I would like to know next is when the CRC32C checksum is run again and checked against the fsimage to confirm that the block file has not changed or become corrupted.
>>>>>>>> 
>>>>>>>> For example, if I take a datanode out and plug it back in within 15 minutes, does HDFS rerun the CRC32C on all data disks on that node to make sure the blocks are OK?
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> TK
>>>>>>>> 
>>>>>>>> Sent from my iPhone
>>>>>>>> 
>>>>>>>>> On Oct 20, 2020, at 1:39 PM, संजीव (Sanjeev Tripurari) <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> It's done as soon as a file is stored on disk.
>>>>>>>>> 
>>>>>>>>> Sanjeev
>>>>>>>>> 
>>>>>>>>> On Tuesday, 20 October 2020, TomK <[email protected]> wrote:
>>>>>>>>>> Thanks again.
>>>>>>>>>> 
>>>>>>>>>> At what points is the checksum validated (checked) after that? For example, is it done on a daily basis, or only when the file is accessed?
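On the question of when checksums are validated: as far as I understand it, CRC32C values are computed per chunk as data is written, stored in each replica's .meta file on the DataNode (not in the fsimage), and re-verified on every read; there is no scheduled re-check outside the volume scanner. A sketch of the bookkeeping, assuming the stock dfs.bytes-per-checksum of 512 bytes and a hypothetical file path:

```shell
#!/bin/sh
# Sketch: HDFS checksum bookkeeping. CRC32C is computed per chunk of
# dfs.bytes-per-checksum bytes (default 512) as data is written, stored
# in each replica's .meta file on the DataNode, and re-verified chunk by
# chunk whenever the data is read.

if command -v hdfs >/dev/null 2>&1; then
    # Per-file checksum (an MD5 built from the stored per-block CRCs);
    # assembled from the .meta files, so it does not re-read the data:
    hdfs dfs -checksum /user/tk/bigfile

    # A full read, by contrast, re-verifies every chunk:
    hdfs dfs -cat /user/tk/bigfile > /dev/null
fi

# Size of the checksum sidecar for one default 128 MiB block:
chunks=$((128 * 1024 * 1024 / 512))    # 512-byte chunks per block
echo "${chunks} CRC32C entries ($((chunks * 4 / 1024)) KiB of checksums per block)"
```

hdfs dfs -checksum is the cheap spot check here: because it is assembled from stored checksums, two copies of a file can be compared without streaming the data through a client.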
>>>>>>>>>> 
>>>>>>>>>> Thx,
>>>>>>>>>> TK
>>>>>>>>>> 
>>>>>>>>>> On 10/20/2020 10:18 AM, संजीव (Sanjeev Tripurari) wrote:
>>>>>>>>>>> As soon as the file is written the first time, the checksum is calculated and updated in the fsimage (first in the edit logs), and the same is replicated to the other replicas.
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, 20 Oct 2020 at 19:15, TomK <[email protected]> wrote:
>>>>>>>>>>>> Hi Sanjeev,
>>>>>>>>>>>> 
>>>>>>>>>>>> Thank you. It does help.
>>>>>>>>>>>> 
>>>>>>>>>>>> At what points is the checksum calculated?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thx,
>>>>>>>>>>>> TK
>>>>>>>>>>>> 
>>>>>>>>>>>> On 10/20/2020 3:03 AM, संजीव (Sanjeev Tripurari) wrote:
>>>>>>>>>>>>> For missing blocks and corrupted blocks, do check that all the datanode services are up, that none of the disks where HDFS data is stored is inaccessible or having issues, and that the hosts are reachable from the namenode.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If you are able to re-generate the data and write it, great; otherwise Hadoop cannot correct itself.
>>>>>>>>>>>> Could you please elaborate on this? Does it mean I have to continuously access a file for HDFS to be able to detect corrupt blocks and correct itself?
>>>>>>>>>>>> 
>>>>>>>>>>>>> "Does HDFS check that the data node is up, the data disk is mounted, the path to the file exists and the file can be read?"
>>>>>>>>>>>>> -- Yes; only after that fails will it report missing blocks.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> "Or does it also do a filesystem check on that data disk as well as perhaps a checksum to ensure block integrity?"
>>>>>>>>>>>>> -- Yes; every file's checksum is maintained and cross-checked. If that fails, it will report corrupted blocks.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hope this helps.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Sanjeev
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, 20 Oct 2020 at 09:52, TomK <[email protected]> wrote:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> HDFS missing blocks / corrupt blocks logic: what are the specific checks done to determine that a block is bad and needs to be replicated?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Does HDFS check that the data node is up, the data disk is mounted, the path to the file exists and the file can be read?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Or does it also do a filesystem check on that data disk, as well as perhaps a checksum to ensure block integrity?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I've googled this quite a bit but don't see the exact answer I'm looking for. I would like to know exactly what happens during file integrity verification that then constitutes missing blocks or corrupt blocks in the reports.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Thank You,
>>>>>>>>>>>>>> TK.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>>>>>>> For additional commands, e-mail: [email protected]
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Thx,
>>>>>>>>>> TK.
>>>>>> 
>>>>>> --
>>>>>> Thx,
>>>>>> TK.
>>>> 
>>>> --
>>>> Thx,
>>>> TK.
> 
> --
> Thx,
> TK.
