Hey Sanjeev,

Alright.  Thank you once more.  This is clear.

However, this poses an issue.  If during the two years the disk drives develop bad blocks but do not necessarily fail to the point that they cannot be mounted, the checksum will have changed, since those filesystem blocks can no longer be read.  From an HDFS perspective, however, since no checks are done regularly, that is not known.  So HDFS still reports that the file is fine, in other words, no missing blocks.  For example, if a disk is going bad but those files are not read for two years, the system won't know that there is a problem.  Even when removing a datanode temporarily and re-adding it, HDFS isn't checking, because that HDFS file isn't read.
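To make that concrete, here's a toy sketch of the write-time-checksum / read-time-verify behaviour (plain Python, not HDFS code; real HDFS keeps a CRC32C per chunk in a .meta file, while this sketch just keeps one CRC per block in a dict, and the block name is made up):

```python
import zlib

def write_block(store, meta, name, data):
    """Store a block and compute its checksum once, at write time."""
    store[name] = bytearray(data)
    meta[name] = zlib.crc32(data)

def read_block(store, meta, name):
    """Verify the checksum only now, when a client actually reads."""
    data = bytes(store[name])
    if zlib.crc32(data) != meta[name]:
        raise IOError(f"checksum mismatch on {name}: corrupt block")
    return data

store, meta = {}, {}
write_block(store, meta, "blk_1001", b"some file contents")

# Two years can pass; nothing re-verifies.  Simulate silent bit rot:
store["blk_1001"][0] ^= 0xFF

# The corruption only surfaces at read time:
try:
    read_block(store, meta, "blk_1001")
except IOError as e:
    print(e)
```

Between the write and the read, nothing in this model ever recomputes the CRC, which is exactly the window I'm worried about.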

So let's assume this scenario.  Data nodes *dn01* to *dn10*  exist. Each data node has 10 x 10TB drives.

And let's assume that there is one large file on those drives and it's replicated with a factor of 3.

If during the two years the file isn't read, and 10 of those drives develop bad blocks or other underlying hardware issues, then it is possible that HDFS will still report everything as fine, even with a replication factor of 3.  With 10 disks failing, it's possible that a block or sector has failed under each of the 3 copies of the data, but HDFS would NOT know, since nothing triggered a read of that HDFS file.  Based on everything below, corruption is very much possible even with a replication factor of 3.  At this point the file is unreadable, but HDFS still reports no missing blocks.
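A quick toy simulation of that scenario (illustrative Python only, not HDFS internals; the block count and rot rate are numbers I made up):

```python
import random

# Toy model: blocks replicated x3, some replicas silently rot,
# and "HDFS" only notices on read.
random.seed(42)

NUM_BLOCKS = 1000          # blocks of the one large file
REPLICATION = 3
BAD_REPLICA_RATE = 0.10    # fraction of replicas hit by bad sectors

# Each block has 3 replicas; mark some as silently corrupted.
replicas = {
    b: [random.random() < BAD_REPLICA_RATE for _ in range(REPLICATION)]
    for b in range(NUM_BLOCKS)
}

# A block is unreadable only if ALL of its replicas have rotted.
unreadable = [b for b, bad in replicas.items() if all(bad)]

# Until someone reads the file, the namenode still reports 0 missing blocks.
reported_missing_before_read = 0
print("blocks that will fail on next read:", len(unreadable))
print("missing blocks reported by HDFS:", reported_missing_before_read)
```

Even when some blocks have lost all three replicas to rot, the reported count stays at zero until a read trips over them.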

Similarly, if I take a datanode out and adjust one of the files on its data disks, HDFS will not know and will still report everything fine.  That is, until someone reads the file.
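That matches how I understand the block report: the datanode enumerates block files from filesystem metadata alone, never opening the contents.  A toy sketch (hypothetical file names, not the real HDFS on-disk layout):

```python
import os
import tempfile

# Fake "data disk" with a few block files on it.
datadir = tempfile.mkdtemp()
for name in ("blk_1001", "blk_1002", "blk_1003"):
    with open(os.path.join(datadir, name), "wb") as f:
        f.write(b"x" * 128)

def block_report(directory):
    """List block names and sizes from inode metadata only.
    os.stat never reads the file contents, so tampered or rotted
    bytes inside a block file would go completely unnoticed here."""
    return {n: os.stat(os.path.join(directory, n)).st_size
            for n in os.listdir(directory)}

report = block_report(datadir)
print(report)
```

A report built this way looks identical before and after someone edits the bytes inside a block file, as long as the size doesn't change.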

Sounds like this is a very real possibility.

Thx,
TK


On 10/21/2020 10:26 AM, संजीव (Sanjeev Tripurari) wrote:
Hi Tom

Therefore, if I write a file to HDFS but access it two years later, then the checksum will be computed only twice, at the beginning of the two years and again at the end when a client connects?  Correct?  As long as no process ever accesses the file between now and two years from now, the checksum is never redone and compared to the two year old checksum in the fsimage?

Yes, exactly.  Unless the data is read, the checksum is not verified (it is checked when data is written and when data is read).  If the checksums mismatch, there is no way to correct it; you will have to re-write that file.

When a datanode is added back in, there is no real read operation on the files themselves.  The datanode just reports the blocks but doesn't actually read the blocks that are there to re-verify the files and ensure consistency?

Yes, exactly.  The datanode maintains a list of files and their blocks, which it reports along with total disk size and used size.  The namenode only has the list of blocks; unless a datanode is connected, it won't know where the blocks are stored.

Regards
-Sanjeev


On Wed, 21 Oct 2020 at 18:31, TomK <[email protected] <mailto:[email protected]>> wrote:

    Hey Sanjeev,

    Thank you very much again.  This confirms my suspicion.

    Therefore, if I write a file to HDFS but access it two years
    later, then the checksum will be computed only twice, at the
    beginning of the two years and again at the end when a client
    connects?  Correct?  As long as no process ever accesses the file
    between now and two years from now, the checksum is never redone
    and compared to the two year old checksum in the fsimage?

    When a datanode is added back in, there is no real read operation
    on the files themselves.  The datanode just reports the blocks but
    doesn't actually read the blocks that are there to re-verify the
    files and ensure consistency?

    Thx,
    TK



    On 10/21/2020 12:38 AM, संजीव (Sanjeev Tripurari) wrote:
    Hi Tom,

    Every datanode sends heartbeat to namenode, on its list of blocks
    it has.

    When a datanode which has been disconnected for a while reconnects,
    it will send a heartbeat to the namenode with the list of blocks
    it has (till then the namenode will have under-replicated blocks).
    As soon as the datanode is connected to the namenode, it will clear
    the under-replicated blocks.

    *When a client connects to read or write a file, it will run
    checksum to validate the file.*

    There is no independent process running to do checksum, as it
    will be heavy process on each node.

    Regards
    -Sanjeev

    On Wed, 21 Oct 2020 at 00:18, Tom <[email protected]
    <mailto:[email protected]>> wrote:

        Thank you.  That part I understand and am Ok with it.

        What I would like to know next is when the CRC32C checksum is
        run again and checked against the fsimage to confirm that the
        block file has not changed or become corrupted.

        For example, if I take a datanode out and, within 15 minutes,
        plug it back in, does HDFS rerun the CRC32C on all data disks
        on that node to make sure the blocks are ok?

        Cheers,
        TK

        Sent from my iPhone

        On Oct 20, 2020, at 1:39 PM, संजीव (Sanjeev Tripurari)
        <[email protected]
        <mailto:[email protected]>> wrote:

        It's done as soon as a file is stored on disk.

        Sanjeev

        On Tuesday, 20 October 2020, TomK <[email protected]
        <mailto:[email protected]>> wrote:

            Thanks again.

            At what points is the checksum validated (checked) after
            that?  For example, is it done on a daily basis or is it
            done only when the file is accessed?

            Thx,
            TK

            On 10/20/2020 10:18 AM, संजीव (Sanjeev Tripurari) wrote:
            As soon as the file is written the first time, the checksum
            is calculated and updated in the fsimage (first in the edit
            logs), and the same is replicated to the other replicas.



            On Tue, 20 Oct 2020 at 19:15, TomK <[email protected]
            <mailto:[email protected]>> wrote:

                Hi Sanjeev,

                Thank you.  It does help.

                At what points is the checksum calculated?

                Thx,
                TK

                On 10/20/2020 3:03 AM, संजीव (Sanjeev Tripurari) wrote:
                For missing blocks and corrupted blocks, do check
                that all the datanode services are up, that none of
                the disks where HDFS data is stored is inaccessible
                or has issues, and that the hosts are reachable from
                the namenode.

                If you are able to re-generate the data and write it,
                great; otherwise Hadoop cannot correct itself.

                Could you please elaborate on this?  Does it mean I
                have to continuously access a file for HDFS to be
                able to detect corrupt blocks and correct itself?



                *"Does HDFS check that the data node is up, data
                disk is mounted, path to
                the file exists and file can be read?"*
                -- Yes; only after it fails will it say missing
                blocks.

                *Or does it also do a filesystem check on that
                data disk as well as
                perhaps a checksum to ensure block integrity?*
                -- Yes, every file's checksum is maintained and
                cross-checked; if it fails, it will say corrupted blocks.

                hope this helps.

                -Sanjeev

                On Tue, 20 Oct 2020 at 09:52, TomK
                <[email protected] <mailto:[email protected]>>
                wrote:

                    Hello,

                    HDFS Missing Blocks / Corrupt Blocks Logic:
                    What are the specific
                    checks done to determine a block is bad and
                    needs to be replicated?

                    Does HDFS check that the data node is up, data
                    disk is mounted, path to
                    the file exists and file can be read?

                    Or does it also do a filesystem check on that
                    data disk as well as
                    perhaps a checksum to ensure block integrity?

                    I've googled on this quite a bit.  I don't see
                    the exact answer I'm
                    looking for.  I would like to know exactly
                    what happens during file
                    integrity verification that then constitutes
                    missing blocks or corrupt
                    blocks in the reports.

                    --
                    Thank You,
                    TK.

                    



            --
            Thx,
            TK.


    --
    Thx,
    TK.


--
Thx,
TK.
