Thanks a lot for your answer. That makes it clear to me now, and this is how I expected Hadoop to work.

===
Ralph


On 20.09.2017 07:57, Harsh J wrote:
Yes, checksums are verified for every form of read (unless explicitly disabled). By default, a checksum is generated and stored for every 512 bytes of data (io.bytes.per.checksum), so a partial read verifies only the chunks it touches rather than the whole file.
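
To illustrate the arithmetic behind this, here is a minimal Python sketch (not HDFS code) that works out which 512-byte checksum chunks a read of `length` bytes at `offset` would overlap. The names `checksum_chunks` and `bytes_verified` are made up for this example, and the 512-byte default is simply the value quoted above.

```python
BYTES_PER_CHECKSUM = 512  # default chunk size quoted above (io.bytes.per.checksum)


def checksum_chunks(offset, length, bytes_per_checksum=BYTES_PER_CHECKSUM):
    """Return the indices of the checksum chunks overlapping [offset, offset + length)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // bytes_per_checksum
    last = (offset + length - 1) // bytes_per_checksum
    return range(first, last + 1)


def bytes_verified(offset, length, bytes_per_checksum=BYTES_PER_CHECKSUM):
    """Upper bound on the bytes covered by the touched chunks.

    The final chunk of a file may be shorter than bytes_per_checksum,
    so the real figure can be smaller.
    """
    return len(checksum_chunks(offset, length, bytes_per_checksum)) * bytes_per_checksum


if __name__ == "__main__":
    # A 300-byte read at offset 1000 overlaps chunks 1 and 2 only.
    print(list(checksum_chunks(1000, 300)))
    print(bytes_verified(1000, 300))
```

Under these assumptions, the verification cost of a partial read grows with the size of the requested range, not with the size of the archive file.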

On Mon, 18 Sep 2017 at 19:23 Ralph Soika wrote:

    Hi,

    I have a question about the behavior of partial reads on a
    large data file.
    I want to implement an archive solution where I append smaller XML
    files to a big archive file via WebHDFS.
    For each newly added file, my client stores the offset and size of
    the XML file appended to the archive file.
    When I later need to read an XML file from the big archive file, I
    use the 'offset' and 'length' parameters to read only a part of the
    file:

    http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN[&offset=<LONG>][&length=<LONG>]
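
The client-side bookkeeping described above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the in-memory `index` dictionary, the helper names `record_entry`, `open_url` and `entry_url`, and the host, port and path in the usage lines are all hypothetical placeholders. Only the `op=OPEN`, `offset` and `length` query parameters come from the URL pattern above.

```python
from urllib.parse import quote, urlencode

# Hypothetical client-side index: entry name -> (offset, length) inside the archive file.
index = {}


def record_entry(name, offset, length):
    """Remember where an appended XML file starts and how long it is."""
    index[name] = (offset, length)


def open_url(host, port, path, offset, length):
    """Build a WebHDFS OPEN URL that reads `length` bytes starting at `offset`."""
    query = urlencode({"op": "OPEN", "offset": offset, "length": length})
    return f"http://{host}:{port}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


def entry_url(host, port, archive_path, name):
    """Look up a stored entry and return the URL for reading just that part."""
    offset, length = index[name]
    return open_url(host, port, archive_path, offset, length)


if __name__ == "__main__":
    # Placeholder values; substitute the real host, port and archive path.
    record_entry("invoice-1.xml", 1000, 300)
    print(entry_url("namenode.example.org", 50070, "/archive/2017.dat", "invoice-1.xml"))
```

The sketch only builds the URL. Issuing the request and following the redirect to a DataNode is left to whatever HTTP client is in use.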


    My question now is: in this case, does Hadoop verify the checksum
    to guarantee the data integrity of the partial read?

    I guess only the checksum of the affected block will be verified,
    but not that of the complete archive file?
    Or is a partial read a performance issue?

    Thanks in advance for your help.

    ===
    Ralph

-- *Imixs*...extends the way people work together
    We are an open source company, read more at: www.imixs.org
    <http://www.imixs.org>
    ------------------------------------------------------------------------
    Imixs Software Solutions GmbH
    Agnes-Pockels-Bogen 1, 80992 München
    
    *Web:* www.imixs.com <http://www.imixs.com>
    *Office:* +49 (0)89-452136 16 <tel:+49%2089%2045213616> *Mobil:*
    +49-177-4128245 <tel:+49%20177%204128245>
    Registergericht: Amtsgericht Muenchen, HRB 136045
    Geschaeftsfuehrer: Gaby Heinle u. Ralph Soika


