Hi,
I have a question about the behavior of partial reads in a large
data file.
I want to implement an archive solution where I append smaller XML files
to a big archive file via WebHDFS.
For each newly added file, my client stores the offset and size of the
XML file within the archive file.
When I later need to read an XML file from the big archive file, I use
the 'offset' and 'length' parameters to read only a part of the file:
http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN[&offset=<LONG>][&length=<LONG>]
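To make the workflow concrete, here is a minimal client-side sketch of building such a partial-read URL from a stored (offset, size) index. The host name, path, file name, and index structure are hypothetical examples; 9870 is assumed as the NameNode HTTP port (the Hadoop 3 default), so adjust for your cluster:

```python
def open_url(host, path, offset, length, port=9870):
    """Build a WebHDFS OPEN URL for a partial read of `length` bytes
    starting at `offset`. Port 9870 is an assumed default."""
    return (f"http://{host}:{port}/webhdfs/v1{path}"
            f"?op=OPEN&offset={offset}&length={length}")

# Hypothetical client-side index: for each appended XML file,
# remember where it landed inside the archive (offset, size in bytes).
index = {"invoice-4711.xml": (1048576, 2048)}

off, size = index["invoice-4711.xml"]
url = open_url("namenode.example.com", "/archive/big.xml", off, size)
print(url)
# The actual read would then be an HTTP GET on this URL,
# following the redirect to the DataNode.
```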
My question now is: does Hadoop verify the checksum in this case to
guarantee the data integrity of the partial read?
I assume only the checksums of the affected blocks are verified, not
those of the complete archive file?
Or does a partial read cause a performance issue?
Thanks for your help in advance.
===
Ralph
--
*Imixs*...extends the way people work together
We are an open source company, read more at: www.imixs.org
------------------------------------------------------------------------
Imixs Software Solutions GmbH
Agnes-Pockels-Bogen 1, 80992 München
*Web:* www.imixs.com
*Office:* +49 (0)89-452136 16 *Mobil:* +49-177-4128245
Registergericht: Amtsgericht Muenchen, HRB 136045
Geschaeftsfuehrer: Gaby Heinle u. Ralph Soika