adoroszlai opened a new pull request #1605: HDDS-2259. Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605

## What changes were proposed in this pull request?

Compute the checksum in the container scrubber only for the actual length of data read. Otherwise, if the actual chunk size is not an integer multiple of the number of bytes per checksum (i.e. the buffer size), leftover data in the buffer results in a wrong checksum and in healthy containers being marked unhealthy.

```
Corruption detected in container: [1] Exception: [Inconsistent read for chunk=102914246583189504_chunk_1 len=671 expected checksum [0, 0, 0, 0, -14, -102, -99, -51] actual checksum [0, 0, 0, 0, 23, -23, 53, -79] for block conID: 1 locID: 102914246583189504 bcsId: 3]
```

https://issues.apache.org/jira/browse/HDDS-2259

## How was this patch tested?

1. Changed the unit test to reproduce the problem by making sure that "bytes per checksum" and "chunk size" are different.
2. Tested manually:
   1. Created and closed containers with small (<1KB), medium (~7MB) and large (100MB) files.
   2. Verified that the container scanner does not mark any of these unhealthy.
   3. Appended some garbage data to one of the chunk files.
   4. Verified that the container scanner marks the corrupted container as unhealthy.

```
ozone sh volume create vol1
ozone sh bucket create vol1/bucket1
ozone sh key put vol1/bucket1/small /etc/passwd
ozone scmcli container close 1
ozone sh key put vol1/bucket1/medium /opt/hadoop/share/ozone/lib/hadoop-hdfs-client-3.2.0.jar
ozone scmcli container close 2
ozone sh key put vol1/bucket1/large /opt/hadoop/share/ozone/lib/hadoop-ozone-filesystem-lib-legacy-0.5.0-SNAPSHOT.jar
ozone scmcli container close 3
# later
echo asdfasdf >> /data/hdds/hdds/*/current/containerDir0/2/chunks/*_chunk_1
```

Log:

```
Completed an iteration of container data scrubber in 1 minutes.
Number of iterations (since the data-node restart) : 16, Number of containers scanned in this iteration : 3, Number of unhealthy containers found in this iteration : 0
...
Corruption detected in container: [2] Exception: [Inconsistent read for chunk=102914295727980545_chunk_1 len=5023516 expected checksum [0, 0, 0, 0, 21, 105, -33, 7] actual checksum [0, 0, 0, 0, -103, -121, 23, -96] for block conID: 2 locID: 102914295727980545 bcsId: 9]
Completed an iteration of container data scrubber in 1 minutes.
Number of iterations (since the data-node restart) : 19, Number of containers scanned in this iteration : 3, Number of unhealthy containers found in this iteration : 1
```
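
For readers who want to see the failure mode from "What changes were proposed" in isolation, here is a minimal sketch. It is not the Ozone code: the class `ScrubberChecksumSketch`, the method `checksums` and the `wholeBuffer` flag are hypothetical names, and plain `java.util.zip.CRC32` stands in for whatever checksum the scrubber is configured with. It only illustrates why checksumming a reused buffer past the number of bytes actually read gives a different value for the last, shorter slice.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Illustrative sketch only; none of these names come from the Ozone code base.
class ScrubberChecksumSketch {

  // Computes one CRC32 per bytesPerChecksum-sized slice of the chunk.
  // When wholeBuffer is true, the full buffer is checksummed even after a
  // short read, so stale bytes from the previous slice leak into the result.
  // When it is false, only the bytes actually read are checksummed.
  static List<Long> checksums(byte[] chunk, int bytesPerChecksum,
      boolean wholeBuffer) {
    List<Long> result = new ArrayList<>();
    byte[] buffer = new byte[bytesPerChecksum]; // reused across reads
    ByteArrayInputStream in = new ByteArrayInputStream(chunk);
    int read;
    while ((read = in.read(buffer, 0, buffer.length)) > 0) {
      CRC32 crc = new CRC32();
      crc.update(buffer, 0, wholeBuffer ? buffer.length : read);
      result.add(crc.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    // 10 bytes of data with 4 bytes per checksum: slices of 4, 4 and 2 bytes.
    byte[] chunk = "0123456789".getBytes(StandardCharsets.US_ASCII);
    List<Long> actualLength = checksums(chunk, 4, false);
    List<Long> fullBuffer = checksums(chunk, 4, true);
    System.out.println("bytes read only: " + actualLength);
    System.out.println("whole buffer:    " + fullBuffer);
    System.out.println("last slice agrees: "
        + actualLength.get(2).equals(fullBuffer.get(2)));
  }
}
```

In this sketch the first two slices agree in both modes, because the buffer is completely overwritten on each full read. Only the final 2-byte slice differs, since the buffer still holds two bytes from the previous read. That matches the symptom above, where the problem only shows up when the chunk size is not an integer multiple of the bytes per checksum.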
