zhangshuyan0 opened a new pull request, #5938: URL: https://github.com/apache/hadoop/pull/5938
When a datanode completes a block recovery, it calls the `commitBlockSynchronization` method to notify the NN of the new locations of the block. For an EC block group, the NN determines the block index of each storage based on its position in the `newtargets` parameter. https://github.com/apache/hadoop/blob/b6edcb9a84ceac340c79cd692637b3e11c997cc5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L4059-L4081

If the internal blocks written by the client do not have contiguous indices, the current datanode code can cause the NN to record incorrect block metadata. For simplicity, take RS(3,2) as an example. The timeline of the problem is as follows:

1. The client plans to write internal blocks with indices [0,1,2,3,4] to datanodes [dn0, dn1, dn2, dn3, dn4] respectively. But dn1 is unreachable, so the client only writes data to the remaining 4 datanodes;
2. The client crashes;
3. The NN fails over;
4. The content of `uc.getExpectedStorageLocations()` in the new ANN now depends entirely on block reports, and is `<dn0, dn2, dn3, dn4>`;
5. When the lease hard limit expires, the NN issues a block recovery command;
6. The datanode that receives the recovery command fills `DatanodeID[] newLocs` with [dn0, null, dn2, dn3, dn4]; https://github.com/apache/hadoop/blob/b6edcb9a84ceac340c79cd692637b3e11c997cc5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java#L471-L480
7. The serialization process filters out null values, so the parameter passed to the NN becomes [dn0, dn2, dn3, dn4]; https://github.com/apache/hadoop/blob/b6edcb9a84ceac340c79cd692637b3e11c997cc5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java#L322-L328
8. The NN mistakenly believes that dn2 stores the internal block with index 1, dn3 stores the internal block with index 2, and so on.
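The index shift in steps 6–8 can be reproduced in isolation. The sketch below is illustrative only: it uses plain `String` names in place of `DatanodeID`, and `serialize` stands in for the null-dropping behavior of the protobuf translator, not the actual Hadoop code path.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexShiftDemo {
    // Hypothetical stand-in for the recovery locations the datanode builds:
    // slot i holds the replica for internal block index i, or null if that
    // index was never written (index 1 here, because dn1 was unreachable).
    static final String[] NEW_LOCS = {"dn0", null, "dn2", "dn3", "dn4"};

    // Mimics the translator's serialization step: null entries are silently
    // dropped, so the positional information is lost on the wire.
    static List<String> serialize(String[] locs) {
        List<String> out = new ArrayList<>();
        for (String loc : locs) {
            if (loc != null) {
                out.add(loc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> received = serialize(NEW_LOCS);
        // NN side: block index is derived purely from position in newtargets,
        // so every datanode after the dropped slot shifts down by one.
        for (int i = 0; i < received.size(); i++) {
            System.out.println("NN records index " + i + " -> " + received.get(i));
        }
        // dn2 actually holds internal block index 2, but the NN records it
        // as holding index 1.
    }
}
```

Running this prints `NN records index 1 -> dn2`, which is exactly the wrong metadata described in step 8.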
https://github.com/apache/hadoop/blob/b6edcb9a84ceac340c79cd692637b3e11c997cc5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L4068-L4080

The above timeline is just one example; other situations can lead to the same error, such as a pipeline update occurring on the client side. We should fix this bug.
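To illustrate the principle behind a fix (a hedged sketch only, not the actual patch in this PR), the ambiguity disappears if each reported location carries its block index explicitly instead of relying on position. All names below (`IndexedTarget`, `report`) are hypothetical, with `String` again standing in for `DatanodeID`:

```java
import java.util.ArrayList;
import java.util.List;

public class IndexedTarget {
    final int blockIndex;   // index of the internal block within the group
    final String datanode;  // hypothetical stand-in for DatanodeID

    IndexedTarget(int blockIndex, String datanode) {
        this.blockIndex = blockIndex;
        this.datanode = datanode;
    }

    // Build the recovery report with indices kept explicit, so dropping an
    // unwritten slot cannot shift the indices of the remaining locations.
    static List<IndexedTarget> report(String[] locsByIndex) {
        List<IndexedTarget> out = new ArrayList<>();
        for (int i = 0; i < locsByIndex.length; i++) {
            if (locsByIndex[i] != null) {
                out.add(new IndexedTarget(i, locsByIndex[i]));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (IndexedTarget t : report(new String[]{"dn0", null, "dn2", "dn3", "dn4"})) {
            System.out.println("index " + t.blockIndex + " -> " + t.datanode);
        }
        // dn2 is still reported as holding index 2, even though slot 1 was
        // dropped from the message.
    }
}
```

With this shape, the receiving side never needs to infer an index from a position, so a missing internal block cannot corrupt the metadata of the blocks that were written.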
