Recently, I ran into a problem while using Hadoop.
When I use the C API to send a request, such as writing a file, to the RPC
port of a router node in HDFS federation mode, the file does not show a byte
size on the Hadoop side until about 20 minutes after the client has finished
sending the request, and the file cannot be operated on during that delay.
After the client has run, the Hadoop-side log shows roughly the following
sequence:
1. The namenode receives the client's request, and FSEditLog prints the log.
2. blockmanager.BlockPlacementPolicy: warns that there are not enough replicas
to choose from. Reason: {NO_REQUIRED_STORAGE_TYPE=1}
3. StateChange: allocate block
4. StateChange: obtains the lease for the file in the Hadoop directory
5. ipc.Server: the lease check throws an exception:
LeaseExpiredException: INode is not a regular file: /
6. (Waiting starts)
7. After 20 minutes, the hard limit is reached and the lease is force-closed.
8. Lease recovery is triggered.
9. After that, operations on the file succeed.
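As a side note on step 7: the 20-minute wait seems to match the NameNode's
default lease hard limit of 1200 seconds. A minimal hdfs-site.xml sketch for
shortening it on a test cluster follows. It assumes that the
dfs.namenode.lease-hard-limit-sec property is available in the 3.3.1 build in
use, and the value 120 is only an illustrative choice. This would not fix the
underlying problem; it would only shorten the wait while debugging.

```xml
<!-- hdfs-site.xml on the NameNode (test cluster only) -->
<!-- Assumption: this property exists in the Hadoop build in use. -->
<property>
  <name>dfs.namenode.lease-hard-limit-sec</name>
  <!-- Illustrative value: 2 minutes instead of the 1200-second default -->
  <value>120</value>
</property>
```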
I also suspected a client problem, so I ran several sets of tests (all using
the C API to send write requests to Hadoop; abbreviated below):
Version 3.3.1, router, rpc port. --> There is a 20-minute delay
Version 3.3.1, namenode, rpc port. --> No problem
Version 3.3.1, router, http port. --> No problem
Version 3.3.1, namenode, http port. --> No problem
Version 3.1.1, router, rpc port. --> No problem
Version 3.1.1, namenode, rpc port. --> No problem
Here are my guesses:
From the Hadoop log, my guess is that with version 3.3.1 on the router RPC
port the lease is not obtained correctly at the start, so it cannot be closed
normally and is only released once the hard limit is triggered.
But I cannot explain why the same client does not show this behavior with
version 3.1.1. I suspect it is caused by an incompatibility between changes
across versions and some part of libhdfs3.so.
If anyone has seen a similar situation, I would appreciate a reply pointing me
in the right direction.
王继泽