[GitHub] [hadoop] AngersZhuuuu opened a new pull request #756: [HDFS-14437]Fix BUG mentionted in HDFS-14437

GitBox Fri, 19 Apr 2019 08:04:21 -0700

AngersZhuuuu opened a new pull request #756: [HDFS-14437]Fix BUG mentionted in 
HDFS-14437
URL: https://github.com/apache/hadoop/pull/756
 
 
   This addresses the EditLog rolling bug reported in 
   https://issues.apache.org/jira/browse/HDFS-10943
   
   I have explained the root cause in a comment on the JIRA issue:
   https://issues.apache.org/jira/browse/HDFS-14437
   
   The problem is in this #wait loop in #logSync():
   
   ```
   while (mytxid > synctxid && isSyncRunning) {
     try {
       wait(1000);
     } catch (InterruptedException ie) {
     }
   }
   ```
   When #endCurrentLogSegment calls #logSync() while #isSyncRunning == true and 
   mytxid > synctxid, the current thread calls #wait and releases the lock so 
   that other threads can run.
   
   If no other thread gets to run, #isSyncRunning stays true forever and the 
   current thread never leaves the while loop.
   
   This becomes a deadlock.
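
The wait loop's exit path can be sketched with a toy model. This is only an illustration under stated assumptions: `SyncWaitSketch`, `waitForSync`, `finishSync` and `demo` are hypothetical names, not Hadoop code, and only the field names `synctxid` and `isSyncRunning` are taken from the snippet above. It shows that the waiting thread leaves the loop only because another thread is able to take the lock and clear `isSyncRunning`.

```java
class SyncWaitSketch {
    private long synctxid = 0;            // last transaction id known to be flushed
    private boolean isSyncRunning = true; // pretend a sync is already in flight

    // Same shape as the loop in logSync(): block while our txid is unsynced
    // and some other thread is still syncing.
    synchronized void waitForSync(long mytxid) {
        while (mytxid > synctxid && isSyncRunning) {
            try {
                wait(1000); // releases the monitor while waiting
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // What the syncing thread does when its flush completes.
    synchronized void finishSync(long flushedTxid) {
        synctxid = flushedTxid;
        isSyncRunning = false;
        notifyAll();
    }

    // Returns true if the waiter left the loop after the sync was finished.
    static boolean demo() {
        SyncWaitSketch log = new SyncWaitSketch();
        Thread waiter = new Thread(() -> log.waitForSync(5));
        waiter.start();
        try {
            // Poll until the waiter is parked inside wait(1000).
            while (waiter.getState() != Thread.State.TIMED_WAITING) {
                Thread.sleep(10);
            }
            // This call can take the monitor only because wait() released it.
            log.finishSync(5);
            waiter.join(5000);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return !waiter.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("waiter exited: " + demo());
    }
}
```

If `finishSync` were never called, the waiter would wake every 1000 ms, find the condition unchanged, and wait again indefinitely.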
   
   If other threads do get the lock, they can do a lot of work within 1000 ms.
   
   When one of them then calls logSync, it finishes the flush.
   
   synctxid may then be greater than mytxid, so the current thread just returns at this check:
   ```
   if (mytxid <= synctxid) {
     numTransactionsBatchedInSync++;
     if (metrics != null) {
       // Metrics is non-null only when used inside name node
       metrics.incrTransactionsBatchedInSync();
     }
     return;
   }
   ```
   If the JournalSet's output stream is closed at this point, the bug is 
   triggered.
   
   My change adds handling for the close case: whenever wait() times out or the 
   thread is notified by another thread (when that thread finishes logSync()), 
   I set mytxid to the maximum transaction ID. 
   With this change the bug no longer occurs.
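
A minimal sketch of that idea, under loud assumptions: this is not the actual patch, and `CloseAwareSyncSketch`, `awaitRunningSync`, the `closing` flag, `beginSync`, `finishSync` and `demo` are hypothetical names chosen for illustration. Only `mytxid`, `synctxid`, `txid` and `isSyncRunning` echo the description above. The model refreshes `mytxid` to the latest transaction id after every wakeup when the caller is closing the segment.

```java
class CloseAwareSyncSketch {
    private long txid = 0;                 // last transaction id handed out
    private long synctxid = 0;             // last transaction id known to be flushed
    private boolean isSyncRunning = false;

    synchronized long logEdit() { return ++txid; }

    synchronized void beginSync() { isSyncRunning = true; }

    synchronized void finishSync(long flushedTxid) {
        synctxid = flushedTxid;
        isSyncRunning = false;
        notifyAll();
    }

    // Waits out a running sync and returns the txid the caller must see flushed.
    synchronized long awaitRunningSync(long mytxid, boolean closing) {
        while (mytxid > synctxid && isSyncRunning) {
            try {
                wait(1000);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                break;
            }
            if (closing) {
                // Assumed form of the fix: after each timeout or notification,
                // cover every transaction logged while this thread was waiting.
                mytxid = txid;
            }
        }
        return mytxid;
    }

    // Replays the scenario: txids 1..3 logged, a sync in flight, the closer
    // waits, two more edits arrive, and the running sync flushes up to txid 4.
    static long demo(boolean closing) {
        CloseAwareSyncSketch log = new CloseAwareSyncSketch();
        for (int i = 0; i < 3; i++) {
            log.logEdit();
        }
        log.beginSync();
        long[] result = new long[1];
        Thread closer = new Thread(() -> result[0] = log.awaitRunningSync(3, closing));
        closer.start();
        try {
            // Poll until the closer is parked inside wait(1000).
            while (closer.getState() != Thread.State.TIMED_WAITING) {
                Thread.sleep(10);
            }
            log.logEdit();     // txid 4
            log.logEdit();     // txid 5
            log.finishSync(4); // txid 5 is still unflushed
            closer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("mytxid without refresh: " + demo(false));
        System.out.println("mytxid with refresh:    " + demo(true));
    }
}
```

Without the refresh the closer still holds mytxid 3, which is not greater than synctxid 4, so it would take the early return while txid 5 is unflushed. With the refresh it holds 5, fails the `mytxid <= synctxid` check, and goes on to flush.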
   
   
    
   
   So the lock control is not correct.



