[ 
https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549472#comment-16549472
 ] 

Eric Yang commented on HADOOP-15593:
------------------------------------

[~gabor.bota] I know you are trying to retain the existing behavior, but I 
think there are bugs in the existing logic.  The calculation of nextRefresh 
is based on:

{code}
nextRefresh = Math.max(getRefreshTime(tgt),
              now + kerberosMinSecondsBeforeRelogin);
{code}

Most of the time nextRefresh = getRefreshTime(tgt).  If the renewal happens 
exactly at refreshTime while parallel operations are still using the expired 
ticket, there is a time gap in which some operations might not be able to 
proceed until the next tgt is obtained.  Ideally, we want to keep the service 
uninterrupted, so getNextTgtRenewalTime is supposed to return a time a few 
minutes before the Kerberos tgt expires, and that time should determine 
nextRefresh.  It looks like we are not using the getNextTgtRenewalTime method 
to calculate nextRefresh, and instead opt to use the ticket expiration time 
as the baseline.  I think the patch 2 approach can create a time gap and then 
put strain on the KDC server when the ticket cannot be renewed.  It would be 
better to calculate nextRefresh based on getNextTgtRenewalTime.
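
To illustrate the suggestion, here is a minimal, self-contained sketch of 
computing nextRefresh from a renewal time placed a margin before ticket 
expiry.  This is not the UserGroupInformation code: the class RenewalSketch, 
the methods renewalTime and nextRefresh, and the five-minute margin are all 
hypothetical names and values chosen for illustration, and the real 
getNextTgtRenewalTime may compute its result differently.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only; not the Hadoop UserGroupInformation implementation.
final class RenewalSketch {
  // Assumed safety margin before ticket expiry (illustrative value).
  static final long RENEW_MARGIN_MS = TimeUnit.MINUTES.toMillis(5);

  /**
   * Returns a renewal time that lies before the ticket end time.
   * For very short-lived tickets the margin is capped at half the lifetime
   * so the result never precedes the ticket start time.
   */
  static long renewalTime(long startMs, long endMs) {
    long lifetimeMs = endMs - startMs;
    long marginMs = Math.min(RENEW_MARGIN_MS, lifetimeMs / 2);
    return endMs - marginMs;
  }

  /**
   * Same shape as the expression quoted above, but based on the early
   * renewal time instead of a time at (or near) ticket expiry.
   */
  static long nextRefresh(long nowMs, long startMs, long endMs,
                          long minBeforeReloginMs) {
    return Math.max(renewalTime(startMs, endMs), nowMs + minBeforeReloginMs);
  }

  public static void main(String[] args) {
    long start = 0L;
    long end = TimeUnit.HOURS.toMillis(1);
    long minBeforeRelogin = TimeUnit.SECONDS.toMillis(60);
    System.out.println("renewalTime = " + renewalTime(start, end));
    System.out.println("nextRefresh = "
        + nextRefresh(0L, start, end, minBeforeRelogin));
  }
}
```

With this shape the renewer wakes up while the current ticket is still 
valid, so parallel operations keep a usable tgt during the renewal attempt, 
and the minimum relogin interval still limits how often the KDC is contacted.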

> UserGroupInformation TGT renewer throws NPE
> -------------------------------------------
>
>                 Key: HADOOP-15593
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15593
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 3.0.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Gabor Bota
>            Priority: Critical
>         Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch
>
>
> Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within 
> an exception handler so the original exception was hidden, though it's likely 
> caused by expired tgt.
> {noformat}
> 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught 
> exception in thread Thread[TGT Renewer for [email protected],5,main]
> java.lang.NullPointerException
>         at 
> javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482)
>         at 
> org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894)
>         at java.lang.Thread.run(Thread.java:748){noformat}
> Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889].
> The relevant code was added in HADOOP-13590. File this jira to handle the 
> exception better.



