[ 
https://issues.apache.org/jira/browse/GEODE-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick Cavender closed GEODE-9180.
--------------------------------

> Heartbeats Are Interrupted Inexplicably
> ---------------------------------------
>
>                 Key: GEODE-9180
>                 URL: https://issues.apache.org/jira/browse/GEODE-9180
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>            Reporter: Bill Burcham
>            Assignee: Bill Burcham
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.4, 1.13.4, 1.14.0, 1.15.0
>
>
> Sometimes a member is force-disconnected after a gap in the regular 
> sequence of heartbeats it generates, but we can't explain why there was a 
> gap. The explanation we look for is usually CPU saturation, so we search for 
> secondary evidence such as gaps in the regular sequence of statistics 
> samples, e.g. StatSampler sampleCount. When we can't find such secondary 
> evidence, we can't, in good conscience, rule out bugs in the heartbeat 
> generation logic itself.
> The heartbeat generation logic consists mainly of a thread that loops 
> forever. Each time through the loop it sleeps for member-timeout / 
> logical-interval. By default that's 5s / 2 = 2.5s. When it wakes up it sends 
> unreliable UDP unicast messages to the coordinator and the two 
> non-coordinator members to its "left" (earlier) in the view. If that 
> heartbeat generation thread oversleeps, or doesn't get adequate time slices 
> when it's awake, heartbeats will be delayed and there will be gaps in the 
> regular sequence.
> When this ticket is complete, a warning-level message will be logged 
> whenever the heartbeat generation thread (see 
> {{GMSHealthMonitor.startHeartbeatThread()}}) oversleeps by more than the 
> sleep interval (member-timeout / logical-interval), i.e. whenever it is 
> asleep for more than 2 * (member-timeout / logical-interval).
> h3. See Also
> {{HostStatSampler}} generates messages like this:
> {quote}Statistics sampling thread detected a wakeup delay of 14318 ms, 
> indicating a possible resource issue. Check the GC, memory, and CPU 
> statistics.{quote}
> (from {{checkElapsedSleepTime}})
> This ticket is needed because the thread that matters for heartbeat 
> generation is the heartbeat generation thread itself, and it sometimes 
> oversleeps when the stat sampler thread does not.
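
The oversleep check described in the ticket can be sketched roughly as follows. This is a minimal illustration, not the actual GMSHealthMonitor code: the class name HeartbeatOversleepSketch and the methods sleepIntervalMillis, overslept and runLoop are hypothetical, and the warning is printed to stderr rather than sent through Geode's logging. It assumes elapsed sleep time is measured with a monotonic clock and compared against twice the sleep interval.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the oversleep check; not the actual Geode implementation.
class HeartbeatOversleepSketch {

  // Sleep interval = member-timeout / logical-interval (5000 ms / 2 = 2500 ms by default).
  static long sleepIntervalMillis(long memberTimeoutMillis, int logicalInterval) {
    return memberTimeoutMillis / logicalInterval;
  }

  // True when the thread was asleep for more than 2 * the intended sleep interval,
  // i.e. it overslept by more than one full interval.
  static boolean overslept(long elapsedMillis, long intervalMillis) {
    return elapsedMillis > 2 * intervalMillis;
  }

  // Runs a bounded version of the loop and returns how many oversleep warnings were raised.
  static int runLoop(long intervalMillis, int iterations) throws InterruptedException {
    int warnings = 0;
    for (int i = 0; i < iterations; i++) {
      long before = System.nanoTime(); // monotonic, unaffected by wall-clock changes
      Thread.sleep(intervalMillis);
      long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - before);
      if (overslept(elapsedMillis, intervalMillis)) {
        warnings++;
        System.err.println("WARN: heartbeat thread slept " + elapsedMillis
            + " ms, expected about " + intervalMillis + " ms");
      }
      // A real implementation would send the heartbeat messages here.
    }
    return warnings;
  }

  public static void main(String[] args) throws InterruptedException {
    long interval = sleepIntervalMillis(5000, 2);
    System.out.println("sleep interval ms: " + interval);
    runLoop(10, 3); // short interval so the demo finishes quickly
  }
}
```

Keeping the threshold comparison in a separate pure method makes the "more than 2 * interval" rule easy to check without depending on real scheduling delays.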



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
