[ https://issues.apache.org/jira/browse/GEODE-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201785#comment-17201785 ]
Mario Salazar de Torres edited comment on GEODE-8535 at 9/24/20, 11:53 PM:
---------------------------------------------------------------------------

My hypothesis is that this problem is caused by a time precision misalignment. The evidence supporting this is in the coredump.log file and is the following:
* The entry causing the crash is the one whose key is *entry-505993*, as can be seen in notifications-no-massif.log:34119
* Previous mentions of this key are in notifications-no-massif.log:24867-24872:
{code:java}
[debug 2020/09/24 21:47:40.779570 CEST DESKTOP-3SQUK3P:746832 140626765563648] Entered entry expiry task handler for tombstone of key [entry-505993]: 513315836275097ns,513315826409197ns,10ms,-134100ns
[debug 2020/09/24 21:47:40.779623 CEST DESKTOP-3SQUK3P:746832 140626765563648] Resetting expiry task 134100ns later for key [entry-505993]
[debug 2020/09/24 21:47:40.779661 CEST DESKTOP-3SQUK3P:746832 140626765563648] Entered entry expiry task handler for tombstone of key [entry-505993]: 513315836390697ns,513315826409197ns,10ms,-18500ns
[debug 2020/09/24 21:47:40.779667 CEST DESKTOP-3SQUK3P:746832 140626765563648] Resetting expiry task 18500ns later for key [entry-505993]
[debug 2020/09/24 21:47:40.779676 CEST DESKTOP-3SQUK3P:746832 140626765563648] Entered entry expiry task handler for tombstone of key [entry-505993]: 513315836408997ns,513315826409197ns,10ms,-200ns
[debug 2020/09/24 21:47:40.779681 CEST DESKTOP-3SQUK3P:746832 140626765563648] Resetting expiry task 200ns later for key [entry-505993]
{code}
* As can be seen, the expiry task handler is woken up 3 times, and after the last wake-up, when only 200ns remain until the expiry task should execute, there is no sign of the task being woken up again.
* Looking into ExpiryTaskManager::resetTask, it uses an ACE_Time_Value variable whose minimum precision is microseconds. *Therefore* my guess is that, since the remaining expiry time is only 200ns (below one microsecond), the delay truncates to zero when calling reset, so the task is considered done and the handler is destroyed (see the sketch below).
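To illustrate the hypothesized precision loss, here is a minimal sketch in plain C++ using std::chrono rather than the actual geode-native/ACE code, so the conversion step itself is an assumption about where the truncation happens: a 200ns reschedule delay collapses to zero once it is expressed at microsecond resolution, which is all an ACE_Time_Value can represent.
{code:cpp}
// Minimal sketch (NOT geode-native code): shows how a sub-microsecond delay
// is lost when converted to microsecond resolution, as hypothesized to happen
// when the reset delay is stored in an ACE_Time_Value.
#include <chrono>
#include <iostream>

int main() {
  using namespace std::chrono;

  // Remaining time until the tombstone should expire, taken from the log.
  const nanoseconds remaining{200};

  // Hypothetical conversion done while rescheduling the expiry task:
  // microsecond resolution means integer truncation drops everything below 1us.
  const auto delay = duration_cast<microseconds>(remaining);

  std::cout << "remaining = " << remaining.count() << "ns, "
            << "rescheduled delay = " << delay.count() << "us\n";
  // Prints: remaining = 200ns, rescheduled delay = 0us
  // With a zero delay, the reset amounts to an immediate expiry, so the task
  // can be treated as done and its handler destroyed while other code still
  // expects it to fire later -> dangling handler pointer, crash.
}
{code}
If this hypothesis is confirmed, one possible mitigation would be to clamp sub-microsecond reschedule delays up to the minimum representable resolution before resetting the task, so the reset never degenerates into a zero delay.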
> Coredump while putting an entry to a LocalRegion
> ------------------------------------------------
>
>                 Key: GEODE-8535
>                 URL: https://issues.apache.org/jira/browse/GEODE-8535
>             Project: Geode
>          Issue Type: Bug
>          Components: native client
>    Affects Versions: 1.13.0
>            Reporter: Mario Salazar de Torres
>            Priority: Major
>         Attachments: coredump.log, notifications-no-massif.log
>
> The scenario is the following:
> *GIVEN* concurrency-checks-enabled=true (as default) for the region in which the put operation is happening.
> *GIVEN* tombstone-timeout=10ms
> *WHENEVER* a huge load (hundreds per second) of LOCAL_CREATE, LOCAL_DESTROY notifications is received in the client for the same region and consecutive keys, as the example below shows:
> {code:java}
> t_0: LOCAL_CREATE for key entry-1
> t_1: LOCAL_DESTROY for key entry-1
> t_2: LOCAL_CREATE for key entry-2
> t_3: LOCAL_DESTROY for key entry-2
> ·
> ·
> ·
> t_(2*(n-1)): LOCAL_CREATE for key entry-n
> t_(2*n-1): LOCAL_DESTROY for key entry-n
> {code}
> *THEN* the application crashes, in many different places, but in the case reported here, when trying to access the virtual destructor pointer of the ExpiryHandlerTask, which turns out to be nullptr.
>
> Find the segmentation fault report attached as *coredump.log* and also the geode-native debug log attached as *notifications-no-massif.log*

--
This message was sent by Atlassian Jira
(v8.3.4#803005)