Hi Andrew,

Okay!
I will test your patch and let you know the result.

Many thanks!
Hideo Yamauchi.

----- Original Message -----
> From: Andrew Beekhof <[email protected]>
> To: [email protected]; The Pacemaker cluster resource manager <[email protected]>
> Cc:
> Date: 2014/10/10, Fri 10:47
> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>
> Perfect!
>
> Can you try this:
>
> diff --git a/lib/services/services.c b/lib/services/services.c
> index 8590b56..cb0f0ae 100644
> --- a/lib/services/services.c
> +++ b/lib/services/services.c
> @@ -417,6 +417,7 @@ services_action_kick(const char *name, const char *action, int interval /* ms */
>      free(id);
>
>      if (op == NULL) {
> +        op->opaque->repeat_timer = 0;
>          return FALSE;
>      }
>
> @@ -425,6 +426,7 @@ services_action_kick(const char *name, const char *action, int interval /* ms */
>      } else {
>          if (op->opaque->repeat_timer) {
>              g_source_remove(op->opaque->repeat_timer);
> +            op->opaque->repeat_timer = 0;
>          }
>          recurring_action_timer(op);
>          return TRUE;
> @@ -459,6 +461,7 @@ handle_duplicate_recurring(svc_action_t * op, void (*action_callback) (svc_actio
>          if (dup->pid != 0) {
>              if (op->opaque->repeat_timer) {
>                  g_source_remove(op->opaque->repeat_timer);
> +                op->opaque->repeat_timer = 0;
>              }
>              recurring_action_timer(dup);
>          }
>
>
> On 10 Oct 2014, at 12:16 pm, [email protected] wrote:
>
>> Hi Andrew,
>>
>> I have not yet managed to set up gdb in the Ubuntu environment, so I cannot attach to lrmd and acquire a trace.
>> Please wait a little longer.
>>
>>
>> However, I made lrmd terminate abnormally when g_source_remove() in cancel_recurring_action() returned FALSE.
>> -----
>> gboolean
>> cancel_recurring_action(svc_action_t * op)
>> {
>>     crm_info("Cancelling operation %s", op->id);
>>
>>     if (recurring_actions) {
>>         g_hash_table_remove(recurring_actions, op->id);
>>     }
>>
>>     if (op->opaque->repeat_timer) {
>>         if (g_source_remove(op->opaque->repeat_timer) == FALSE) {
>>             abort();
>>         }
>> (snip)
>> -------core----
>> #0  0x00007f30aa60ff79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>>
>> 56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> (gdb) where
>> #0  0x00007f30aa60ff79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>> #1  0x00007f30aa613388 in __GI_abort () at abort.c:89
>> #2  0x00007f30aadcde77 in crm_abort (file=file@entry=0x7f30aae0152b "logging.c",
>>     function=function@entry=0x7f30aae028c0 <__FUNCTION__.23262> "crm_glib_handler", line=line@entry=73,
>>     assert_condition=assert_condition@entry=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", do_core=do_core@entry=1,
>>     do_fork=<optimized out>, do_fork@entry=1) at utils.c:1195
>> #3  0x00007f30aadf5ca7 in crm_glib_handler (log_domain=0x7f30aa35eb6e "GLib", flags=<optimized out>,
>>     message=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", user_data=<optimized out>) at logging.c:73
>> #4  0x00007f30aa320ae1 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>> #5  0x00007f30aa320d72 in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>> #6  0x00007f30aa318c5c in g_source_remove () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>> #7  0x00007f30aabb2b55 in cancel_recurring_action (op=op@entry=0x19caa90) at services.c:363
>> #8  0x00007f30aabb2bee in services_action_cancel (name=name@entry=0x19d0530 "dummy3", action=<optimized out>, interval=interval@entry=10000)
>>     at services.c:385
>> #9  0x000000000040405a in cancel_op (rsc_id=rsc_id@entry=0x19d0530 "dummy3", action=action@entry=0x19cec10 "monitor", interval=10000)
>>     at lrmd.c:1404
>> #10 0x000000000040614f in process_lrmd_rsc_cancel (client=0x19c8290, id=74, request=0x19ca8a0) at lrmd.c:1468
>> #11 process_lrmd_message (client=client@entry=0x19c8290, id=74, request=request@entry=0x19ca8a0) at lrmd.c:1507
>> #12 0x0000000000402bac in lrmd_ipc_dispatch (c=0x19c79c0, data=<optimized out>, size=361) at main.c:148
>> #13 0x00007f30aa07b4d9 in qb_ipcs_dispatch_connection_request () from /usr/lib/libqb.so.0
>> #14 0x00007f30aadf209d in gio_read_socket (gio=<optimized out>, condition=G_IO_IN, data=0x19c68a8) at mainloop.c:437
>> #15 0x00007f30aa319ce5 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>> ---Type <return> to continue, or q <return> to quit---
>> #16 0x00007f30aa31a048 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>> #17 0x00007f30aa31a30a in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>> #18 0x0000000000402774 in main (argc=<optimized out>, argv=0x7fffcdd90b88) at main.c:344
>> ---------
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>>
>> ----- Original Message -----
>>> From: "[email protected]" <[email protected]>
>>> To: Andrew Beekhof <[email protected]>
>>> Cc: The Pacemaker cluster resource manager <[email protected]>
>>> Date: 2014/10/7, Tue 11:15
>>> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>>
>>> Hi Andrew,
>>>
>>>> Not quite. Returning FALSE from the callback also removes the source from glib.
>>>> So your test case effectively removes t1 twice: once implicitly by returning
>>>> FALSE in timer_func1() and then again explicitly in timer_func3()
>>>
>>>
>>> You are right.
>>>
>>>
>>> If Pacemaker does not remove again a source whose timer callback has already returned FALSE, glib does not report the error.
>>>
>>>
>>> Many Thanks,
>>> Hideo Yamauchi.
>>>
>>>
>>> ----- Original Message -----
>>>> From: Andrew Beekhof <[email protected]>
>>>> To: [email protected]
>>>> Cc: The Pacemaker cluster resource manager <[email protected]>
>>>> Date: 2014/10/7, Tue 11:06
>>>> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>>>
>>>>
>>>> On 7 Oct 2014, at 1:03 pm, [email protected] wrote:
>>>>
>>>>> Hi Andrew,
>>>>>
>>>>>>> These problems seem to be caused by the following glib change.
>>>>>>> * https://github.com/GNOME/glib/commit/393503ba5bdc7c09cd46b716aaf3d2c63a6c7f9c
>>>>>>
>>>>>> The glib behaviour on Ubuntu seems reasonable, removing a source multiple times
>>>>>> IS a valid error.
>>>>>> I need the stack trace to know where/how this situation can occur in pacemaker.
>>>>>
>>>>>
>>>>> As far as I have confirmed, Pacemaker does not remove a source more than once.
>>>>> On Ubuntu (glib 2.40), the error occurs even on the first removal.
>>>>
>>>> Not quite. Returning FALSE from the callback also removes the source from glib.
>>>> So your test case effectively removes t1 twice: once implicitly by returning
>>>> FALSE in timer_func1() and then again explicitly in timer_func3()
>>>>
>>>>>
>>>>> To avoid the error on Ubuntu, it seems necessary to check that the source exists before removing it.
>>>>> This also works well with the glib of RHEL6.x (and RHEL7.0).
>>>>>
>>>>>         if (g_main_context_find_source_by_id (NULL, t1) != NULL) {
>>>>>             g_source_remove(t1);
>>>>>         }
>>>>>
>>>>> I will send you the stack trace once I have acquired it.
>>>>>
>>>>> Many Thanks!
>>>>> Hideo Yamauchi.
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: Andrew Beekhof <[email protected]>
>>>>>> To: [email protected]; The Pacemaker cluster resource manager <[email protected]>
>>>>>> Cc:
>>>>>> Date: 2014/10/7, Tue 09:44
>>>>>> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>>>>>
>>>>>>
>>>>>> On 6 Oct 2014, at 4:09 pm, [email protected] wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> When I run the following sample on RHEL6.5 (glib2-2.22.5-7.el6) and Ubuntu14.04 (libglib2.0-0:amd64 2.40.0-2), the behavior differs.
>>>>>>>
>>>>>>> * Sample : test2.c
>>>>>>> {{{
>>>>>>> #include <stdio.h>
>>>>>>> #include <stdlib.h>
>>>>>>> #include <glib.h>
>>>>>>> #include <sys/times.h>
>>>>>>> guint t1, t2, t3;
>>>>>>> gboolean timer_func2(gpointer data){
>>>>>>>         printf("TIMER EXPIRE!2\n");
>>>>>>>         fflush(stdout);
>>>>>>>         return FALSE;
>>>>>>> }
>>>>>>> gboolean timer_func1(gpointer data){
>>>>>>>         clock_t         ret;
>>>>>>>         struct tms buff;
>>>>>>>
>>>>>>>         ret = times(&buff);
>>>>>>>         printf("TIMER EXPIRE!1 %d\n", (int)ret);
>>>>>>>         fflush(stdout);
>>>>>>>         return FALSE;
>>>>>>> }
>>>>>>> gboolean timer_func3(gpointer data){
>>>>>>>         printf("TIMER EXPIRE 3!\n");
>>>>>>>         fflush(stdout);
>>>>>>>         printf("remove timer1!\n");
>>>>>>>
>>>>>>>         fflush(stdout);
>>>>>>>         g_source_remove(t1);
>>>>>>>         printf("remove timer2!\n");
>>>>>>>         fflush(stdout);
>>>>>>>         g_source_remove(t2);
>>>>>>>         printf("remove timer3!\n");
>>>>>>>         fflush(stdout);
>>>>>>>         g_source_remove(t3);
>>>>>>>         return FALSE;
>>>>>>> }
>>>>>>> int main(int argc, char** argv){
>>>>>>>         GMainLoop *m;
>>>>>>>         clock_t         ret;
>>>>>>>         struct tms buff;
>>>>>>>         gint64 t;
>>>>>>>         m = g_main_new(FALSE);
>>>>>>>         t1 = g_timeout_add(1000, timer_func1, NULL);
>>>>>>>         t2 = g_timeout_add(60000, timer_func2, NULL);
>>>>>>>         t3 = g_timeout_add(5000, timer_func3, NULL);
>>>>>>>         ret = times(&buff);
>>>>>>>         printf("START! %d\n", (int)ret);
>>>>>>>         g_main_run(m);
>>>>>>> }
>>>>>>>
>>>>>>> }}}
>>>>>>> * Result
>>>>>>> ---- RHEL6.5(glib2-2.22.5-7.el6) ----
>>>>>>> [root@snmp1 ~]# ./test2
>>>>>>> START! 429576012
>>>>>>> TIMER EXPIRE!1 429576112
>>>>>>> TIMER EXPIRE 3!
>>>>>>> remove timer1!
>>>>>>> remove timer2!
>>>>>>> remove timer3!
>>>>>>>
>>>>>>> ---- Ubuntu14.04(libglib2.0-0:amd64 2.40.0-2) ----
>>>>>>> root@a1be102:~# ./test2
>>>>>>> START! 1718163089
>>>>>>> TIMER EXPIRE!1 1718163189
>>>>>>> TIMER EXPIRE 3!
>>>>>>> remove timer1!
>>>>>>>
>>>>>>> (process:1410): GLib-CRITICAL **: Source ID 1 was not found when attempting to remove it
>>>>>>> remove timer2!
>>>>>>> remove timer3!
>>>>>>>
>>>>>>>
>>>>>>> These problems seem to be caused by the following glib change.
>>>>>>> * https://github.com/GNOME/glib/commit/393503ba5bdc7c09cd46b716aaf3d2c63a6c7f9c
>>>>>>
>>>>>> The glib behaviour on Ubuntu seems reasonable, removing a source multiple times
>>>>>> IS a valid error.
>>>>>> I need the stack trace to know where/how this situation can occur in pacemaker.
>>>>>>
>>>>>>>
>>>>>>> Before the change, g_source_remove() could be called on a timer that had already completed, but after the change it causes an error.
>>>>>>>
>>>>>>> As a result, we get the following crit error when Pacemaker runs with a new version of glib.
>>>>>>>
>>>>>>> lrmd[1632]:    error: crm_abort: crm_glib_handler: Forked child 1840 to record non-fatal assert at logging.c:73 : Source ID 51 was not found when attempting to remove it
>>>>>>> lrmd[1632]:     crit: crm_glib_handler: GLib: Source ID 51 was not found when attempting to remove it
>>>>>>>
>>>>>>> It seems that Pacemaker needs to handle this, considering the following:
>>>>>>> * Distributions that use a new version of glib, including Ubuntu.
>>>>>>> * Future glib version upgrades in RHEL.
>>>>>>>
>>>>>>> A similar problem has been reported on the ML.
>>>>>>> * http://www.gossamer-threads.com/lists/linuxha/pacemaker/91333#91333
>>>>>>> * http://www.gossamer-threads.com/lists/linuxha/pacemaker/92408
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Hideo Yamauchi.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pacemaker mailing list: [email protected]
>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>
>>>>
>>>
>>
>
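
For reference, the pattern discussed in this thread (reset the stored timer ID to 0 as soon as the source is gone, so a later cancel does not try to remove it again) can be sketched without glib as below. This is only a minimal model under stated assumptions: the mock_* functions and struct timed_op are hypothetical stand-ins invented for illustration, not the GLib API and not Pacemaker code. mock_source_remove() imitates the newer glib behaviour of treating an unknown ID as an error, and repeat_timer plays the role of op->opaque->repeat_timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a main-loop source table; NOT the GLib API. */
#define MAX_SOURCES 8

/* Index 0 is unused because ID 0 means "no timer is set". */
static bool source_alive[MAX_SOURCES + 1];

/* Register a source and return its non-zero ID, or 0 if the table is full. */
static unsigned mock_source_add(void)
{
    for (unsigned id = 1; id <= MAX_SOURCES; id++) {
        if (!source_alive[id]) {
            source_alive[id] = true;
            return id;
        }
    }
    return 0;
}

/* Like newer glib: removing an ID that is not registered is an error. */
static bool mock_source_remove(unsigned id)
{
    if (id == 0 || id > MAX_SOURCES || !source_alive[id]) {
        return false;
    }
    source_alive[id] = false;
    return true;
}

/* Models a callback returning FALSE: the main loop drops the source itself. */
static void mock_source_fire_once(unsigned id)
{
    assert(id >= 1 && id <= MAX_SOURCES);
    source_alive[id] = false;
}

/* Hypothetical operation object holding one timer ID. */
struct timed_op {
    unsigned repeat_timer;
};

/* Naive cancel: trusts the stored ID even if the source already went away. */
static bool cancel_naive(struct timed_op *op)
{
    if (op->repeat_timer) {
        return mock_source_remove(op->repeat_timer);
    }
    return true;
}

/* Guarded cancel: remove only a non-zero ID, then always forget it. */
static bool cancel_guarded(struct timed_op *op)
{
    bool ok = true;

    if (op->repeat_timer) {
        ok = mock_source_remove(op->repeat_timer);
        op->repeat_timer = 0;
    }
    return ok;
}

/* Callback side: the one-shot timer fired, so the stored ID is now stale. */
static void on_timer_fired(struct timed_op *op)
{
    mock_source_fire_once(op->repeat_timer);
    op->repeat_timer = 0;
}
```

With real glib, mock_source_remove() would correspond to g_source_remove(). The point of the sketch is only that every path on which the source disappears (explicit removal, or a callback that returns FALSE) also clears the stored ID.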
