13.01.2014, 02:51, "Andrew Beekhof" <[email protected]>:
> On 10 Jan 2014, at 9:55 pm, Andrey Groshev <[email protected]> wrote:
>
>> 10.01.2014, 14:31, "Andrey Groshev" <[email protected]>:
>>> 10.01.2014, 14:01, "Andrew Beekhof" <[email protected]>:
>>>> On 10 Jan 2014, at 5:03 pm, Andrey Groshev <[email protected]> wrote:
>>>>> 10.01.2014, 05:29, "Andrew Beekhof" <[email protected]>:
>>>>>> On 9 Jan 2014, at 11:11 pm, Andrey Groshev <[email protected]> wrote:
>>>>>>> 08.01.2014, 06:22, "Andrew Beekhof" <[email protected]>:
>>>>>>>> On 29 Nov 2013, at 7:17 pm, Andrey Groshev <[email protected]>
>>>>>>>> wrote:
>>>>>>>>> Hi, ALL.
>>>>>>>>>
>>>>>>>>> I'm still trying to deal with the fact that after a fence, the
>>>>>>>>> node hangs in "pending".
>>>>>>>> Please define "pending". Where did you see this?
>>>>>>> In crm_mon:
>>>>>>> ......
>>>>>>> Node dev-cluster2-node2 (172793105): pending
>>>>>>> ......
>>>>>>>
>>>>>>> The experiment was like this:
>>>>>>> Four nodes in cluster.
>>>>>>> On one of them, kill corosync or pacemakerd (signal 4, 6, or 11).
>>>>>>> After that, the remaining nodes constantly reboot it under
>>>>>>> various pretexts: "softly whistling", "fly low", "not a cluster
>>>>>>> member!" ...
>>>>>>> Then "Too many failures ...." fell out in the log.
>>>>>>> All this time the status in crm_mon is "pending".
>>>>>>> Depending on the wind direction, it changed to "UNCLEAN".
>>>>>>> Much time has passed and I cannot accurately describe the
>>>>>>> behavior...
>>>>>>>
>>>>>>> Now I am in the following state:
>>>>>>> I tried to locate the problem and came up with this:
>>>>>>> I set a big value in the property stonith-timeout="600s"
>>>>>>> and got the following behavior:
>>>>>>> 1. pkill -4 corosync
>>>>>>> 2. The DC node calls my fence agent "sshbykey".
>>>>>>> 3. It reboots the victim and waits until it comes back to life.
>>>>>> Hmmm.... what version of pacemaker?
>>>>>> This sounds like a timing issue that we fixed a while back
>>>>> It was version 1.1.11 from December 3.
>>>>> Now I'll do a full update and retest.
>>>> That should be recent enough. Can you create a crm_report the next time
>>>> you reproduce?
>>> Of course yes. Little delay.... :)
>>>
>>> ......
>>> cc1: warnings being treated as errors
>>> upstart.c: In function ‘upstart_job_property’:
>>> upstart.c:264: error: implicit declaration of function
>>> ‘g_variant_lookup_value’
>>> upstart.c:264: error: nested extern declaration of ‘g_variant_lookup_value’
>>> upstart.c:264: error: assignment makes pointer from integer without a cast
>>> gmake[2]: *** [libcrmservice_la-upstart.lo] Error 1
>>> gmake[2]: Leaving directory `/root/ha/pacemaker/lib/services'
>>> make[1]: *** [all-recursive] Error 1
>>> make[1]: Leaving directory `/root/ha/pacemaker/lib'
>>> make: *** [core] Error 1
>>>
>>> I'm trying to solve this problem.
>> It won't get solved quickly...
>>
>>
>> https://developer.gnome.org/glib/2.28/glib-GVariant.html#g-variant-lookup-value
>> g_variant_lookup_value () Since 2.28
>>
>> # yum list installed glib2
>> Loaded plugins: fastestmirror, rhnplugin, security
>> This system is receiving updates from RHN Classic or Red Hat Satellite.
>> Loading mirror speeds from cached hostfile
>> Installed Packages
>> glib2.x86_64
>> 2.26.1-3.el6
>> installed
>>
>> # cat /etc/issue
>> CentOS release 6.5 (Final)
>> Kernel \r on an \m
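The yum output above explains the build failure: CentOS 6.5 ships glib2 2.26.1, while g_variant_lookup_value() only exists since glib 2.28. A quick way to check this gap from the shell (a generic version-comparison sketch using sort -V; the version strings are taken from the output above, not queried live):

```shell
# Compare the installed glib2 version against the 2.28 minimum that
# g_variant_lookup_value() requires.  Illustrative helper only, not
# part of the pacemaker build system.
installed="2.26.1"   # e.g. from: rpm -q --qf '%{VERSION}\n' glib2
required="2.28.0"

# sort -V orders version strings numerically; if the installed version
# sorts first (and differs), it is older than the required one.
oldest=$(printf '%s\n%s\n' "$installed" "$required" | sort -V | head -n1)
if [ "$oldest" = "$installed" ] && [ "$installed" != "$required" ]; then
    echo "glib2 $installed is older than $required: g_variant_lookup_value unavailable"
fi
```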
>
> Can you try this patch?
> Upstart jobs won't work, but the code will compile.
>
> diff --git a/lib/services/upstart.c b/lib/services/upstart.c
> index 831e7cf..195c3a4 100644
> --- a/lib/services/upstart.c
> +++ b/lib/services/upstart.c
> @@ -231,12 +231,21 @@ upstart_job_exists(const char *name)
> static char *
> upstart_job_property(const char *obj, const gchar * iface, const char *name)
> {
> + char *output = NULL;
> +
> +#if !GLIB_CHECK_VERSION(2,28,0)
> + static bool err = TRUE;
> +
> + if(err) {
> + crm_err("This version of glib is too old to support upstart jobs");
> + err = FALSE;
> + }
> +#else
> GError *error = NULL;
> GDBusProxy *proxy;
> GVariant *asv = NULL;
> GVariant *value = NULL;
> GVariant *_ret = NULL;
> - char *output = NULL;
>
> crm_info("Calling GetAll on %s", obj);
> proxy = get_proxy(obj, BUS_PROPERTY_IFACE);
> @@ -272,6 +281,7 @@ upstart_job_property(const char *obj, const gchar *
> iface, const char *name)
>
> g_object_unref(proxy);
> g_variant_unref(_ret);
> +#endif
> return output;
> }
>
Ok :) I patched the source.
Typed "make rc" - the same error.
Made a fresh copy via "fetch" - the same error.
It seems that if ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz does not
exist, it is rebuilt; otherwise the existing archive is reused.
Cut log .......
# make rc
make TAG=Pacemaker-1.1.11-rc3 rpm
make[1]: Entering directory `/root/ha/pacemaker'
rm -f pacemaker-dirty.tar.* pacemaker-tip.tar.* pacemaker-HEAD.tar.*
if [ ! -f ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz ]; then \
    rm -f pacemaker.tar.*; \
    if [ Pacemaker-1.1.11-rc3 = dirty ]; then \
        git commit -m "DO-NOT-PUSH" -a; \
        git archive --prefix=ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3/ HEAD | gzip > ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
        git reset --mixed HEAD^; \
    else \
        git archive --prefix=ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3/ Pacemaker-1.1.11-rc3 | gzip > ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
    fi; \
    echo `date`: Rebuilt ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
else \
    echo `date`: Using existing tarball: ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz; \
fi
Mon Jan 13 13:23:21 MSK 2014: Using existing tarball: ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz
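The log shows why the patch had no effect: the make rule only re-archives the tree when the tarball is absent, so a tarball created before patching keeps being rebuilt. A minimal reproduction of that caching logic, plus the presumed workaround (deleting the stale tarball so the next `make rc` re-runs `git archive`), using the same filename from the log:

```shell
# Reproduce the Makefile's caching check: the tarball is only rebuilt
# when the file does not already exist.
tarball=ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz
cd "$(mktemp -d)"

touch "$tarball"            # simulate a stale, pre-patch archive
if [ ! -f "$tarball" ]; then
    action="rebuilt"
else
    action="reused"         # this branch fires: the patch is never packaged
fi

rm -f "$tarball"            # the workaround: drop the stale tarball
if [ ! -f "$tarball" ]; then
    action2="rebuilt"       # now `make rc` would re-run `git archive`
else
    action2="reused"
fi
```

So running `rm -f ClusterLabs-pacemaker-Pacemaker-1.1.11-rc3.tar.gz` before `make rc` should get the patched sources into the rpms.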
.......
Well, "make rpm" - build rpms and I create cluster.
I spent the same tests and confirmed the behavior.
crm_reoprt log here - http://send2me.ru/crmrep.tar.bz2
>>>>>>> Once the script makes sure that the victim has rebooted and is
>>>>>>> again available via ssh, it exits with 0.
>>>>>>> All commands are logged on both the victim and the killer - all right.
>>>>>>> 4. A little later, the status of the victim node in crm_mon
>>>>>>> changes to online.
>>>>>>> 5. BUT... not one resource starts! Despite the fact that
>>>>>>> "crm_simulate -sL" shows the correct resource to start:
>>>>>>> * Start pingCheck:3 (dev-cluster2-node2)
>>>>>>> 6. In this state we spend the next 600 seconds.
>>>>>>> After this timeout expires, another node (not the DC)
>>>>>>> decides to kill our victim again.
>>>>>>> All commands are again logged on both the victim and the killer -
>>>>>>> all documented :)
>>>>>>> 7. NOW all resources start in the right sequence.
>>>>>>>
>>>>>>> I'm almost happy, but I don't like it: two reboots and 10 minutes
>>>>>>> of waiting ;)
>>>>>>> And if something happens on another node, this behavior is
>>>>>>> superimposed on the old one, and no resources start until the last
>>>>>>> node has been rebooted twice.
>>>>>>>
>>>>>>> I tried to understand this behavior.
>>>>>>> As I understand it:
>>>>>>> 1. Ultimately, ./lib/fencing/st_client.c calls
>>>>>>> internal_stonith_action_execute().
>>>>>>> 2. It forks and creates a pipe to the child.
>>>>>>> 3. In the async case it calls mainloop_child_add with a callback
>>>>>>> to stonith_action_async_done.
>>>>>>> 4. It adds timeouts via g_timeout_add for sending the TERM and
>>>>>>> KILL signals.
>>>>>>>
>>>>>>> If all goes right, stonith_action_async_done is called and the
>>>>>>> timeout is removed.
>>>>>>> For some reason this does not happen. I sit and think ....
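The TERM-then-KILL escalation that g_timeout_add arms in step 4 has the same shape as the coreutils timeout(1) command (an analogy only; pacemaker implements this inside its own GLib mainloop, not via timeout):

```shell
# timeout(1) mirrors the two-stage escalation: send SIGTERM when the
# first deadline (1s) expires, then SIGKILL after a further grace
# period (-k 2).  A 5-second sleep stands in for a fence agent that
# never reports completion.
timeout -k 2 1 sleep 5
status=$?   # timeout(1) exits 124 when it had to terminate the command
echo "exit status: $status"
```

In pacemaker's case the puzzle is the opposite direction: the agent *does* complete, yet stonith_action_async_done is apparently not invoked, so the escalation timer fires anyway.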
>>>>>>>>> At this time there are constant re-elections.
>>>>>>>>> Also, I noticed the difference when you start pacemaker.
>>>>>>>>> At normal startup:
>>>>>>>>> * corosync
>>>>>>>>> * pacemakerd
>>>>>>>>> * attrd
>>>>>>>>> * pengine
>>>>>>>>> * lrmd
>>>>>>>>> * crmd
>>>>>>>>> * cib
>>>>>>>>>
>>>>>>>>> When hangs start:
>>>>>>>>> * corosync
>>>>>>>>> * pacemakerd
>>>>>>>>> * attrd
>>>>>>>>> * pengine
>>>>>>>>> * crmd
>>>>>>>>> * lrmd
>>>>>>>>> * cib.
>>>>>>>> Are you referring to the order of the daemons here?
>>>>>>>> The cib should not be at the bottom in either case.
>>>>>>>>> Who knows who runs lrmd?
>>>>>>>> Pacemakerd.
>>>>>>>>> _______________________________________________
>>>>>>>>> Pacemaker mailing list: [email protected]
>>>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>>>
>>>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>>>> Getting started:
>>>>>>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>>>> Bugs: http://bugs.clusterlabs.org