Hi Ceph

after moving to legacy tunables the cluster goes back to active+clean, but if I
revert it back to optimal or firefly it moves to "active+remapped", although
Health is "OK" .. yet with the legacy settings I get "HEALTH_WARN crush map has
legacy tunables" .. ?  Does anyone have any idea why?  I cleared the warning

ceph tell mon.\* injectargs --no-mon-warn-on-legacy-crush-tunables

But is that best practice? If not, how do I achieve the optimal profile?

#
*ceph osd crush tunables legacy*
adjusted tunables profile to legacy
[root@ceph-node1 my-cluster]#
*ceph -s*    cluster fbf07780-f7bf-4d92-a144-a931ef5cd4a9
     health HEALTH_WARN crush map has legacy tunables
     monmap e3: 3 mons at {ceph-node1=
192.168.10.41:6789/0,ceph-node2=192.168.10.42:6789/0,ceph-node3=192.168.10.43:6789/0},
election epoch 14, quorum 0,1,2 ceph-node1,ceph-node2,ceph-node3
     mdsmap e9: 1/1/1 up {0=ceph-node1=up:active}
     osdmap e39: 5 osds: 5 up, 5 in
      pgmap v99: 192 pgs, 3 pools, 1884 bytes data, 20 objects
            167 MB used, 25377 MB / 25544 MB avail
                 192 active+clean
[root@ceph-node1 my-cluster]# *ceph osd crush tunables optimal*
OR
[root@ceph-node1 my-cluster]# *ceph osd crush tunables firefly*
[root@ceph-node1 my-cluster]# ceph -s
    cluster fbf07780-f7bf-4d92-a144-a931ef5cd4a9
     health HEALTH_OK
     monmap e3: 3 mons at {ceph-node1=
192.168.10.41:6789/0,ceph-node2=192.168.10.42:6789/0,ceph-node3=192.168.10.43:6789/0},
election epoch 14, quorum 0,1,2 ceph-node1,ceph-node2,ceph-node3
     mdsmap e9: 1/1/1 up {0=ceph-node1=up:active}
     osdmap e52: 6 osds: 6 up, 6 in
      pgmap v135: 192 pgs, 3 pools, 1884 bytes data, 20 objects
            214 MB used, 30439 MB / 30653 MB avail
                 192 active+remapped
[root@ceph-node1 my-cluster]# *ceph osd crush tunables legacy*
adjusted tunables profile to legacy
[root@ceph-node1 my-cluster]#
* ceph -s*    cluster fbf07780-f7bf-4d92-a144-a931ef5cd4a9
     health HEALTH_WARN crush map has legacy tunables
     monmap e3: 3 mons at {ceph-node1=
192.168.10.41:6789/0,ceph-node2=192.168.10.42:6789/0,ceph-node3=192.168.10.43:6789/0},
election epoch 14, quorum 0,1,2 ceph-node1,ceph-node2,ceph-node3
     mdsmap e9: 1/1/1 up {0=ceph-node1=up:active}
     osdmap e56: 6 osds: 6 up, 6 in
      pgmap v147: 192 pgs, 3 pools, 1884 bytes data, 20 objects
            214 MB used, 30438 MB / 30653 MB avail
                 192 active+clean


On Sun, Jun 8, 2014 at 10:58 AM, Anil Dhingra <[email protected]>
wrote:

> Hi ceph
>
> followed guide at
> http://ceph.com/docs/master/start/quick-start-preflight/#ceph-deploy-setup
> .. and used the release rpm at rpm -Uvh
> http://ceph.com/rpm-firefly/rhel6/noarch/ceph-release-1-0.el6.noarch.rpm
> but there was no automatic directory creation and no udev rules.
>
> as suggested in the guide I set "osd pool default size = 2" in
> ceph.conf .. so will it still follow 3x replication?
>
> As per the suggestion I added 1 more OSD and waited for a few hours, but no
> luck; after that I increased size & min_size, but it was still the same .. I
> increased the monitors to 3 nodes, but again the same output, with the
> degraded % increased
>
> [root@ceph-node1 ~]# ceph osd tree
>
> # id    weight  type name       up/down reweight
> -1      0       root default
> -2      0               host ceph-node2
> 0       0                       osd.0   up      1
> -3      0               host ceph-node3
> 1       0                       osd.1   up      1
> -4      0               host ceph-node1
> 2       0                       osd.2   up      1
>
> [root@ceph-node1 ~]# ceph osd dump | grep 'replicated size'
>
> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool
> crash_replay_interval 45 stripe_width 0
> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool
> stripe_width 0
> pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool
> stripe_width 0
> [root@ceph-node1 ~]# ceph osd pool set data size 3
> set pool 0 size to 3
> [root@ceph-node1 ~]# ceph osd pool set metadata size 3
> set pool 1 size to 3
> [root@ceph-node1 ~]# ceph osd pool set rbd size 3
> set pool 2 size to 3
> [root@ceph-node1 ~]# ceph osd pool set data min_size 2
> set pool 0 min_size to 2
> [root@ceph-node1 ~]# ceph osd pool set rbd min_size 2
> set pool 2 min_size to 2
> [root@ceph-node1 ~]# ceph osd pool set metadata min_size 2
> set pool 1 min_size to 2
> [root@ceph-node1 ~]# ceph osd dump | grep 'replicated size'
> pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 28 owner 0 flags hashpspool
> crash_replay_interval 45 stripe_width 0
> pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 30 owner 0 flags hashpspool
> stripe_width 0
> pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 29 owner 0 flags hashpspool
> stripe_width 0
> [root@ceph-node1 ~]#
>
> [root@ceph-node1 ~]# ceph -w
>     cluster fbf07780-f7bf-4d92-a144-a931ef5cd4a9
>      health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean; recovery
> 40/60 objects degraded (66.667%)
>      monmap e3: 3 mons at {ceph-node1=
> 192.168.10.41:6789/0,ceph-node2=192.168.10.42:6789/0,ceph-node3=192.168.10.43:6789/0},
> election epoch 14, quorum 0,1,2 ceph-node1,ceph-node2,ceph-node3
>      mdsmap e9: 1/1/1 up {0=ceph-node1=up:active}
>      osdmap e30: 3 osds: 3 up, 3 in
>       pgmap v72: 192 pgs, 3 pools, 1884 bytes data, 20 objects
>             100 MB used, 15226 MB / 15326 MB avail
>             40/60 objects degraded (66.667%)
>                  *192 active+degraded*
>
> All pools are using ruleset    * "crush_ruleset": 0,*
>
>  { "rule_id": 0,
>           "rule_name": "replicated_ruleset",
>           "ruleset": 0,
>           "type": 1,
>           "min_size": 1,
>           "max_size": 10,
>           "steps": [
>                 { "op": "take",
>                   "item": -1,
>                   "item_name": "default"},
>                 { "op": "chooseleaf_firstn",
>                   "num": 0,
>                   "type": "host"},
>                 { "op": "emit"}]}],
>
>
>
>
> On Sun, Jun 8, 2014 at 3:48 AM, Sage Weil <[email protected]> wrote:
>
>> On Sat, 7 Jun 2014, Anil Dhingra wrote:
>> > HI Guys
>> >
>> > Finally writing .. after losing my patience configuring my cluster
>> multiple
>> > times but still not being able to achieve active+clean .. it looks like
>> it's
>> > almost impossible to configure this on CentOS 6.5.
>> >
>> > As I have to prepare a POC of ceph+cinder, with this config it is
>> difficult
>> > to convince someone. Also, there are no udev rules for CentOS 6.5, which I
>>
>> Were these the packages from ceph.com?
>>
>> > copied from git. Also, why doesn't ceph-deploy create the required
>> directories on
>> > the ceph nodes, like /var/lib/ceph/osd and /var/lib/ceph/bootstrap-osd? A
>> person
>> > configuring it for the first time almost goes mad wondering what went wrong.
>> >
>> > Q1 - why does it start creating PGs right after creating a cluster, even
>> when no
>> > OSD has been added to the cluster? Without any OSDs, where does it try to
>> write
>> > the output below, before an OSD is added to the cluster?
>> >
>> > [root@ceph-node1 my-cluster]# ceph-deploy mon create-initial
>> > [root@ceph-node1 my-cluster]# ceph -s
>> >     cluster fbf07780-f7bf-4d92-a144-a931ef5cd4a9
>> >      health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no
>> > osds
>> >      monmap e1: 1 mons at {ceph-node1=192.168.10.41:6789/0}, election
>> epoch
>> > 2, quorum 0 ceph-node1
>> >      osdmap e1: 0 osds: 0 up, 0 in
>> >       pgmap v2: 192 pgs, 3 pools, 0 bytes data, 0 objects
>> >             0 kB used, 0 kB / 0 kB avail
>> >                  192 creating
>> >
>> >
>> > After adding 1st OSD
>> >
>> > [root@ceph-node1 my-cluster]# ceph-deploy osd --zap-disk create
>> > ceph-node2:sdb
>> > [root@ceph-node1 my-cluster]# ceph -w
>> >     cluster fbf07780-f7bf-4d92-a144-a931ef5cd4a9
>> >      health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
>> >      monmap e1: 1 mons at {ceph-node1=192.168.10.41:6789/0}, election
>> epoch
>> > 2, quorum 0 ceph-node1
>> >      osdmap e5: 1 osds: 1 up, 1 in
>> >       pgmap v7: 192 pgs, 3 pools, 0 bytes data, 0 objects
>> >             35116 kB used, 5074 MB / 5108 MB avail
>> >                  192 active+degraded
>> >
>> > After 2nd OSD
>> >
>> > [root@ceph-node1 my-cluster]# ceph-deploy osd --zap-disk create
>> > ceph-node3:sdb
>> > [root@ceph-node1 my-cluster]# ceph -w
>> >     cluster fbf07780-f7bf-4d92-a144-a931ef5cd4a9
>> >      health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
>> >      monmap e1: 1 mons at {ceph-node1=192.168.10.41:6789/0}, election
>> epoch
>> > 2, quorum 0 ceph-node1
>> >      osdmap e8: 2 osds: 2 up, 2 in
>> >       pgmap v13: 192 pgs, 3 pools, 0 bytes data, 0 objects
>> >             68828 kB used, 10150 MB / 10217 MB avail
>> >                  192 active+degraded
>>
>> This is perfectly normal for firefly because the default replication is
>> now 3x and you only have 2 OSDs in your cluster.  If you add a third you
>> should see active+clean.
>>
>> If you were following an install guide, please let us know which one so we
>> can get it corrected.
>>
>> Thanks!
>> sage
>>
>>
>> > 2014-06-06 23:29:47.358646 mon.0 [INF] pgmap v13: 192 pgs: 192
>> > active+degraded; 0 bytes data, 68828 kB used, 10150 MB / 10217 MB avail
>> > 2014-06-06 23:31:46.711047 mon.0 [INF] pgmap v14: 192 pgs: 192
>> > active+degraded; 0 bytes data, 68796 kB used, 10150 MB / 10217 MB avail
>> >
>> > [root@ceph-node1 my-cluster]# cat /etc/ceph/ceph.conf
>> > [global]
>> > osd_pool_default_pgp_num = 100
>> > auth_service_required = cephx
>> > osd_pool_default_size = 2
>> > filestore_xattr_use_omap = true
>> > auth_client_required = cephx
>> > osd_pool_default_pg_num = 100
>> > auth_cluster_required = cephx
>> > mon_host = 192.168.10.41
>> > public_network = 192.168.10.0/24
>> > mon_clock_drift_allowed = .3
>> > mon_initial_members = ceph-node1
>> > cluster_network = 192.168.10.0/24
>> > fsid = fbf07780-f7bf-4d92-a144-a931ef5cd4a9
>> >
>> > [root@ceph-node1 my-cluster]# ceph osd dump
>> > epoch 8
>> > fsid fbf07780-f7bf-4d92-a144-a931ef5cd4a9
>> > created 2014-06-06 23:21:47.665510
>> > modified 2014-06-06 23:29:41.411379
>> > flags
>> > pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>> > rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool
>> > crash_replay_interval 45 stripe_width 0
>> > pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0
>> object_hash
>> > rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool
>> > stripe_width 0
>> > pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>> > rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool
>> > stripe_width 0
>> > max_osd 2
>> > osd.0 up   in  weight 1 up_from 4 up_thru 4 down_at 0
>> last_clean_interval
>> > [0,0) 192.168.10.42:6800/5848 192.168.10.42:6801/5848
>> > 192.168.10.42:6802/5848 192.168.10.42:6803/5848 exists,up
>> > 0f55a826-fa5b-44b2-b2f8-7b83d15526bf
>> > osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0
>> last_clean_interval
>> > [0,0) 192.168.10.43:6800/7758 192.168.10.43:6801/7758
>> > 192.168.10.43:6802/7758 192.168.10.43:6803/7758 exists,up
>> > 5c701240-51a2-407a-b32a-9830935c1567
>> >
>> > [root@ceph-node1 my-cluster]# ceph osd tree
>> > # id    weight  type name       up/down reweight
>> > -1      0       root default
>> > -2      0               host ceph-node2
>> > 0       0                       osd.0   up      1
>> > -3      0               host ceph-node3
>> > 1       0                       osd.1   up      1
>> >
>> > Thanks
>> > Anil
>> >
>> >
>>
>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Reply via email to