Hi

Dmitry Smirnov <only...@debian.org> writes:
> Control: tags -1 unreproducible
>
> On Sun, 16 Nov 2014 16:17:27 Gaudenz Steinlin wrote:
>> >  * Set 'hashpspool' flag on your pools (new default):
>> >    ceph osd pool set {pool} hashpspool true
>>
>> But on the other hand I could not find any information about why this
>> should be run on upgrades. The documentation for this is very sparse.
>> Dimitry do you know what sort of problems this command solves and why it
>> should be run?
>
> Running this command is not mandatory but since it affects distribution of
> data IMHO it make sense to set "hashpspool" just like you would adjust
> tunables after upgrade. New pools are created with "hashpspool" by default so
> I believe it just makes sense to update configuration of old pools.

I agree that it makes sense to upgrade old pools. However there are some
downsides to this. Issuing this command immediately makes the cluster
unclean and leads to a degraded state. The command changes the algorithm
used to distribute PGs to OSDs, so the cluster starts backfilling to bring
all PGs back to their correct placement. This can create quite substantial
I/O load, which you probably want to plan for on a highly loaded production
cluster (see the P.S. below). I guess this is also the reason the command is
not mentioned at all in the ceph.com upgrade guide. Maybe someone from
ceph.com can shed some more light on this.

I suggest the attached patch to the README.Debian text. If you agree I will
commit that change.

Gaudenz
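P.S. For anyone who wants to check which pools already carry the flag and
follow the rebalancing afterwards, something along these lines should do
(the pool name "rbd" is only an example):

  ceph osd dump | grep "^pool"            # per-pool flags, look for "hashpspool"
  ceph osd pool set rbd hashpspool true
  ceph -w                                 # watch recovery/backfill until HEALTH_OK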
diff --git a/debian/README.Debian b/debian/README.Debian
index f4ba80a..79b1273 100644
--- a/debian/README.Debian
+++ b/debian/README.Debian
@@ -43,10 +43,17 @@
 
 * (Restart MDSes).
 
-* Set 'hashpspool' flag on your pools (new default):
+* Consider setting the 'hashpspool' flag on your pools (new default):
 
   ceph osd pool set {pool} hashpspool true
 
+  This changes the pool to use a new hashing algorithm for the distribution of
+  Placement Groups (PGs) to OSDs. The new algorithm ensures a better distribution
+  of PGs across all OSDs. Be aware that this change will temporarily put your
+  cluster into a degraded state and cause additional I/O until all PGs are moved
+  to their new location. See http://tracker.ceph.com/issues/4128 for details
+  about the new algorithm.
+
  Read more about tunables in
  http://ceph.com/docs/master/rados/operations/crush-map/#tunables
 
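P.P.S. On a busy production cluster it may also help to throttle the backfill
before setting the flag, for example (the values are only an illustration,
adjust to taste and revert once the cluster is back to HEALTH_OK):

  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'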