Hi,
An update on this issue. Thanks to suggestions from Frédéric Nass, I
think I managed to clear the problem by deleting the realm and all its
objects (zonegroup, zone, period) with radosgw-admin and then deleting the
pools associated with the deleted zone. I am sure this is not a general
solution to the problem, which I was able to reproduce on a test
cluster. I have the feeling that radosgw-admin should do a better job of
avoiding such a mess when zones are deleted, but that is another story.
Deleting the realm and its objects worked for us for the following reasons:
- The realm/zonegroup/zone had just been created and held no useful
content, so losing everything related to it was an option, as said
previously (but deleting .rgw.root was not an option, as we have several
realms in production).
- We configure each realm/zonegroup/zone with a separate set of RGWs
(which cephadm can deploy on the same server, but that is another
story), so the only RGWs impacted are those related to the deleted realm.
- Our realm was single-site. After deleting the realm, it is not possible
to push (commit) the change to other zonegroups/zones of the realm, as the
realm must exist to be able to commit a new period. I guess that in a
multisite configuration, this means that the cleanup operation must be
done in all the clusters involved in the multisite configuration.
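For anyone in a similar situation, the cleanup can be sketched roughly as
below. This is a dry-run sketch only: it prints commands and runs nothing.
The realm, zonegroup, zone and pool names are placeholders, the order of
the steps is an assumption, and the subcommand spellings (in particular
"realm rm", which some releases call "realm delete") should be checked
against "radosgw-admin help" on your version. Period removal is left out
for the same reason. Removing pools also requires mon_allow_pool_delete to
be enabled.

```python
import shlex


def cleanup_commands(realm, zonegroup, zone, pools):
    """Build (but do not run) commands for removing one unused realm.

    All names are supplied by the caller; nothing here is read from the
    cluster. The step order is an assumption, not a documented procedure.
    """
    scope = [f"--rgw-realm={realm}"]
    cmds = [
        ["radosgw-admin", "zone", "delete",
         f"--rgw-zone={zone}", f"--rgw-zonegroup={zonegroup}", *scope],
        ["radosgw-admin", "zonegroup", "delete",
         f"--rgw-zonegroup={zonegroup}", *scope],
        # Assumption: spelled "realm rm" here; older releases may differ.
        ["radosgw-admin", "realm", "rm", *scope],
    ]
    for pool in pools:
        # The pool name is given twice on purpose: ceph asks for it as a guard.
        cmds.append(["ceph", "osd", "pool", "rm", pool, pool,
                     "--yes-i-really-really-mean-it"])
    return cmds


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in cleanup_commands("testrealm", "testzg", "testzone",
                                ["testzone.rgw.meta", "testzone.rgw.log"]):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them (and to confirm
that they only touch the realm to be dropped) before running anything by
hand.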
Best regards,
Michel
On 14/05/2025 at 18:12, Michel Jouvin wrote:
Hi,
We are still stuck with this problem and I have not seen an answer
to my previous emails. We found the explanation of the problem in the
doc:
https://docs.ceph.com/en/latest/radosgw/multisite/#deleting-a-zone.
But the doc does not mention the way out of the problem... Would
deleting the realm help? There is no content in this
realm/zonegroup/zone so removing everything is an option if it helps.
Thanks in advance for any hint. Best regards,
Michel
Sent from my mobile
On 7 May 2025 at 16:49:19, Michel Jouvin <[email protected]>
wrote:
Hi,
I managed to find what the zone and zonegroup IDs were before they were
deleted, and I confirm that the IDs referred to in the error messages are
those of the deleted zone and zonegroup. The new zone and zonegroup
(which have the same names; again, I am not sure whether that is a problem,
as everything should be done by ID, shouldn't it?) have been defined as
master zone and zonegroup, so the old ones should just be deleted,
shouldn't they? I really don't understand what the error means and what
can be done to fix it.
Best regards,
Michel
On 06/05/2025 at 21:29, Michel Jouvin wrote:
Hi,
This is not the first time that, after making configuration changes in
RADOS for a realm/zonegroup/zone with radosgw-admin, we get errors
when trying to do a "period update --commit". We never found good
documentation on how to fix these problems. Up to now we have always
managed at some point to restore a good configuration that can be
committed, but it is probably time for us to have a more informed
approach!
The latest occurrence of the problem happened today with a
realm/zonegroup/zone created recently. Trying to fix a problem with
the non-working haproxy associated with it, one of my colleagues
decided to delete and recreate the zone and zonegroup (with the same
names). The related commands worked, but since then any
attempt to do "period update --commit" results in the following error:
-------
2025-05-06T11:56:20.939+0200 7fdc7d41da80 0 failed reading obj info from .rgw.root:zone_info.93af6e0c-4552-4c2e-b167-36114a5a81e4: (2) No such file or directory
2025-05-06T11:56:20.945+0200 7fdc7d41da80 0 failed reading obj info from .rgw.root:zonegroup_info.d7221099-4e7d-43cb-a1e8-28a750de1cd5: (2) No such file or directory
2025-05-06T11:56:21.160+0200 7fdc7d41da80 0 failed reading obj info from .rgw.root:zone_info.93af6e0c-4552-4c2e-b167-36114a5a81e4: (2) No such file or directory
2025-05-06T11:56:21.160+0200 7fdc7d41da80 -1 Cannot find zone id=93af6e0c-4552-4c2e-b167-36114a5a81e4 (name=default)
2025-05-06T11:56:21.160+0200 7fdc7d41da80 0 ERROR: failed to start notify service ((22) Invalid argument
2025-05-06T11:56:21.160+0200 7fdc7d41da80 0 ERROR: failed to init services (ret=(22) Invalid argument)
couldn't init storage provider
-------
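To see which object IDs the messages complain about, a small helper along
these lines can pull them out of the output so they can be compared with
the IDs of the zone and zonegroup that currently exist. It is a minimal
sketch: the function name and the SAMPLE text are illustrative, the
pattern only covers the ".rgw.root:zone_info.<id>" and
".rgw.root:zonegroup_info.<id>" forms seen above, and it tolerates lines
wrapped by a mail client.

```python
import re

# Matches ".rgw.root:zone_info.<uuid>" and ".rgw.root:zonegroup_info.<uuid>".
OBJ_RE = re.compile(
    r"\.rgw\.root:(zone|zonegroup)_info\."
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def missing_ids(log_text):
    """Return {'zone': {...}, 'zonegroup': {...}} with the IDs named in the log."""
    # Collapse whitespace so that wrapped log lines are handled as well.
    flat = " ".join(log_text.split())
    found = {"zone": set(), "zonegroup": set()}
    for kind, uuid in OBJ_RE.findall(flat):
        found[kind].add(uuid)
    return found


# Two of the messages above, wrapped as a mail client might wrap them.
SAMPLE = """
2025-05-06T11:56:20.939+0200 7fdc7d41da80 0 failed reading obj info
from .rgw.root:zone_info.93af6e0c-4552-4c2e-b167-36114a5a81e4: (2) No
such file or directory
2025-05-06T11:56:20.945+0200 7fdc7d41da80 0 failed reading obj info
from .rgw.root:zonegroup_info.d7221099-4e7d-43cb-a1e8-28a750de1cd5:
(2) No such file or directory
"""

if __name__ == "__main__":
    for kind, ids in missing_ids(SAMPLE).items():
        print(kind, sorted(ids))
```

The IDs it returns can then be checked against the "id" fields reported by
"radosgw-admin zone get" and "radosgw-admin zonegroup get" for the realm in
question.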
I have the feeling that it is related to the deleted objects that are
no longer found, but it is not completely clear what the way out of
it is. Is the problem related to recreating the zone/zonegroup with the
same names? There are several realms already in production so we
cannot do a .rgw.root reset, but this particular realm has never been
put in production so we can delete everything related to it.
Thanks in advance for any hint or pointer. Best regards,
Michel
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]