This is similar to an old thread
https://lists.ceph.io/hyperkitty/list/[email protected]/message/FXVEWDU6NYCGEY5QB6IGGQXTUEZAQKNY/
but I don't see any responses there, so I'm opening this one.
PROBLEM DESCRIPTION
* Issue is seen on versioned buckets.
* Using extended logging (debug level 5), we can see that LC deletes expired
(non-current) versions of objects, but those versions are still present in the
bucket: they are both listed in the bucket index and accessible to the user.
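For reference, the rule in play (id "delete-prior-versions", as reported in the Expiration header quoted later in this message) is a plain noncurrent-version expiration rule, retrievable with "aws s3api get-bucket-lifecycle-configuration --bucket=bald". Its general shape is below; the NoncurrentDays value shown is an assumption, not our exact setting:

```json
{
  "Rules": [
    {
      "ID": "delete-prior-versions",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 1}
    }
  ]
}
```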
AN EXAMPLE OBJECT/VERSION
* Bucket: bald
* Object: adc/certs/injr-f5lb01b-vadc04.json
* Versions: It currently has 6 versions
$ aws s3api list-object-versions --bucket=bald \
    --prefix=adc/certs/injr-f5lb01b-vadc04.json \
    | jq -r '.Versions[] | [.Key, .VersionId, .LastModified] | @tsv'
adc/certs/injr-f5lb01b-vadc04.json  nBHrRDYZzuIrA0hORAIzh6QG8rzRF14  2024-06-28T21:13:00.014Z
adc/certs/injr-f5lb01b-vadc04.json  YGgH7VmZDq4M-j8qIrKq.4Valvvuoh4  2024-06-26T20:58:09.835Z
adc/certs/injr-f5lb01b-vadc04.json  qefseb1l.6WJyDNhH5buqX-qcZV2GAJ  2024-06-18T21:32:02.304Z
adc/certs/injr-f5lb01b-vadc04.json  s4YG598JEQC9A5jaJuI5S4XkCh1NRpN  2024-06-10T21:37:16.074Z
adc/certs/injr-f5lb01b-vadc04.json  z96LISOi8jBYHCrnbqHgNAAsnpAqbXm  2024-06-07T01:15:21.802Z
adc/certs/injr-f5lb01b-vadc04.json  Rfgi2NdGYWy.g7H6JevgqsDLXahSHJp  2024-05-30T19:45:19.726Z
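A note on timing: NoncurrentDays is measured from the moment a version becomes noncurrent, i.e. from the LastModified of the version that superseded it, not from the version's own mtime. A small jq sketch over the listing above (JSON inlined here for illustration) shows when each version became noncurrent:

```shell
# Versions of the example object, as returned by list-object-versions,
# newest first (trimmed to the fields used here).
cat > /tmp/versions.json <<'EOF'
{"Versions":[
 {"VersionId":"nBHrRDYZzuIrA0hORAIzh6QG8rzRF14","LastModified":"2024-06-28T21:13:00.014Z"},
 {"VersionId":"YGgH7VmZDq4M-j8qIrKq.4Valvvuoh4","LastModified":"2024-06-26T20:58:09.835Z"},
 {"VersionId":"qefseb1l.6WJyDNhH5buqX-qcZV2GAJ","LastModified":"2024-06-18T21:32:02.304Z"},
 {"VersionId":"s4YG598JEQC9A5jaJuI5S4XkCh1NRpN","LastModified":"2024-06-10T21:37:16.074Z"},
 {"VersionId":"z96LISOi8jBYHCrnbqHgNAAsnpAqbXm","LastModified":"2024-06-07T01:15:21.802Z"},
 {"VersionId":"Rfgi2NdGYWy.g7H6JevgqsDLXahSHJp","LastModified":"2024-05-30T19:45:19.726Z"}
]}
EOF

# Pair each noncurrent version with the LastModified of the version that
# replaced it: that timestamp is what NoncurrentDays counts from.
jq -r '.Versions as $v
  | range(1; $v|length) as $i
  | [$v[$i].VersionId, "noncurrent since", $v[$i-1].LastModified]
  | @tsv' /tmp/versions.json
```

By that measure the oldest version became noncurrent on 2024-06-07, so it has been expired for weeks under any small NoncurrentDays value.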
Looking at the oldest version, ID Rfgi2NdGYWy.g7H6JevgqsDLXahSHJp: LC
actually deletes it (or tries to delete it?) on several different occasions -- LC
runs daily at midnight UTC. For example:
...
2024-07-04T00:00:02.711+0000 7fd30f989700 2 lifecycle:
DELETED::bald[1eeb7b2c-aaab-4dff-be19-be27acab9e85.352350675.1034]):adc/certs/injr-f5lb01b-vadc04.json[Rfgi2NdGYWy.g7H6JevgqsDLXahSHJp]
(non-current expiration) wp_thrd: 1, 0
...
2024-07-08T00:00:04.989+0000 7f199a7ac700 2 lifecycle:
DELETED::bald[1eeb7b2c-aaab-4dff-be19-be27acab9e85.352350675.1034]):adc/certs/injr-f5lb01b-vadc04.json[Rfgi2NdGYWy.g7H6JevgqsDLXahSHJp]
(non-current expiration) wp_thrd: 2, 3
...
2024-07-09T00:00:02.671+0000 7f5a23cea700 2 lifecycle:
DELETED::bald[1eeb7b2c-aaab-4dff-be19-be27acab9e85.352350675.1034]):adc/certs/injr-f5lb01b-vadc04.json[Rfgi2NdGYWy.g7H6JevgqsDLXahSHJp]
(non-current expiration) wp_thrd: 0, 4
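(The excerpts above were collected with a filter along these lines; the log path and the second sample line below are fabricated for illustration:)

```shell
# Two fabricated sample lines standing in for the real RGW log
# (e.g. /var/log/ceph/ceph-client.rgw.*.log -- the path is an assumption).
cat > /tmp/rgw-lc.log <<'EOF'
2024-07-04T00:00:02.711+0000 7fd30f989700 2 lifecycle: DELETED::bald[1eeb7b2c-aaab-4dff-be19-be27acab9e85.352350675.1034]):adc/certs/injr-f5lb01b-vadc04.json[Rfgi2NdGYWy.g7H6JevgqsDLXahSHJp] (non-current expiration) wp_thrd: 1, 0
2024-07-04T00:00:02.750+0000 7fd30f989700 2 lifecycle: DELETED::bald[1eeb7b2c-aaab-4dff-be19-be27acab9e85.352350675.1034]):some/other/object[aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa] (non-current expiration) wp_thrd: 1, 1
EOF

# Count how many LC deletions were logged for the example version:
grep -c 'lifecycle: DELETED.*\[Rfgi2NdGYWy\.g7H6JevgqsDLXahSHJp\]' /tmp/rgw-lc.log
```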
However, as seen in the aws-cli output above, this version is still there.
Below is the output when we retrieve this exact version:
$ aws s3api get-object --bucket=bald \
    --key=adc/certs/injr-f5lb01b-vadc04.json \
    --version-id=Rfgi2NdGYWy.g7H6JevgqsDLXahSHJp /tmp/outfile
{
"AcceptRanges": "bytes",
"Expiration": "expiry-date=\"Sat, 01 Jun 2024 00:00:00 GMT\", rule-id=\"delete-prior-versions\"",
"LastModified": "Thu, 30 May 2024 19:45:19 GMT",
"ContentLength": 1299,
"ETag": "\"d9c9ff538f4e2f1435746d16cd9e62c8\"",
"VersionId": "Rfgi2NdGYWy.g7H6JevgqsDLXahSHJp",
"ContentType": "binary/octet-stream",
"Metadata": {}
}
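Note the Expiration field in that response: RGW still reports the version as matched by rule delete-prior-versions, with an expiry-date more than a month in the past, i.e. LC still considers it expired on every pass. The two components can be pulled apart like this (value inlined from the response above; a naive split on ',' won't work because of the comma inside the date):

```shell
# The Expiration value from the get-object response above.
expiration='expiry-date="Sat, 01 Jun 2024 00:00:00 GMT", rule-id="delete-prior-versions"'

# Extract each quoted component on its own line.
printf '%s\n' "$expiration" | grep -o 'expiry-date="[^"]*"'
printf '%s\n' "$expiration" | grep -o 'rule-id="[^"]*"'
```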
and in radosgw-admin bucket list:
{
"name": "adc/certs/injr-f5lb01b-vadc04.json",
"instance": "Rfgi2NdGYWy.g7H6JevgqsDLXahSHJp",
"ver": {
"pool": 15,
"epoch": 563937
},
"locator": "",
"exists": true,
"meta": {
"category": 1,
"size": 1299,
"mtime": "2024-05-30T19:45:19.726701Z",
"etag": "d9c9ff538f4e2f1435746d16cd9e62c8",
"storage_class": "",
...
"content_type": "",
"accounted_size": 1299,
"user_data": "",
"appendable": false
},
"tag": "1eeb7b2c-aaab-4dff-be19-be27acab9e85.1228454711.3959044872362449539",
"flags": 1,
"pending_map": [],
"versioned_epoch": 85
},
SOME MORE NOTES ON THIS BUCKET & OBJECT
* The current version of the object is not deleted: there is no delete marker.
* No object locking is configured for this bucket.
* I don't see any trace of this bucket or object in gc list.
* The bucket has 101 shards and each shard holds around 30K objects, so there's
no noticeable skew in the distribution of objects across the bucket index.
However, I see the ERROR lines below streaming by when listing the bucket; I'm
not sure whether they're relevant to the LC issue.
...
2024-07-09T18:28:32.470+0000 7f8ef5db0740 0 ERROR: list_objects_ordered marker
failed to make forward progress; attempt=4,
prev_marker=lb_summary.lock[s2BIhE0HnXnj.yONcP6.T-dkwU-aWhn],
cur_marker=lb_summary.lock[njFF6AkNKUoCVIsi-6pJVhDyaK8FycS]
2024-07-09T18:28:32.530+0000 7f8ef5db0740 0 ERROR: list_objects_ordered marker
failed to make forward progress; attempt=2,
prev_marker=lb_summary.lock[njFF6AkNKUoCVIsi-6pJVhDyaK8FycS],
cur_marker=lb_summary.lock[GDUSDyTB4nGsjT0GfDYlnB5zrB8UnSV]
2024-07-09T18:28:32.546+0000 7f8ef5db0740 0 ERROR: list_objects_ordered marker
failed to make forward progress; attempt=3,
prev_marker=lb_summary.lock[U4cpWSv2b5bI8rm1vsDw.kcXmXrDYuV],
cur_marker=lb_summary.lock[2NCtXN7KbO0ypy4CSmJCk1gGZEnhWfL]
...
* We ran "bucket check --fix" on the bucket a few days ago, but it resolved
neither the LC issue nor the "failed to make forward progress" error stream
during bucket listing.
* Bucket stats for reference:
$ radosgw-admin bucket stats --bucket=bald \
    | jq '. | .bucket, [.num_shards, .usage]'
"bald"
[
101,
{
"rgw.main": {
"size": 13065121441,
"size_actual": 16259469312,
"size_utilized": 13065121441,
"size_kb": 12758908,
"size_kb_actual": 15878388,
"size_kb_utilized": 12758908,
"num_objects": 1233984
}
}
]
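One thing we may check next, given that all the forward-progress errors point at the same key (lb_summary.lock): whether that key has accumulated a very large number of versions, which might be related to the listing stalls. A jq sketch for ranking keys by version count from list-object-versions output (the inlined sample is fabricated; in practice we'd feed the full aws-cli dump):

```shell
# Fabricated sample of list-object-versions output, for illustration only;
# in practice: aws s3api list-object-versions --bucket=bald > /tmp/lov.json
cat > /tmp/lov.json <<'EOF'
{"Versions":[
 {"Key":"lb_summary.lock","VersionId":"v1"},
 {"Key":"lb_summary.lock","VersionId":"v2"},
 {"Key":"lb_summary.lock","VersionId":"v3"},
 {"Key":"adc/certs/injr-f5lb01b-vadc04.json","VersionId":"v4"}
]}
EOF

# Count versions per key, most-versioned first.
jq -r '.Versions | group_by(.Key) | map({key: .[0].Key, n: length})
       | sort_by(-.n) | .[] | "\(.n)\t\(.key)"' /tmp/lov.json
```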
QUESTIONS
* Is this by any chance a known issue? I searched the tracker but couldn't find
a duplicate.
* Any ideas why the deletes initiated by LC might fail silently? I don't see
any indication of the gc queue filling up around that time.
* Any ideas on debugging this issue further? Would log level 20 be helpful,
and are there any other log lines to look for?
_______________________________________________
ceph-users mailing list -- [email protected]