Hi folks,
we upgraded one of our clusters from Pacific to Quincy. Everything worked
fine, but cephadm complains that one OSD was not upgraded:
[WRN] UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.15 on host osd-dmz-k5-1 failed.
Upgrade daemon: osd.15: cephadm exited with an error code: 1, stderr:
Redeploy daemon osd.15 ...
Failed to trim old cgroups /sys/fs/cgroup/system.slice/system-ceph\x2df852c3fc\x2d05a0\x2d11e8\x2dbae7\x2d77689751e5e7.slice/[email protected]
Non-zero exit code 1 from systemctl start [email protected]
systemctl: stderr Job for [email protected] failed because the control process exited with error code.
systemctl: stderr See "systemctl status [email protected]" and "journalctl -xeu [email protected]" for details.
Traceback (most recent call last):
  File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 9679, in <module>
    main()
  File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 9667, in main
    r = ctx.func(ctx)
  File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 2168, in _default_image
    return func(ctx)
  File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 5992, in command_deploy
    deploy_daemon(ctx, ctx.fsid, daemon_type, daemon_id, c, uid, gid,
  File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 3301, in deploy_daemon
    deploy_daemon_units(ctx, fsid, uid, gid, daemon_type, daemon_id,
  File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 3558, in deploy_daemon_units
    call_throws(ctx, ['systemctl', 'start', unit_name])
  File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 1806, in call_throws
    raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
RuntimeError: Failed command: systemctl start [email protected]: Job for [email protected] failed because the control process exited with error code.
See "systemctl status [email protected]" and "journalctl -xeu [email protected]" for details.
The OSD in question appears to be running fine:
systemctl status [email protected]
● [email protected] - Ceph osd.15 for f852c3fc-05a0-11e8-bae7-77689751e5e7
     Loaded: loaded (/etc/systemd/system/[email protected]; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2024-11-16 10:02:27 CET; 1 week 2 days ago
   Main PID: 24583 (conmon)
      Tasks: 67 (limit: 76281)
     Memory: 6.0G
        CPU: 9h 57min 20.017s
     CGroup: /system.slice/system-ceph\x2df852c3fc\x2d05a0\x2d11e8\x2dbae7\x2d77689751e5e7.slice/[email protected]
             ├─libpod-payload-3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e
             │ ├─24586 /dev/init -- /usr/bin/ceph-osd -n osd.15 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false
             │ └─24588 /usr/bin/ceph-osd -n osd.15 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false
             └─supervisor
               └─24583 /usr/bin/conmon --api-version 1 -c 3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e -u 3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e/userdata -p /run/containers/storage/over>
Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time 2024/11/25-10:23:14.904662) [db/memtable_list.cc:628] [default] Level-0 commit table #794120: memtable #1 done
Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time 2024/11/25-10:23:14.904710) EVENT_LOG_v1 {"time_micros": 1732530194904694, "job": 1660, "event": "flush_finished", "output_compression": "NoCompression", "lsm_state": [2, 1, 8, 44, 0, 0, 0], "immutable_memtables": 0}
Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time 2024/11/25-10:23:14.904789) [db/db_impl/db_impl_compaction_flush.cc:233] [default] Level summary: files[2 1 8 44 0 0 0] max score 0.78
Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: [db/db_impl/db_impl_files.cc:415] [JOB 1660] Try to delete WAL files size 255924988, prev total WAL file size 256244157, number of live WAL files 2.
Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: [file/delete_scheduler.cc:69] Deleted file db/794117.log immediately, rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time 2024/11/25-10:23:14.905401) [db/db_impl/db_impl_compaction_flush.cc:2818] Compaction nothing to do
Nov 25 11:32:34 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: [db/db_impl/db_impl.cc:901] ------- DUMPING STATS -------
Nov 25 11:32:34 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: [db/db_impl/db_impl.cc:903]
** DB Stats **
Uptime(secs): 783001.8 total, 600.0 interval
Cumulative writes: 24M writes, 97M keys, 24M commit groups, 1.0 writes per commit group, ingest: 119.22 GB, 0.16 MB/s
Cumulative WAL: 24M writes, 11M syncs, 2.03 writes per sync, written: 119.22 GB, 0.16 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 17K writes, 61K keys, 17K commit groups, 1.0 writes per commit group, ingest: 95.37 MB, 0.16 MB/s
Interval WAL: 17K writes, 8473 syncs, 2.01 writes per sync, written: 0.09 MB, 0.16 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
** Compaction Stats [default] **
Level  Files       Size  Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0       2/0    9.86 MB    0.5       0.0     0.0       0.0        1.9       1.9        0.0    1.0       0.0      29.6      66.03              64.32        505     0.131      0        0
L1       1/0   66.88 MB    0.7       4.6     1.9       2.7        3.3       0.7        0.0    1.8      71.1      52.1      65.56              60.72        126     0.520   100M    4156K
L2       8/0  450.76 MB    0.8       7.1     0.7       6.4        6.8       0.4        0.0   10.3      53.7      51.6     135.52             118.93         16     8.470   190M    1298K
L3      44/0    2.65 GB    0.1       0.7     0.3       0.4        0.4      -0.0        0.0    1.3      79.5      43.3       8.89               7.81          4     2.223    28M      17M
Sum     55/0    3.17 GB    0.0      12.3     2.9       9.5       12.4       3.0        0.0    6.5      45.8      46.2     276.01             251.78        651     0.424   318M      22M
Int      0/0    0.00 KB    0.0       0.0     0.0       0.0        0.0       0.0        0.0    1.0       0.0      44.0       0.12               0.12          1     0.124      0        0
** Compaction Stats [default] **
Priority  Files       Size  Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Low         0/0    0.00 KB    0.0      12.3     2.9       9.5       10.5       1.0        0.0    0.0      60.2      51.4     209.98             187.46        146     1.438   318M      22M
High        0/0    0.00 KB    0.0       0.0     0.0       0.0        1.9       1.9        0.0    0.0       0.0      29.5      66.00              64.32        504     0.131      0        0
User        0/0    0.00 KB    0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       0.0     102.0       0.02               0.00          1     0.025      0        0
Uptime(secs): 783001.9 total, 600.0 interval
Flush(GB): cumulative 1.906, interval 0.005
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 12.45 GB write, 0.02 MB/s write, 12.35 GB read, 0.02 MB/s read, 276.0 seconds
Interval compaction: 0.01 GB write, 0.01 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.1 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
** File Read Latency Histogram By Level [default] **
** Compaction Stats [default] **
Level  Files       Size  Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0       2/0    9.86 MB    0.5       0.0     0.0       0.0        1.9       1.9        0.0    1.0       0.0      29.6      66.03              64.32        505     0.131      0        0
L1       1/0   66.88 MB    0.7       4.6     1.9       2.7        3.3       0.7        0.0    1.8      71.1      52.1      65.56              60.72        126     0.520   100M    4156K
L2       8/0  450.76 MB    0.8       7.1     0.7       6.4        6.8       0.4        0.0   10.3      53.7      51.6     135.52             118.93         16     8.470   190M    1298K
L3      44/0    2.65 GB    0.1       0.7     0.3       0.4        0.4      -0.0        0.0    1.3      79.5      43.3       8.89               7.81          4     2.223    28M      17M
Sum     55/0    3.17 GB    0.0      12.3     2.9       9.5       12.4       3.0        0.0    6.5      45.8      46.2     276.01             251.78        651     0.424   318M      22M
Int      0/0    0.00 KB    0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       0.0       0.0       0.00               0.00          0     0.000      0        0
** Compaction Stats [default] **
Priority  Files       Size  Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Low         0/0    0.00 KB    0.0      12.3     2.9       9.5       10.5       1.0        0.0    0.0      60.2      51.4     209.98             187.46        146     1.438   318M      22M
High        0/0    0.00 KB    0.0       0.0     0.0       0.0        1.9       1.9        0.0    0.0       0.0      29.5      66.00              64.32        504     0.131      0        0
User        0/0    0.00 KB    0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       0.0     102.0       0.02               0.00          1     0.025      0        0
Uptime(secs): 783001.9 total, 0.0 interval
Flush(GB): cumulative 1.906, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 12.45 GB write, 0.02 MB/s write, 12.35 GB read, 0.02 MB/s read, 276.0 seconds
How do I fix this? We tried redeploying the OSD, but without success.
Best regards
Felix