Launchpad has imported 27 comments from the remote bug at https://bugzilla.redhat.com/show_bug.cgi?id=1222286.
If you reply to an imported comment from within Launchpad, your comment will be sent to the remote bug automatically. Read more about Launchpad's inter-bugtracker facilities at https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2015-05-17T09:33:29+00:00 Richard wrote:

Description of problem:

My ARMv7 server is broken at the moment so I can't try this, but it seems as if the simple command:

/usr/bin/qemu-system-arm -help

hangs (on ARMv7 hardware). See this build log for an example:

https://kojipkgs.fedoraproject.org//work/tasks/3698/9763698/build.log

Version-Release number of selected component (if applicable):
qemu 2:2.3.0-3.fc23.armv7hl

How reproducible:
Unknown.

Steps to Reproduce:
1. Run `qemu-system-arm -help'

Actual results:
Hangs.

Expected results:
Should print help output.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/3

------------------------------------------------------------------------
On 2015-05-17T09:55:27+00:00 Richard wrote:

Another build and again it's hanging in the same place, but again only on armv7hl:

http://koji.fedoraproject.org/koji/taskinfo?taskID=9767065
https://kojipkgs.fedoraproject.org//work/tasks/7067/9767067/build.log

This works fine on x86-64 & i686.

The configure.ac fragment is very simple so it's not likely to be a problem with autoconf, although I guess anything is possible:

AC_MSG_CHECKING([that $QEMU -help works])
if $QEMU -help >&AS_MESSAGE_LOG_FD 2>&1; then
    AC_MSG_RESULT([yes])
else
    AC_MSG_RESULT([no])
    AC_MSG_FAILURE(<error message omitted>)
fi

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/4

------------------------------------------------------------------------
On 2015-05-17T13:52:22+00:00 Richard wrote:

strace output is here:

https://kojipkgs.fedoraproject.org//work/tasks/7470/9767470/build.log

It's frankly bizarre. The last system calls are:

open("/dev/urandom", O_RDONLY|O_LARGEFILE) = 3
fcntl64(3, F_GETFD) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
fstat64(3, {st_mode=S_IFCHR|0444, st_rdev=makedev(1, 9), ...}) = 0
clock_gettime(CLOCK_REALTIME, {1431870340, 192190777}) = 0
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 330000}, ru_stime={0, 55000}, ...}) = 0
read(3, "*\222:\253\26\211\313\245\6KF\256Bm\242\363s\202>`u\223\6*1\351\234\324 E\362\347j", 32) = 32
read(3, "83\313y\206\10\316^\226\342&\266;R3\215\375\350\227\363+\16?\210LTy\345 v.\251\243", 32) = 32
read(3, "\231\302\270,\311\271z\324", 8) = 8
gettimeofday({1431870340, 198569}, NULL) = 0
gettimeofday({1431870340, 198647}, NULL) = 0

... and gettimeofday repeats in a loop forever.

The context looks as if it is a bug in libselinux.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/5

------------------------------------------------------------------------
On 2015-05-17T15:41:44+00:00 Richard wrote:

According to gdb, this is the backtrace:

warning: Probes-based dynamic linker interface failed.
Reverting to original interface.
Cannot parse expression `.L1046 4@r4'.

#0  0xb6fd10b0 in dl_main () from /lib/ld-linux-armhf.so.3
#1  0xb6fe6670 in _dl_sysdep_start () from /lib/ld-linux-armhf.so.3
#2  0xb6fd23a8 in _dl_start () from /lib/ld-linux-armhf.so.3
#3  0xb6fcdb50 in _start () from /lib/ld-linux-armhf.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/6
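
Every backtrace so far bottoms out in dynamic-loader frames (_dl_*), which means the process is hanging while shared-library initializers run, before main() is ever reached; that is why even a plain "-help" invocation hangs. The following is a minimal illustrative sketch, not code from qemu, gnutls, or any other library mentioned in this bug, showing why a bug in a load-time initializer blocks every program that links the library:

// Minimal sketch (hypothetical code): a static object in a library is
// constructed from the loader's init path (the call_init/_dl_init frames
// seen in the traces), before main() runs.
#include <cstdio>

namespace {

struct LoadTimeInit {
  LoadTimeInit() {
    // If anything here loops forever, the dynamic loader never finishes
    // _dl_init and the program never reaches main().
    std::puts("running in a library initializer, before main()");
  }
};

// The constructor of this object runs at load time.
LoadTimeInit init_at_load_time;

}  // anonymous namespace

int main() {
  std::puts("main() only runs after all initializers have finished");
  return 0;
}

Built into a shared object and linked from a trivial main(), the same constructor would execute during _dl_init, exactly where the loops in this bug appear.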
------------------------------------------------------------------------
On 2015-05-17T16:01:56+00:00 Richard wrote:

OK I believe the code is from libnettle (gnutls).

lib/nettle/rnd-common.c:_rnd_system_entropy_init contains:

_gnutls_urandom_fd = open("/dev/urandom", O_RDONLY);
if (_gnutls_urandom_fd < 0) {
    _gnutls_debug_log("Cannot open urandom!\n");
    goto fallback;
}

old = fcntl(_gnutls_urandom_fd, F_GETFD);
if (old != -1)
    fcntl(_gnutls_urandom_fd, F_SETFD, old | FD_CLOEXEC);

if (fstat(_gnutls_urandom_fd, &st) >= 0) {
    _gnutls_urandom_fd_mode = st.st_mode;
}

lib/nettle/rnd-common.c:_rnd_get_event contains:

gettime(&e->now);
#ifdef HAVE_GETRUSAGE
if (getrusage(ARG_RUSAGE, &e->rusage) < 0) {
    _gnutls_debug_log("getrusage failed: %s\n", strerror(errno));
}
#endif

which looks identical to the sequence of syscalls being made. It's still a mystery where the infinite loop happens.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/7

------------------------------------------------------------------------
On 2015-05-17T16:14:34+00:00 Richard wrote:

I'm still pretty sure this stack trace is wrong, but I got gdb to print out the register contents:

warning: Probes-based dynamic linker interface failed.
Reverting to original interface.
Cannot parse expression `.L1046 4@r4'.

r0             0xb6fff8f4   3070228724
r1             0x0          0
r2             0xb6fff908   3070228744
r3             0xb6fff8f4   3070228724
r4             0xb6fff8f4   3070228724
r5             0x0          0
r6             0xfffffffc   4294967292
r7             0xa3         163
r8             0xa3         163
r9             0x0          0
r10            0xb6fff908   3070228744
r11            0xbefff95c   3204446556
r12            0xb6ffefac   3070226348
sp             0xbefff8c0   0xbefff8c0
lr             0xb6fd10b0   -1224929104
pc             0xb6fd10b0   0xb6fd10b0 <dl_main+11624>
cpsr           0x60070010   1611071504

Thread 1 (process 804):
#0  0xb6fd10b0 in dl_main () from /lib/ld-linux-armhf.so.3
#1  0xb6fe6670 in _dl_sysdep_start () from /lib/ld-linux-armhf.so.3
#2  0xb6fd23a8 in _dl_start () from /lib/ld-linux-armhf.so.3
#3  0xb6fcdb50 in _start () from /lib/ld-linux-armhf.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/8

------------------------------------------------------------------------
On 2015-05-17T16:48:47+00:00 Richard wrote:

*If* the stack trace is to be believed, then it points to the following code in glibc. The PC is marked with an arrow.

    40a0:  eb00376c  bl   11e58 <_dl_debug_initialize>
    40a4:  e580500c  str  r5, [r0, #12]
    40a8:  e1a04000  mov  r4, r0
    40ac:  eb003768  bl   11e54 <_dl_debug_state>
--> 40b0:  e320f000  nop  {0}
    40b4:  eb0052c7  bl   18bd8 <_dl_unload_cache>
    40b8:  e24bd020  sub  sp, fp, #32
    40bc:  e8bd8ff0  pop  {r4, r5, r6, r7, r8, r9, sl, fp, pc}

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/9

------------------------------------------------------------------------
On 2015-05-17T22:25:30+00:00 Nikos wrote:

The gnutls code being traced here runs within the library constructor. Could it be that one of these calls causes a dlopen on that platform, which in turn causes the infinite loop?

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/10

------------------------------------------------------------------------
On 2015-05-18T15:22:51+00:00 Richard wrote:

Finally I am able to reproduce this locally, so I'll attach a debugger and find out what's really going on here.
Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/11

------------------------------------------------------------------------
On 2015-05-18T17:41:20+00:00 Richard wrote:

First observation is that the stack trace above *is* bogus. qemu-system-arm for some reason behaves differently under gdb, triggering some kind of (gdb?) bug. Please ignore the stack traces in comments 3, 5 & 6. And my apologies for blaming gnutls.

The REAL stack trace points to ceph:

#0  0xb284846c in gettimeofday () from /lib/libc.so.6
#1  0xb405b868 in Cycles::init () at common/Cycles.cc:75
#2  0xb6b28500 in call_init.part () from /lib/ld-linux-armhf.so.3
#3  0xb6b28664 in _dl_init () from /lib/ld-linux-armhf.so.3
#4  0xb6b17b84 in _dl_start_user () from /lib/ld-linux-armhf.so.3

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/12

------------------------------------------------------------------------
On 2015-05-18T18:01:46+00:00 Richard wrote:

A very trivial reproducer for this is:

$ echo 'main(){}' > test.c
$ gcc test.c -lrados -o test
$ ./test

The final command will hang forever on 32-bit ARM.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/13
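
The Cycles::init () frame above, together with the earlier strace showing gettimeofday() repeating forever, is consistent with a cycle-counter calibration loop that can never make progress when the architecture-specific counter is not implemented. The sketch below only illustrates that failure mode; it is not Ceph's actual common/Cycles.cc code, and read_cycle_counter() is a hypothetical stand-in for a per-architecture counter such as RDTSC:

// Illustration of the suspected failure mode (hypothetical code, not Ceph's).
// A calibration loop that waits for an architecture-specific cycle counter
// to advance spins on gettimeofday() forever if the counter is a stub.
#include <cstdint>
#include <cstdio>
#include <sys/time.h>

// Hypothetical stand-in: on x86 this would wrap RDTSC; on an architecture
// with no implementation it might simply return a constant.
static std::uint64_t read_cycle_counter() {
  return 0;  // never advances
}

static double cycles_per_sec = 0.0;

// Imagine this being called from a static initializer at library load time.
static void calibrate() {
  timeval start;
  gettimeofday(&start, nullptr);
  const std::uint64_t start_cycles = read_cycle_counter();

  for (;;) {
    timeval now;
    gettimeofday(&now, nullptr);  // this is the call strace sees repeating
    const std::uint64_t elapsed_cycles = read_cycle_counter() - start_cycles;
    const double elapsed_sec = (now.tv_sec - start.tv_sec) +
                               (now.tv_usec - start.tv_usec) / 1e6;
    // The measurement is only accepted once the counter has advanced; with
    // the stub above that never happens, so the loop never exits and the
    // program hangs before main().
    if (elapsed_cycles > 0 && elapsed_sec >= 0.01) {
      cycles_per_sec = elapsed_cycles / elapsed_sec;
      break;
    }
  }
}

int main() {
  calibrate();  // with the stubbed counter this never returns
  std::printf("cycles/sec: %f\n", cycles_per_sec);
  return 0;
}

With a working counter, or an early bail-out when none exists, a loop of this shape terminates after roughly 10 ms of wall-clock time.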
------------------------------------------------------------------------
On 2015-05-18T18:13:07+00:00 Richard wrote:

Created attachment 1026798
fix-cycles-arch.patch

BLEAH UPSTREAM FIRST *!#$

Turns out that Ubuntu fixed this and are carrying a downstream patch for it, which I have attached.

Ubuntu bug: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/14

------------------------------------------------------------------------
On 2015-05-18T18:32:07+00:00 Boris wrote:

Actually, that patch got half-merged upstream (the first half), and that is why you were seeing the hangs. I'll apply the second half of the patch and rebuild the packages.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/15

------------------------------------------------------------------------
On 2015-05-18T22:01:38+00:00 Fedora wrote:

ceph-0.94.1-1.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/FEDORA-2015-8189/ceph-0.94.1-1.fc22

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/16

------------------------------------------------------------------------
On 2015-05-18T22:32:05+00:00 Boris wrote:

btw: Note that it is not fixed in ceph-0.94.1-1.fc22; it is (should be) fixed in ceph-0.94.1-2.fc22 and ceph-0.94.1-2.fc23, which were just recently built in koji:

http://koji.fedoraproject.org/koji/taskinfo?taskID=9777655
http://koji.fedoraproject.org/koji/taskinfo?taskID=9777671

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/17

------------------------------------------------------------------------
On 2015-05-19T07:29:19+00:00 Richard wrote:

Can confirm that this is fixed in ceph-0.94.1-2.fc23 on armv7hl.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/18

------------------------------------------------------------------------
On 2015-05-20T02:54:10+00:00 Fedora wrote:

Package ceph-0.94.1-2.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing ceph-0.94.1-2.fc22'
as soon as you are able to.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-8189/ceph-0.94.1-2.fc22
then log in and leave karma (feedback).

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/19

------------------------------------------------------------------------
On 2015-05-20T14:24:49+00:00 Cole wrote:

FWIW, qemu built against this package still seems to hang:
http://koji.fedoraproject.org/koji/taskinfo?taskID=9780467

But I'm trying another build now.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/20

------------------------------------------------------------------------
On 2015-05-22T02:31:20+00:00 Fedora wrote:

Package ceph-0.94.1-3.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing ceph-0.94.1-3.fc22'
as soon as you are able to.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-8189/ceph-0.94.1-3.fc22
then log in and leave karma (feedback).

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/21

------------------------------------------------------------------------
On 2015-05-29T20:32:28+00:00 Richard wrote:

This has regressed somehow, so it has started to happen again.

https://kojipkgs.fedoraproject.org//work/tasks/1496/9871496/build.log

The build log shows that it is using:

DEBUG util.py:388: librados2 armv7hl 1:0.94.1-3.fc23 build 1.5 M

https://kojipkgs.fedoraproject.org//work/tasks/1496/9871496/root.log

I don't understand exactly why, since the patch exists and is still being applied. Perhaps the patch is not a complete fix for the problem.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/22

------------------------------------------------------------------------
On 2015-05-29T21:15:11+00:00 Richard wrote:

I can reproduce this on bare-metal armv7hl with:

librados2-0.94.1-3.fc23.armv7hl
qemu-system-arm-2.3.0-5.fc23.armv7hl

and (as before) simply running 'qemu-system-arm -help'.

Unfortunately gdb doesn't let me get a stack trace; it simply ignores ^C. I was able to use strace, and the process is doing this over and over again:

gettimeofday({1432934064, 725311}, NULL) = 0
nanosleep({0, 2000001}, NULL) = 0
gettimeofday({1432934064, 728002}, NULL) = 0
nanosleep({0, 2000001}, NULL) = 0

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/23

------------------------------------------------------------------------
On 2015-05-29T21:35:13+00:00 Richard wrote:

Created attachment 1032285
strace qemu-system-arm -help

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/24

------------------------------------------------------------------------
On 2015-06-01T02:45:40+00:00 Boris wrote:

Just a few thoughts:

Ceph itself should not run the loop now. Is there any chance that qemu is trying to do that instead?

Can you still reproduce on bare metal with the simple reproducer that you mentioned in Comment 10?

What about the older librados2-0.94.1-2.fc23.armv7hl? The only change between -2 and -3 should be that lttng was disabled, but one never knows.

I'm trying to get my hands on an armhfp machine to debug further myself, but that may take some time.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/25
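
The new strace pattern above, gettimeofday() followed by a roughly 2 ms nanosleep() repeated forever, is the typical signature of waiting on a lock with a sleep backoff, rather than the tight gettimeofday()-only loop seen earlier. A generic sketch of that pattern follows; it is not taken from tcmalloc or any other library involved here, and the names are made up:

// Generic illustration of the gettimeofday()/nanosleep() pattern above:
// a spin-wait with a small sleep backoff (hypothetical code).
#include <atomic>
#include <sys/time.h>
#include <time.h>

static std::atomic_flag lock_word = ATOMIC_FLAG_INIT;

// Keep trying to take the lock, sleeping ~2 ms between attempts. If the
// lock is never released (for example because it is already held earlier
// on the same call path), strace shows exactly this pair of syscalls
// repeating forever.
static void lock_with_backoff() {
  while (lock_word.test_and_set(std::memory_order_acquire)) {
    timeval now;
    gettimeofday(&now, nullptr);  // e.g. to track how long we have waited
    timespec delay;
    delay.tv_sec = 0;
    delay.tv_nsec = 2000000;      // ~2 ms, matching the trace
    nanosleep(&delay, nullptr);
  }
}

static void unlock_it() {
  lock_word.clear(std::memory_order_release);
}

int main() {
  lock_with_backoff();  // uncontended: acquired on the first try, no sleeping
  unlock_it();
  return 0;
}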
------------------------------------------------------------------------
On 2015-06-01T07:25:04+00:00 Boris wrote:

I'll answer the questions myself, as I've finally got my hands on an armhfp machine where I could reproduce this.

The steps in Comment 10 do not reproduce the issue, so that bug is indeed fixed.

I attached to the running process (gdb -p $(pidof qemu-system-arm)) and I didn't see anything ceph (rbd/rados) related in the backtrace. The backtrace leading to nanosleep was:

#0  0xffffffff in nanosleep () at /lib/libpthread.so.0
#1  0xffffffff in base::internal::SpinLockDelay(int volatile*, int, int) () at /lib/libtcmalloc.so.4
#2  0xffffffff in SpinLock::SlowLock() () at /lib/libtcmalloc.so.4
#3  0xffffffff in tcmalloc::ThreadCache::InitModule() () at /lib/libtcmalloc.so.4
#4  0xffffffff in tc_malloc () at /lib/libtcmalloc.so.4
#5  0xffffffff in __fopen_internal () at /lib/libc.so.6
#6  0xffffffff in load_debug_frame () at /lib/libunwind.so.8
#7  0xffffffff in locate_debug_info () at /lib/libunwind.so.8
#8  0xffffffff in _ULarm_dwarf_find_debug_frame () at /lib/libunwind.so.8
#9  0xffffffff in _ULarm_dwarf_callback () at /lib/libunwind.so.8
#10 0xffffffff in dl_iterate_phdr () at /lib/libc.so.6
#11 0xffffffff in _ULarm_find_proc_info () at /lib/libunwind.so.8
#12 0xffffffff in fetch_proc_info () at /lib/libunwind.so.8
#13 0xffffffff in _ULarm_dwarf_find_save_locs () at /lib/libunwind.so.8
#14 0xffffffff in _ULarm_dwarf_step () at /lib/libunwind.so.8
#15 0xffffffff in _ULarm_step () at /lib/libunwind.so.8
#16 0xffffffff in GetStackTrace_libunwind(void**, int, int) () at /lib/libtcmalloc.so.4
#17 0xffffffff in GetStackTrace(void**, int, int) () at /lib/libtcmalloc.so.4
#18 0xffffffff in tcmalloc::PageHeap::GrowHeap(unsigned int) () at /lib/libtcmalloc.so.4
#19 0xffffffff in tcmalloc::PageHeap::New(unsigned int) () at /lib/libtcmalloc.so.4
#20 0xffffffff in tcmalloc::CentralFreeList::Populate() () at /lib/libtcmalloc.so.4
#21 0xffffffff in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () at /lib/libtcmalloc.so.4
#22 0xffffffff in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () at /lib/libtcmalloc.so.4
#23 0xffffffff in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, unsigned int) () at /lib/libtcmalloc.so.4
#24 0xffffffff in tc_malloc () at /lib/libtcmalloc.so.4
#25 0xffffffff in strdup () at /lib/libc.so.6
#26 0xffffffff in verify_selinuxmnt () at /lib/libselinux.so.1
#27 0xffffffff in init_lib () at /lib/libselinux.so.1
#28 0xffffffff in call_init.part () at /lib/ld-linux-armhf.so.3
#29 0xffffffff in _dl_init_internal () at /lib/ld-linux-armhf.so.3
#30 0xffffffff in _dl_start_user () at /lib/ld-linux-armhf.so.3

The suspects seemed to be tcmalloc (gperftools) or libunwind. While both of these are used by ceph, neither of them is used by the client libraries (rbd/rados) that qemu links to.

I've done some more tests, and it seems that tcmalloc (gperftools) is actually the one causing the hangs. Similar to Comment 10, I did:

$ echo 'main(){}' > test.c
$ gcc test.c -ltcmalloc -o test
$ ./test

and the program hung indefinitely.

I'm moving this bz back to ON_QA (as there actually was a bug in ceph code that was fixed in the -2 release) and I'll clone this bz for the tcmalloc issue.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/26
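
The backtrace above has the shape of a self-deadlock: an allocation (frames #18-#24) is made while the allocator holds its own lock, the allocator then collects a stack trace via libunwind, and libunwind's fopen()/strdup() calls allocate again (frames #0-#5), ending up waiting on the lock that this very call path already holds. Below is a schematic sketch of that shape only; the names are hypothetical, it is not gperftools or libunwind code, and a plain non-recursive mutex stands in for the allocator's spinlock (re-locking it from the same thread simply never succeeds in practice):

// Schematic of the self-deadlock suggested by the backtrace above
// (hypothetical code): the allocator takes its own lock, collecting a
// stack trace allocates again, and the nested call waits forever.
#include <cstdlib>
#include <mutex>

static std::mutex heap_lock;  // stands in for the allocator's spinlock
static void* allocate(std::size_t size);

// In the real backtrace this role is played by libunwind loading debug
// info via fopen()/strdup(), both of which call malloc() and therefore
// re-enter the allocator that is still initializing.
static void record_stack_trace() {
  void* p = allocate(32);  // nested allocation while heap_lock is held
  std::free(p);            // never reached: the nested call blocks
}

static void* grow_heap(std::size_t size) {
  record_stack_trace();    // called with heap_lock already held
  return std::malloc(size);
}

static void* allocate(std::size_t size) {
  // Non-recursive lock: the nested call from record_stack_trace() blocks
  // here forever, which is what the sleep/backoff loop in the strace was.
  std::lock_guard<std::mutex> guard(heap_lock);
  return grow_heap(size);
}

int main() {
  allocate(64);            // takes the lock, then re-enters and hangs
  return 0;
}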
------------------------------------------------------------------------
On 2015-06-01T08:40:12+00:00 Richard wrote:

Thanks - yes I believe you are correct. Quite annoying that two difficult bugs look coincidentally so similar :-(

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/27

------------------------------------------------------------------------
On 2015-06-04T18:14:26+00:00 Ken wrote:

The patch was merged to Ceph master upstream in 35c5fd0091fc4d63b21207fb94e46b343519fd56 (https://github.com/ceph/ceph/pull/4839).

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/28

------------------------------------------------------------------------
On 2015-06-04T20:19:45+00:00 Fedora wrote:

ceph-0.94.1-3.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

Reply at: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/comments/30

** Changed in: ceph (Fedora)
       Status: Unknown => Fix Released

** Changed in: ceph (Fedora)
   Importance: Unknown => Undecided

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1432786

Title:
  upgrade of ceph in vivid breaks qemu-system-arm

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1432786/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs