Hi Enrico,

[also Cc'ing Fabian and the relevant bug report]

On Thu, Feb 18, 2010 at 02:26:27PM +0100, Enrico Zini wrote:
> On Thu, Feb 18, 2010 at 12:49:21PM +0100, Philipp Kern wrote:
> > would you object or welcome a direct NMU for #570306?  I just want guessnet
> > installable in unstable again and the binNMUs failed badly.  ;-)
> > That still leaves us with #553906 which could've been transient. \-:
> Thanks! I have no problem at all with a NMU. In fact, there's even an
> open RFA for guessnet, because since I'm not using it anymore I have a
> bit of a hard time to test a new build before uploading.

I just made a build on sparc (sadly the porterbox is down, and all that's left
for me to test on is a buildd), and there seems to be a race in the testsuite.
The first build failed in the testsuite.  Unfortunately I didn't preserve the
build tree, and when I retried the build it didn't fail anymore.  The failure
is the same as in #553906, but the build terminates normally:

>>> snip >>>
scanner_scanbag: ....
starter: ...
Segmentation fault
util_processrunner: .FAIL: tests/guessnet-test
==================================
1 of 1 test failed
Please report to enr...@debian.org
>>> snip >>>

What I get with `make check' at the top level is this:

>>> snip >>>
scanner_scanbag: ....
starter: ...
util_processrunner: [1=F]

---> group: util_processrunner, test: test<1>
     problem: assertion failed
     failed assertion: "util/processrunner-tut.cc:86(lfalse.tag == "false"): : expected 'false' actual ''"

tests summary: failures:1 ok:7
PASS: tests/guessnet-test
>>> snip >>>

The next time I tried, I got this:

>>> snip >>>
scanner_scanbag: ....
starter: ...
util_processrunner: [1=F]killing signal catched
Bus error
FAIL: tests/guessnet-test
>>> snip >>>

gdb yields this sometimes (it's definitely a race):

>>> snip >>>
util_processrunner: .
Program received signal SIGBUS, Bus error.
[Switching to Thread 0xf7ad7b70 (LWP 13886)]
0xf7e08424 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /usr/lib/libstdc++.so.6
(gdb) bt
#0  0xf7e08424 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /usr/lib/libstdc++.so.6
#1  0x00046e94 in ProcData (this=0xbb8e0) at util/processrunner.h:48
#2  ProcessRunner::main (this=0xbb8e0) at util/processrunner.cc:215
#3  0x0007dd44 in wibble::sys::Thread::Starter(void*) ()
#4  0xf7ec6358 in start_thread () from /lib/libpthread.so.0
#5  0xf7bc5f1c in ?? () from /lib/libc.so.6
#6  0xf7bc5f1c in ?? () from /lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>> snip >>>

I cannot really step through the code because of the threading.

The relevant code snippet (if we trust the backtrace):

>>> snip >>>
212     while (!queuedForRunning.empty())
213     {
214             debug("PRI Run a queued process\n");
215             ProcData d = queuedForRunning.front();
216             queuedForRunning.pop();
217             d.run();
218             proclist.insert(make_pair(d.script->pid(), d));
219             //debug("run process %s pid: %d\n", d.tag.c_str(), d.script->pid());
220     }
>>> snip >>>
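Assuming (not confirmed by anything above) that the race is in how
`queuedForRunning` is shared between threads, the invariant to check is that
every producer takes the same mutex as this loop before pushing.  A minimal
sketch of that pattern, written with C++11 `std::mutex` instead of wibble's
primitives; `Job`, `JobQueue` and `run_queue_demo` are hypothetical names, not
guessnet's API:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for processrunner::ProcData; only the string member
// matters here, since the crash happens while copying ProcData's `tag`.
struct Job {
    std::string tag;
};

// Every access to the queue, from producers and from the consumer, takes
// the same mutex. front() is copied and pop()ed while that lock is held.
class JobQueue {
public:
    void push(Job job) {
        std::lock_guard<std::mutex> guard(mutex_);
        queue_.push(std::move(job));
    }

    // Copies out and pops all currently queued jobs; the caller runs them
    // after the lock has been released.
    std::vector<Job> drain() {
        std::vector<Job> out;
        std::lock_guard<std::mutex> guard(mutex_);
        while (!queue_.empty()) {
            out.push_back(queue_.front());
            queue_.pop();
        }
        return out;
    }

private:
    std::mutex mutex_;
    std::queue<Job> queue_;
};

struct DemoResult {
    std::size_t seen = 0;
    std::size_t empty_tags = 0;
};

// Several producer threads push while the calling thread drains the queue.
DemoResult run_queue_demo(int producers, int jobs_each) {
    JobQueue q;
    std::vector<std::thread> threads;
    for (int t = 0; t < producers; ++t)
        threads.emplace_back([&q, t, jobs_each] {
            for (int i = 0; i < jobs_each; ++i)
                q.push(Job{"job-" + std::to_string(t) + "-" + std::to_string(i)});
        });

    DemoResult result;
    const std::size_t expected = static_cast<std::size_t>(producers) * jobs_each;
    while (result.seen < expected) {
        for (const Job& job : q.drain()) {
            ++result.seen;
            if (job.tag.empty()) ++result.empty_tags;
        }
    }
    for (std::thread& th : threads) th.join();
    assert(q.drain().empty());  // nothing is left once every job was counted
    return result;
}
```

Build with `-pthread`.  `run_queue_demo(4, 1000)` counts 4000 jobs with no
empty tags.  If a producer pushed without taking the lock, the behaviour would
be undefined and could show up as empty tags or crashes; whether that is what
happens in guessnet is only a guess.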

Other failures:

>>> snip >>>
scanner_scanbag: ....
starter: ...
util_processrunner: [1=F]terminate called after throwing an instance of 'std::runtime_error'
  what():  killing signal catched
Aborted (core dumped)
>>> snip >>>
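The message above is what libstdc++'s default (verbose) terminate handler
prints when an exception is never caught, after which it calls abort().  If the
`std::runtime_error` escapes the body of a thread (an assumption; the stacks
don't show which thread threw it), that alone is enough to kill the whole test
binary.  A minimal C++11 sketch of capturing an exception in a worker thread
and handing it to the joining thread instead; `run_on_thread` and `describe`
are hypothetical helpers, not part of wibble or guessnet:

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Runs `body` on a worker thread and returns whatever it threw (or a null
// exception_ptr). Without the catch(...), an exception escaping the thread
// function would call std::terminate and abort the whole process.
template <typename F>
std::exception_ptr run_on_thread(F body) {
    std::exception_ptr error;
    std::thread worker([&error, &body] {
        try {
            body();
        } catch (...) {
            error = std::current_exception();
        }
    });
    worker.join();
    return error;
}

// Returns the what() text of a captured exception, or "" if there was none.
std::string describe(const std::exception_ptr& error) {
    if (!error) return "";
    try {
        std::rethrow_exception(error);
    } catch (const std::exception& e) {
        return e.what();
    } catch (...) {
        return "unknown exception";
    }
}
```

Build with `-pthread`.  For example, describing the result of a body that
throws `std::runtime_error("killing signal catched")` returns that `what()`
text instead of aborting the process.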

The resulting core dump has a garbled stack, and none of the core dumps I'm
able to generate are useful.

One backtrace I was able to get directly in gdb:

>>> snip >>>
(gdb) bt
#0  0xf7e6d070 in ?? () from /usr/lib/libstdc++.so.6
#1  0xf7c997bc in _Unwind_DeleteException () from /lib/libgcc_s.so.1
#2  0x0003d8a0 in tut::test_runner::run_tests (argc=<value optimized out>, argv=<value optimized out>) at /usr/include/wibble/tests/tut.h:318
#3  main (argc=<value optimized out>, argv=<value optimized out>) at tests/tut-main.cpp:43
>>> snip >>>

Let's try again with `-O0 -g':

>>> snip >>>
util_processrunner: .
Program received signal SIGBUS, Bus error.
[Switching to Thread 0xf7b07b70 (LWP 29473)]
0xf7e38424 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /usr/lib/libstdc++.so.6
(gdb) bt
#0  0xf7e38424 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /usr/lib/libstdc++.so.6
#1  0x0007362c in ProcData (this=0xf7b071dc) at util/processrunner.h:48
#2  0x000704fc in ProcessRunner::main (this=0x11b8e0) at util/processrunner.cc:215
#3  0x000bc79c in wibble::sys::Thread::Starter(void*) ()
#4  0xf7ef6358 in start_thread () from /lib/libpthread.so.0
#5  0xf7bf5f1c in ?? () from /lib/libc.so.6
#6  0xf7bf5f1c in ?? () from /lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) up 1
#1  0x0007362c in ProcData (this=0xf7b071dc) at util/processrunner.h:48
48      {
(gdb) bt full
#0  0xf7e38424 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /usr/lib/libstdc++.so.6
No symbol table info available.
#1  0x0007362c in ProcData (this=0xf7b071dc) at util/processrunner.h:48
No locals.
#2  0x000704fc in ProcessRunner::main (this=0x11b8e0) at util/processrunner.cc:215
        d = {tag = {static npos = 4294967295, 
            _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x11b998 ""}}, 
          script = 0x11c7e8, listener = 0x11b610}
        lock = {mutex = @0x11b8ec, locked = true, yield = false}
        pid = 0
        status = 256
        want_shutdown = false
        sigs = {__val = {2147479879, 4294967294, 4294967295 <repeats 30 times>}}
        oldsigs = {__val = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2020, 976, 4026531840, 4155535624, 4155555844, 4155550328, 0, 0, 5, 1093, 4160297568, 4160295888, 
            4159644435, 4155595396, 4159639492, 1, 0, 2822930839, 4160293000, 4155535852, 4155535840, 1}}
        proclist = {_M_t = {
            _M_impl = {<std::allocator<std::_Rb_tree_node<std::pair<int const, processrunner::ProcData> > >> = {<__gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<int const, processrunner::ProcData> > >> = {<No data fields>}, <No data fields>}, 
              _M_key_compare = {<std::binary_function<int, int, bool>> = {<No data fields>}, <No data fields>}, _M_header = {_M_color = std::_S_red, 
                _M_parent = 0x11b5a8, _M_left = 0x11b5a8, _M_right = 0x11bdd8}, _M_node_count = 2}}}
#3  0x000bc79c in wibble::sys::Thread::Starter(void*) ()
No symbol table info available.
#4  0xf7ef6358 in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#5  0xf7bf5f1c in ?? () from /lib/libc.so.6
No symbol table info available.
#6  0xf7bf5f1c in ?? () from /lib/libc.so.6
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>> snip >>>
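The `sigs` and `oldsigs` locals in `ProcessRunner::main` suggest that the
runner thread saves and changes its signal mask (an assumption; that part of
processrunner.cc isn't quoted here).  In POSIX, signal masks are per thread,
and a signal sent to the process is delivered to one thread that does not
block it, so a mask set in one thread does not protect the others.  A minimal
sketch of a per-thread, scoped block using `pthread_sigmask`;
`ScopedSignalBlock` and `signal_blocked_here` are hypothetical names:

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Blocks `signo` in the calling thread only and restores the previous mask
// on destruction. Other threads keep whatever mask they already had.
class ScopedSignalBlock {
public:
    explicit ScopedSignalBlock(int signo) {
        sigset_t block;
        sigemptyset(&block);
        sigaddset(&block, signo);
        int rc = pthread_sigmask(SIG_BLOCK, &block, &old_);
        assert(rc == 0 && "pthread_sigmask(SIG_BLOCK) failed");
        (void)rc;
    }
    ~ScopedSignalBlock() { pthread_sigmask(SIG_SETMASK, &old_, nullptr); }

    ScopedSignalBlock(const ScopedSignalBlock&) = delete;
    ScopedSignalBlock& operator=(const ScopedSignalBlock&) = delete;

private:
    sigset_t old_;
};

// True if `signo` is currently blocked in the calling thread.
bool signal_blocked_here(int signo) {
    sigset_t current;
    // With a null `set`, pthread_sigmask only reports the current mask.
    pthread_sigmask(SIG_BLOCK, nullptr, &current);
    return sigismember(&current, signo) == 1;
}
```

Build with `-pthread`.  Threads created while the guard is alive inherit the
blocked mask from their creator; threads that already exist keep theirs.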

Sadly the unoptimized backtrace doesn't look any more helpful, and I'm now
utterly confused about how to continue. :)

Kind regards,
Philipp Kern
-- 
 .''`.  Philipp Kern                        Debian Developer
: :' :  http://philkern.de                         Stable Release Manager
`. `'   xmpp:p...@0x539.de                         Wanna-Build Admin
  `-    finger pkern/k...@db.debian.org
