On 09/24/2011 12:48 PM, Michael Meeks wrote:
> I'm poking at an endless hang in the smoketest:
>
> #12 0xb7d24aec in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
> #3 0xb7f1b6c0 in osl_waitCondition () from /data/opt/libreoffice/core/solver/unxlngi6.pro/lib/libuno_sal.so.3
> #4 0xb72db42a in osl::Condition::wait (this=0xbfffb8c4, pTimeout=0x0) at /data/opt/libreoffice/core/solver/unxlngi6.pro/inc/osl/conditn.hxx:84
> #5 0xb72d9024 in (anonymous namespace)::Test::test (this=0xb7c16008) at /data/opt/libreoffice/core/smoketestoo_native/smoketest.cxx:200
> #6 0xb72d9e2e in CppUnit::TestCaller<<unnamed>::Test>::runTest(void) (this=0xb73ac0a8) at /data/opt/libreoffice/core/solver/unxlngi6.pro/inc/cppunit/TestCaller.h:166
>
> If I were a betting man I'd say this is down to us waiting on a
> condition, and not spinning the main-loop; but (to be honest) this
> remote-control nonsense is somewhat opaque to me. I see no live
> soffice.bin process being controlled.
>
> I was slightly amazed to read:
>
>     toolkit/source/awt/AsyncCallback::addCallback()
>
> which seems to do nothing / not fire an exception if
> Application::IsInMain() is not true - which is in itself odd.
>
> I have another quiescent thread:
>
> #2 0xb7d24b44 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libc.so.6
> #3 0xb7f3f18e in ?? () from /data/opt/libreoffice/core/solver/unxlngi6.pro/lib/libuno_sal.so.3
> #4 0xb7c28b05 in start_thread (arg=0xb7c0fb70) at pthread_create.c:297
> #5 0xb7d16d5e in clone () from /lib/libc.so.6
>
> So - I'm tempted to say:
>
>     Result result;
>     // Shifted to main thread to work around potential deadlocks (i112867):
>     com::sun::star::awt::AsyncCallback::create(
>         connection_.getComponentContext())->addCallback(
>             new Callback(
>                 disp, url, css::uno::Sequence< css::beans::PropertyValue >(),
>                 new Listener(&result)),
>             css::uno::Any());
>     result.condition.wait();
>     CPPUNIT_ASSERT(result.success);
>
> should be a timed wait - but only if we fail if the timeout is
> triggered (i.e. not on the common path).
>
> I've committed that at 30 seconds - possibly this needs tweaking to be
> infinite when under the debugger.
A timed wait is no solution here.

(Timeouts in this kind of code pose at least two problems. For one, they prevent a human from coming back to a hung "make check" after a while and still seeing where it hung, because the build has unhelpfully been forced to move forward. For another, what is typically also needed is proper cleanup, like killing abandoned sub-processes, so manual intervention is needed anyway.)

The real solution, instead, is to wait not only on the Result object, but also on the OfficeConnection. Fixed as <http://cgit.freedesktop.org/libreoffice/core/commit/?id=c09b966f94f5a50fe537916398451339f008947d>.
-Stephan
