https://bugs.kde.org/show_bug.cgi?id=501479

            Bug ID: 501479
           Summary: Illumos DRD pthread_mutex_init wrapper errors
    Classification: Developer tools
           Product: valgrind
           Version: 3.24 GIT
          Platform: Compiled Sources
                OS: Unspecified
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: drd
          Assignee: bart.vanassche+...@gmail.com
          Reporter: pjfl...@wanadoo.fr
  Target Milestone: ---

A few of the DRD tests are failing on OI Hipster 2024.10, for instance
hold_lock_1:

paulf@openindiana:~/valgrind$ cat drd/tests/hold_lock_1.stderr.diff 
--- hold_lock_1.stderr.exp      2023-09-10 09:26:27.606842684 +0200
+++ hold_lock_1.stderr.out      2025-03-14 07:30:48.347271974 +0100
@@ -1,27 +1,61 @@

 Locking mutex ...
-Acquired at:
+The object at address 0x........ is not a mutex.
    at 0x........: pthread_mutex_lock (drd_pthread_intercepts.c:?)
    by 0x........: main (hold_lock.c:?)
-Lock on mutex 0x........ was held during ... ms (threshold: 500 ms).
-   at 0x........: pthread_mutex_unlock (drd_pthread_intercepts.c:?)
+mutex 0x........ was first observed at:
+   at 0x........: pthread_mutex_init (drd_pthread_intercepts.c:?)
+   by 0x........: main (hold_lock.c:?)
+
+The object at address 0x........ is not a mutex.
+   at 0x........: pthread_mutex_lock (drd_pthread_intercepts.c:?)
    by 0x........: main (hold_lock.c:?)
 mutex 0x........ was first observed at:
    at 0x........: pthread_mutex_init (drd_pthread_intercepts.c:?)
    by 0x........: main (hold_lock.c:?)

-Locking rwlock exclusively ...
-Acquired at:
-   at 0x........: pthread_rwlock_wrlock (drd_pthread_intercepts.c:?)
+Mutex type changed: mutex 0x........, recursion count 2, owner 1.
+   at 0x........: pthread_mutex_unlock (drd_pthread_intercepts.c:?)
    by 0x........: main (hold_lock.c:?)
-Lock on rwlock 0x........ was held during ... ms (threshold: 500 ms).
-   at 0x........: pthread_rwlock_unlock (drd_pthread_intercepts.c:?)
+mutex 0x........ was first observed at:
+   at 0x........: pthread_mutex_init (drd_pthread_intercepts.c:?)
    by 0x........: main (hold_lock.c:?)
-rwlock 0x........ was first observed at:
-   at 0x........: pthread_rwlock_init (drd_pthread_intercepts.c:?)
+
+
+drd: drd_mutex.c:405 (vgDrd_mutex_unlock): Assertion 'p->mutex_type == mutex_type' failed.
+
+host stacktrace:
+   at 0x........: show_sched_status_wrk (m_libcassert.c:?)
+   by 0x........: report_and_quit (m_libcassert.c:?)
+   by 0x........: vgPlain_assert_fail (m_libcassert.c:?)
+   by 0x........: vgDrd_mutex_unlock (drd_mutex.c:?)
+   by 0x........: handle_thr_client_request (drd_clientreq.c:?)
+   by 0x........: handle_client_request (drd_clientreq.c:?)
+   by 0x........: wrap_tool_handle_client_request (m_tooliface.c:?)
+   by 0x........: do_client_request (scheduler.c:?)
+   by 0x........: vgPlain_scheduler (scheduler.c:?)
+   by 0x........: thread_wrapper (syswrap-solaris.c:134)
+   by 0x........: run_a_thread_NORETURN (syswrap-solaris.c:182)
+
+sched status:
+  running_tid=1
+
+Thread 1: status = VgTs_Runnable (lwpid 1)
+   at 0x........: pthread_mutex_unlock (drd_pthread_intercepts.c:?)
    by 0x........: main (hold_lock.c:?)
+client stack range: [0x........ 0x........] client SP: 0x........
+valgrind stack range: [0x........ 0x........] top usage: 10664 of 1048576

The code is

  pthread_mutexattr_init(&mutexattr);
  pthread_mutexattr_settype(&mutexattr, PTHREAD_MUTEX_RECURSIVE);
  pthread_mutex_init(&mutex, &mutexattr);
  pthread_mutexattr_destroy(&mutexattr);
  pthread_mutex_lock(&mutex); // error here on line 51


DRD contains two 'init' wrappers: one for pthread_mutex_init itself and one,
for Solaris (and Illumos) only, for mutex_init. The same applies to
pthread_mutex_destroy and mutex_destroy.

The two 'init' functions are different. However, for 'destroy' a weak alias is
used.

I'm not too sure how or why this ever worked properly. My suspicion is that at
some time 'pthread_mutex_init' made a sibling call to 'mutex_init' (see the
changes here
https://code.illumos.org/c/illumos-gate/+/3255/3/usr/src/lib/libc/port/threads/pthr_mutex.c#b245).
That would hide the call to mutex_init, so DRD would only see one 'init' call
and one 'destroy' call. After the change it would be seeing two inits and one
destroy. I don't know if the 'type' is different between the two. Solaris 11.3
and 11.4 don't use a sibling call.

Anyway, my initial debugging in gdb shows the following sequence:
- intercepted pthread_mutex_init with type mt equal to zero
- intercepted mutex_init with type equal to 6
- intercepted mutex_lock

I'm not certain, but I think that the second 'init' is failing to record the
init with the right type (because it has already been recorded), and then the
lock looks for the mutex with type 6 and fails to find it.

I don't see much difference compared to Solaris 11.

I need to debug the mutex kind further.
