[Lldb-commits] [PATCH] D149792: Add AArch64 MASK watchpoint support to debugserver

Jason Molenda via Phabricator via lldb-commits Wed, 03 May 2023 15:11:49 -0700

jasonmolenda created this revision.
jasonmolenda added a reviewer: JDevlieghere.
jasonmolenda added a project: LLDB.
Herald added subscribers: omjavaid, kristof.beyls.
Herald added a project: All.
jasonmolenda requested review of this revision.
Herald added a subscriber: lldb-commits.


AArch64 supports two types of watchpoints - 1-8 byte watchpoints, which can 
watch up to 8 contiguous bytes aligned to a doubleword boundary, and mask 
watchpoints that can watch any power-of-2 range of memory aligned to that 
boundary (BAS watchpoints and MASK watchpoints).  This patch adds support for 
MASK watchpoints when the user requests larger than 8 bytes be watched.

The major caveat with this is that there remains real work in lldb to support 
large watchpoints.  If a user asks to watch 192 bytes, aligned to a 256 byte 
boundary, we still need to watch 256 bytes.  Writes to the final 64 bytes will 
trap the watchpoint, and we need to work out how to silently skip these 
watchpoint hits.  Watchpoints have a type: "read", "write", and I expect I'll 
add a new type "modify" that is the default, and stops when the memory region 
being watched *changes*.

(this problem of a mask watchpoint watching a larger memory range than the user 
expected is exacerbated with a memory region not properly aligned to that 
power-of-2 size.  and it can come up easily with smaller objects, e.g. a 24 
byte object requires a 32-byte watchpoint, or possibly two watchpoints if it's 
not aligned to a 32-byte boundary.)

The changes I did https://reviews.llvm.org/D149040 make this patch mostly a 
simple drop-in of the method for setting the debug control register properly 
for MASK watchpoints.  There are some extra changes to the method which takes a 
trap address and tries to find the hardware watchpoint register which resulted 
in that watchpoint hit.  (we further refine this later based on the user's 
watchpoint requests, using a "nearest wp wins" if no watched region correctly 
includes it, for AArch64 reasons.)

I added a test case with an array of uint32_t's, watch 256 of those unit32_t's, 
and a loop that modifies every 16th uint32_t.  The test confirms that we hit 
this 256-uint32_t's watchpoint 16 times.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D149792

Files:
  lldb/test/API/functionalities/watchpoint/large-watchpoint/Makefile
  
lldb/test/API/functionalities/watchpoint/large-watchpoint/TestLargeWatchpoint.py
  lldb/test/API/functionalities/watchpoint/large-watchpoint/main.c
  lldb/test/API/python_api/watchpoint/watchlocation/TestTargetWatchAddress.py
  lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.cpp
  lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.h

Index: lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.h
===================================================================
--- lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.h
+++ lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.h
@@ -86,7 +86,8 @@
                                     bool write, bool also_set_on_task) override;
   uint32_t SetBASWatchpoint(WatchpointSpec wp, bool read, bool write,
                             bool also_set_on_task);
-  uint32_t SetMASKWatchpoint(WatchpointSpec wp);
+  uint32_t SetMASKWatchpoint(WatchpointSpec wp, bool read, bool write,
+                             bool also_set_on_task);
   bool DisableHardwareWatchpoint(uint32_t hw_break_index,
                                  bool also_set_on_task) override;
   bool DisableHardwareWatchpoint_helper(uint32_t hw_break_index,
Index: lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.cpp
===================================================================
--- lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.cpp
+++ lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.cpp
@@ -42,11 +42,6 @@
 #define WCR_LOAD ((uint32_t)(1u << 3))
 #define WCR_STORE ((uint32_t)(1u << 4))
 
-// Enable breakpoint, watchpoint, and vector catch debug exceptions.
-// (MDE bit in the MDSCR_EL1 register.  Equivalent to the MDBGen bit in
-// DBGDSCRext in Aarch32)
-#define MDE_ENABLE ((uint32_t)(1u << 15))
-
 // Single instruction step
 // (SS bit in the MDSCR_EL1 register)
 #define SS_ENABLE ((uint32_t)(1u))
@@ -807,8 +802,6 @@
                        (uint64_t)m_state.dbg.__bvr[i],
                        (uint32_t)m_state.dbg.__bcr[i]);
 
-      // The kernel will set the MDE_ENABLE bit in the MDSCR_EL1 for us
-      // automatically, don't need to do it here.
       kret = SetDBGState(also_set_on_task);
 
       DNBLogThreadedIf(LOG_WATCHPOINTS,
@@ -848,9 +841,10 @@
 
   /// Round up \a requested_size to the next power-of-2 size, at least 8
   /// bytes
-  /// requested_size == 3  -> aligned_size == 8
-  /// requested_size == 13 -> aligned_size == 16
-  /// requested_size == 16 -> aligned_size == 16
+  /// requested_size == 8   -> aligned_size == 8
+  /// requested_size == 9   -> aligned_size == 16
+  /// requested_size == 15  -> aligned_size == 16
+  /// requested_size == 192 -> aligned_size == 256
   /// Could be `std::bit_ceil(aligned_size)` when we build with C++20?
   aligned_size = 1ULL << (addr_bit_size - __builtin_clzll(aligned_size - 1));
 
@@ -922,7 +916,7 @@
     if (wps[0].aligned_size <= 8)
       return SetBASWatchpoint(wps[0], read, write, also_set_on_task);
     else
-      return INVALID_NUB_HW_INDEX;
+      return SetMASKWatchpoint(wps[0], read, write, also_set_on_task);
   }
 
   // We have multiple WatchpointSpecs
@@ -1024,9 +1018,6 @@
                    (uint64_t)m_state.dbg.__wvr[i],
                    (uint32_t)m_state.dbg.__wcr[i]);
 
-  // The kernel will set the MDE_ENABLE bit in the MDSCR_EL1 for us
-  // automatically, don't need to do it here.
-
   kret = SetDBGState(also_set_on_task);
   // DumpDBGState(m_state.dbg);
 
@@ -1042,6 +1033,81 @@
   return INVALID_NUB_HW_INDEX;
 }
 
+uint32_t
+DNBArchMachARM64::SetMASKWatchpoint(DNBArchMachARM64::WatchpointSpec wp,
+                                    bool read, bool write,
+                                    bool also_set_on_task) {
+  const uint32_t num_hw_watchpoints = NumSupportedHardwareWatchpoints();
+
+  // Read the debug state
+  kern_return_t kret = GetDBGState(false);
+  if (kret != KERN_SUCCESS)
+    return INVALID_NUB_HW_INDEX;
+
+  // Check to make sure we have the needed hardware support
+  uint32_t i = 0;
+
+  for (i = 0; i < num_hw_watchpoints; ++i) {
+    if ((m_state.dbg.__wcr[i] & WCR_ENABLE) == 0)
+      break; // We found an available hw watchpoint slot
+  }
+  if (i == num_hw_watchpoints) {
+    DNBLogThreadedIf(LOG_WATCHPOINTS,
+                     "DNBArchMachARM64::"
+                     "SetMASKWatchpoint(): All "
+                     "hardware resources (%u) are in use.",
+                     num_hw_watchpoints);
+    return INVALID_NUB_HW_INDEX;
+  }
+
+  DNBLogThreadedIf(LOG_WATCHPOINTS,
+                   "DNBArchMachARM64::"
+                   "SetMASKWatchpoint() "
+                   "set hardware register %d to MASK watchpoint "
+                   "aligned start address 0x%llx, aligned size %zu",
+                   i, wp.aligned_start, wp.aligned_size);
+
+  // Clear any previous LoHi joined-watchpoint that may have been in use
+  LoHi[i] = 0;
+
+  // MASK field is the number of low bits that are masked off
+  // when comparing the address with the DBGWVR<n>_EL1 values.
+  // If aligned size is 16, that means we ignore low 4 bits, 0b1111.
+  // popcount(16 - 1) give us the correct value of 4.
+  // 2GB is max watchable region, which is 31 bits (low bits 0x7fffffff
+  // masked off) -- a MASK value of 31.
+  uint64_t mask = __builtin_popcountl(wp.aligned_size - 1) << 24;
+  // A '0b11111111' BAS value needed for mask watchpoints plus a
+  // nonzero mask value.
+  uint64_t not_bas_wp = 0xff << 5;
+
+  m_state.dbg.__wvr[i] = wp.aligned_start;
+  m_state.dbg.__wcr[i] = mask | not_bas_wp | S_USER | // Stop only in user mode
+                         (read ? WCR_LOAD : 0) |      // Stop on read access?
+                         (write ? WCR_STORE : 0) |    // Stop on write access?
+                         WCR_ENABLE;                  // Enable this watchpoint;
+
+  DNBLogThreadedIf(LOG_WATCHPOINTS,
+                   "DNBArchMachARM64::SetMASKWatchpoint() "
+                   "adding watchpoint on address 0x%llx with control "
+                   "register value 0x%llx",
+                   (uint64_t)m_state.dbg.__wvr[i],
+                   (uint64_t)m_state.dbg.__wcr[i]);
+
+  kret = SetDBGState(also_set_on_task);
+
+  DNBLogThreadedIf(LOG_WATCHPOINTS,
+                   "DNBArchMachARM64::"
+                   "SetMASKWatchpoint() "
+                   "SetDBGState() => 0x%8.8x.",
+                   kret);
+
+  if (kret == KERN_SUCCESS)
+    return i;
+
+  return INVALID_NUB_HW_INDEX;
+}
+
 bool DNBArchMachARM64::ReenableHardwareWatchpoint(uint32_t hw_index) {
   // If this logical watchpoint # is actually implemented using
   // two hardware watchpoint registers, re-enable both of them.
@@ -1068,14 +1134,11 @@
 
   DNBLogThreadedIf(LOG_WATCHPOINTS,
                    "DNBArchMachARM64::"
-                   "SetBASWatchpoint( %u ) - WVR%u = "
+                   "ReenableHardwareWatchpoint_helper( %u ) - WVR%u = "
                    "0x%8.8llx  WCR%u = 0x%8.8llx",
                    hw_index, hw_index, (uint64_t)m_state.dbg.__wvr[hw_index],
                    hw_index, (uint64_t)m_state.dbg.__wcr[hw_index]);
 
-  // The kernel will set the MDE_ENABLE bit in the MDSCR_EL1 for us
-  // automatically, don't need to do it here.
-
   kret = SetDBGState(false);
 
   return (kret == KERN_SUCCESS);
@@ -1177,30 +1240,63 @@
     uint32_t i, num = NumSupportedHardwareWatchpoints();
     for (i = 0; i < num; ++i) {
       nub_addr_t wp_addr = GetWatchAddress(debug_state, i);
-      uint32_t byte_mask = bits(debug_state.__wcr[i], 12, 5);
 
       DNBLogThreadedIf(LOG_WATCHPOINTS,
                        "DNBArchImplARM64::"
                        "GetHardwareWatchpointHit() slot: %u "
-                       "(addr = 0x%llx; byte_mask = 0x%x)",
-                       i, static_cast<uint64_t>(wp_addr), byte_mask);
+                       "(addr = 0x%llx, WCR = 0x%llx)",
+                       i, wp_addr, debug_state.__wcr[i]);
 
       if (!IsWatchpointEnabled(debug_state, i))
         continue;
 
-      if (bits(wp_addr, 48, 3) != bits(addr, 48, 3))
-        continue;
+      // DBGWCR<n>EL1.BAS are the bits of the doubleword that are watched
+      // with a BAS watchpoint.
+      uint32_t bas_bits = bits(debug_state.__wcr[i], 12, 5);
+      // DBGWCR<n>EL1.MASK is the number of bits that are masked off the
+      // virtual address when comparing to DBGWVR<n>_EL1.
+      uint32_t mask = bits(debug_state.__wcr[i], 28, 24);
+
+      bool is_bas_watchpoint = false;
+      if (mask == 0)
+        is_bas_watchpoint = true;
+
+      DNBLogThreadedIf(
+          LOG_WATCHPOINTS,
+          "DNBArchImplARM64::"
+          "GetHardwareWatchpointHit() slot: %u %s",
+          i, is_bas_watchpoint ? "is BAS watchpoint" : "is MASK watchpoint");
+
+      if (is_bas_watchpoint) {
+        if (bits(wp_addr, 48, 3) != bits(addr, 48, 3))
+          continue;
+      } else {
+        if (bits(wp_addr, 48, mask) == bits(addr, 48, mask)) {
+          DNBLogThreadedIf(LOG_WATCHPOINTS,
+                           "DNBArchImplARM64::"
+                           "GetHardwareWatchpointHit() slot: %u matched MASK "
+                           "ignoring %u low bits",
+                           i, mask);
+          return i;
+        }
+      }
 
-      // Sanity check the byte_mask
-      uint32_t lsb = LowestBitSet(byte_mask);
-      if (lsb < 0)
-        continue;
+      if (is_bas_watchpoint) {
+        // Sanity check the bas_bits
+        uint32_t lsb = LowestBitSet(bas_bits);
+        if (lsb < 0)
+          continue;
 
-      uint64_t byte_to_match = bits(addr, 2, 0);
+        uint64_t byte_to_match = bits(addr, 2, 0);
 
-      if (byte_mask & (1 << byte_to_match)) {
-        addr = wp_addr + lsb;
-        return i;
+        if (bas_bits & (1 << byte_to_match)) {
+          addr = wp_addr + lsb;
+          DNBLogThreadedIf(LOG_WATCHPOINTS,
+                           "DNBArchImplARM64::"
+                           "GetHardwareWatchpointHit() slot: %u matched BAS",
+                           i);
+          return i;
+        }
       }
     }
   }
Index: lldb/test/API/python_api/watchpoint/watchlocation/TestTargetWatchAddress.py
===================================================================
--- lldb/test/API/python_api/watchpoint/watchlocation/TestTargetWatchAddress.py
+++ lldb/test/API/python_api/watchpoint/watchlocation/TestTargetWatchAddress.py
@@ -129,10 +129,3 @@
             "pointee",
             value.GetValueAsUnsigned(0),
             value.GetType().GetPointeeType())
-        # Watch for write to *g_char_ptr.
-        error = lldb.SBError()
-        watchpoint = target.WatchAddress(
-            value.GetValueAsUnsigned(), 365, False, True, error)
-        self.assertFalse(watchpoint)
-        self.expect(error.GetCString(), exe=False,
-                    substrs=['watch size of %d is not supported' % 365])
Index: lldb/test/API/functionalities/watchpoint/large-watchpoint/main.c
===================================================================
--- /dev/null
+++ lldb/test/API/functionalities/watchpoint/large-watchpoint/main.c
@@ -0,0 +1,16 @@
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+int main() {
+  const int count = 65535;
+  int *array = (int*) malloc(sizeof (int) * count);
+  memset (array, 0, count * sizeof (int));
+
+  puts ("break here");
+
+  for (int i = 0; i < count - 16; i += 16) 
+    array[i] += 10;
+
+  puts ("done, exiting.");
+}
Index: lldb/test/API/functionalities/watchpoint/large-watchpoint/TestLargeWatchpoint.py
===================================================================
--- /dev/null
+++ lldb/test/API/functionalities/watchpoint/large-watchpoint/TestLargeWatchpoint.py
@@ -0,0 +1,59 @@
+"""
+Watch larger-than-8-bytes regions of memory, confirm that
+writes to those regions are detected.
+"""
+
+
+
+import lldb
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test import lldbutil
+
+class UnalignedWatchpointTestCase(TestBase):
+
+    def continue_and_report_stop_reason(self, process, iter_str):
+        process.Continue()
+        self.assertIn(process.GetState(), [lldb.eStateStopped, lldb.eStateExited],
+                         iter_str)
+        thread = process.GetSelectedThread()
+        return thread.GetStopReason()
+
+
+    NO_DEBUG_INFO_TESTCASE = True
+    # debugserver on AArch64 has this feature.
+    @skipIf(archs=no_match(['arm64', 'arm64e', 'aarch64']))
+    @skipUnlessDarwin
+
+    # debugserver only gained the ability to watch larger regions
+    # with this patch.
+    @skipIfOutOfTreeDebugserver
+
+    def test_large_watchpoint(self):
+        """Test watchpoint that covers a large region of memory."""
+        self.build()
+        self.main_source_file = lldb.SBFileSpec("main.c")
+        (target, process, thread, bkpt) = lldbutil.run_to_source_breakpoint(self,
+             "break here", self.main_source_file)
+
+        frame = thread.GetFrameAtIndex(0)
+
+        array_addr = frame.GetValueForVariablePath("array").GetValueAsUnsigned()
+
+        # watch 256 uint32_t elements in the middle of the array, 
+        # don't assume that the heap allocated array is aligned 
+        # to a 1024 byte boundary to begin with, force alignment.
+        wa_256_addr = ((array_addr + 1024) & ~(1024-1))
+        err = lldb.SBError()
+        wp = target.WatchAddress(wa_256_addr, 1024, False, True, err)
+        self.assertTrue(wp.IsValid())
+        self.assertSuccess(err)
+
+        c_count = 0
+        reason = self.continue_and_report_stop_reason(process, "continue #%d" % c_count)
+        while reason == lldb.eStopReasonWatchpoint:
+            c_count = c_count + 1
+            reason = self.continue_and_report_stop_reason(process, "continue #%d" % c_count)
+            self.assertLessEqual(c_count, 16)
+
+        self.assertEqual(c_count, 16)
Index: lldb/test/API/functionalities/watchpoint/large-watchpoint/Makefile
===================================================================
--- /dev/null
+++ lldb/test/API/functionalities/watchpoint/large-watchpoint/Makefile
@@ -0,0 +1,3 @@
+C_SOURCES := main.c
+
+include Makefile.rules

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [PATCH] D149792: Add AArch64 MASK watchpoint support to debugserver

Reply via email to