On Thu, 2021-03-04 at 11:15 +0100, Stefano Garzarella wrote:
> On Wed, Mar 03, 2021 at 02:07:24PM +0200, Maxim Levitsky wrote:
> > On Tue, 2021-03-02 at 15:52 +0100, Stefano Garzarella wrote:
> > > On Mon, Mar 01, 2021 at 02:56:40PM +0200, Maxim Levitsky wrote:
> > > > On Mon, 2021-03-01 at 12:17 +0100, Paolo Bonzini wrote:
> > > > > If kvm_arch_remove_sw_breakpoint finds that a software breakpoint
> > > > > does not
> > > > > have an INT3 instruction, it fails. This can happen if one sets a
> > > > > software breakpoint in a kernel module and then reloads it. gdb then
> > > > > thinks the breakpoint cannot be deleted and there is no way to add it
> > > > > back.
> > > > >
> > > > > Suggested-by: Maxim Levitsky <[email protected]>
> > > > > Signed-off-by: Paolo Bonzini <[email protected]>
> > > > > ---
> > > > > target/i386/kvm/kvm.c | 9 +++++++--
> > > > > 1 file changed, 7 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > > > > index 0b5755e42b..c8d61daf68 100644
> > > > > --- a/target/i386/kvm/kvm.c
> > > > > +++ b/target/i386/kvm/kvm.c
> > > > > @@ -4352,8 +4352,13 @@ int kvm_arch_remove_sw_breakpoint(CPUState
> > > > > *cs, struct kvm_sw_breakpoint *bp)
> > > > > {
> > > > > uint8_t int3;
> > > > >
> > > > > - if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0) || int3 != 0xcc
> > > > > ||
> > > > > - cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn,
> > > > > 1, 1)) {
> > > > > + if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0)) {
> > > > > + return -EINVAL;
> > > > > + }
> > > > > + if (int3 != 0xcc) {
> > > > > + return 0;
> > > > > + }
> > > > > + if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn,
> > > > > 1, 1)) {
> > > > > return -EINVAL;
> > > > > }
> > > > > return 0;
> > > >
> > > > There still remains a philosopical question if
> > > > kvm_arch_remove_sw_breakpoint
> > > > should always return 0, since for the usual case of kernel debugging
> > > > where
> > > > a breakpoint is in unloaded module, the above will probably still fail
> > > > as paging for this module is removed as well by the kernel.
> > > > It is still better though so:
> > > >
> > > > Reviewed-by: Maxim Levitsky <[email protected]>
> > > >
> > > > Note that I managed to make lx-symbols to work in a very stable way
> > > > with attached WIP patch I hacked on this Sunday.
> > > > I will send a cleaned up version of it to upstream when I have time.
> > > >
> > > > Since I make gdb unload the symbols, it works even without this patch.
> > > >
> > > > Added Stefano Garzarella to CC as I know that he tried to make this
> > > > work as well.
> > > > https://lkml.org/lkml/2020/10/5/514
> > >
> > > Thanks Maxim for CCing me!
> > >
> > > Just a report when I tried these patches, but I'm not sure they are
> > > related.
> > >
> > > I found that gdb 10 has some problem with QEMU:
> > >
> > > $ gdb --version
> > > GNU gdb (GDB) Fedora 10.1-2.fc33
> > >
> > > (gdb) lx-symbols
> > > loading vmlinux
> > > scanning for modules in linux/build
> > > ../../gdb/dwarf2/frame.c:1085: internal-error: Unknown CFA rule.
> > >
> > > With gdb 9 'lx-symbols' works well, but I still have some issue when I
> > > put a breakpoint to a symbol not yet loaded (vsock_core_register in this
> > > example), then I load the module (vsock_loopback in this example) in the
> > > guest.
> > >
> > > Whit your patch gdb stuck after loading the symbols of the first new
> > > module:
> > > (gdb) b vsock_core_register
> > > Function "vsock_core_register" not defined.
> > > Make breakpoint pending on future shared library load? (y or [n]) y
> > > Breakpoint 1 (vsock_core_register) pending.
> > > (gdb) c
> > > Continuing.
> > > loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> > >
> > > Without your patch, gdb loops infinitely reloading the new module:
> > > refreshing all symbols to reload module 'vsock'
> > > loading vmlinux
> > > loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> > > loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
> > > loading @0xffffffffc007e000: linux/build/net/bridge/bridge.ko
> > > loading @0xffffffffc0077000: linux/build/net/802/stp.ko
> > > loading @0xffffffffc0007000: linux/build/net/llc/llc.ko
> > > loading @0xffffffffc0013000: linux/build/net/sunrpc/sunrpc.ko
> > > loading @0xffffffffc000d000:
> > > linux/build/net/ipv4/netfilter/ip_tables.ko
> > > loading @0xffffffffc0000000: linux/build/net/netfilter/x_tables.ko
> > > refreshing all symbols to reload module 'vsock'
> > > loading vmlinux
> > > loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> > > loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
> > > ...
> > >
> > > I'll try to get a better look at what's going on.
> >
> > Let me then explain what I found:
> >
> > First of all initial execution of lx-symbols works and always worked for me
> > (I use gdb 9 though from fedora 32 (9.1-7.fc32))
> >
> >
> > Then a special breakpoint is placed on do_init_module
> > (it is hidden from the user)
> >
> > Once it is hit two things can happen:
> >
> > 1. if a not yet seen module is loaded,
> > (module that wasn't loaded last time all symbols were reloaded)
> > its symbols are loaded to gdb with 'add-symbol-file' command.
> >
> > 2. if module that was already loaded to gdb, is loaded (see above),
> > then 'big hammer' is used:
> >
> > a. all existing breakpoints (including the hidden one)
> > are disabled since reloading everything
> > indeed messes up the gdb state
> >
> > b. the executable's symbols (the kernel) are reloaded,
> > which makes gdb unload all symbols, and then all symbols are loaded
> > again (in the same way as lx-symbols works),
> > including the symbols of currently loading module.
> >
> > c. all breakpoints are enabled again
> >
> >
> > Now the issue that you originally reported on LKML is because the (1)
> > apparently also messes up the software breakpoints state in gdb,
> > and that makes gdb to not to temporary remove the breakpoint
> > in do_init_module once the execution is resumed, and then
> > the guest starts executing garbage (bytes after 'int3' instruction).
> >
> > The second issue is that (2), aka the big hammer isn't really needed.
> > GDB does have (maybe this is recent command?) a 'remove-symbol-file'
> > command.
> >
> > So it is possible to do remove-symbol-file/add-symbol-file'
> > on known module reload instead of reloading everything.
> >
> > But this has a small issue. The issue is that such known module
> > was already reloaded, so all int3 instructions that gdb placed into
> > it are already gone, so sofware breakpoints placed to it won't work
> > This is what the patch that Paulo sent fixes.
> >
> > However it is even better to create another hidden breakpoint on module
> > removal
> > path (I used free_module for that) and unload the symbols there.
> > This allows the gdb to cleanly remove all software breakpoints
> > in that module, show them again as pending, and even re-install
> > them again once the module is loaded again.
> >
> > So those are the two changes I made:
> >
> > 1. I added a breakpoint on module removal which also does
> > a. disable all breakpoints
> > b. unload the module's symbols
> > c. enable all breakpoints
> >
> > 2. On module load I also do
> > a. disable all breakpoints
> > b. load the module's symbols
> > c. enable all breakpoints
> >
> >
> > There is still an issue that both 'load' and 'unload' breakpoint
> > can hit more that once.
> > This happens because these are software breakpoints and
> > load/unload code in the kernel is executed with interrupts enabled.
> >
> > So what is happening is that once the hidden breakpoint is hit, gdb script
> > attached to it is done, and VM is resumed, gdb does more or less this:
> >
> > a. remove the breakpoint
> > b. do a single step
> > c. re-install the breakpoint.
> >
> > However the single step more often than not, lands us into an interrupt
> > handler, and so once the handler is finished we end up re-executing the
> > instruction on which breakpoint was set.
> > On a single vCPU it works more or less (with several tries) on my machine,
> > but with many vCPUs, I can end up in live lock like state
> > when the above prevents the VM from progressing.
> >
> > I think we can fix this on kvm side by not injecting interrupts
> > when single step is done.
>
> Thank you so much for this detailed description, very much appreciated!
>
> > In fact the below patch works for me,
> > Not only it fixes the live lock but makes these hidden breakpoints
> > hit only once in my testing.
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index eec62c0dafc36..8b7a4e27bcf66 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -8501,6 +8501,12 @@ static void inject_pending_event(struct kvm_vcpu
> > *vcpu, bool *req_immediate_exit
> > goto busy;
> > }
> >
> > + /*
> > + * Don't inject interrupts while single stepping to make guest
> > debug easier
> > + */
> > + if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
> > + can_inject = false;
> > +
> >
> > With this patch lx-symbols works almost perfectly for me (on AMD).
>
> I'll try this patch!
Note that the patch to disable interrupt injection while single
stepping is wrong, due to some whatever mistake I made while rebasing things.
I attached an updated version.
With it (and the patch to lx-symbols itself which should be applied to the
guest kernel),
I was able to run lx-symbols very well on both intel and AMD.
Note that I compile the guest kernel in the guest and install, but then I copy
it
to the host, and I run the gdb from there. This is my debug.sh script:
vm adm hmp "gdbserver tcp::2345"
gdb ~/Kernel/vm-fedora/src/vmlinux \
-ex "target remote :2345" \
-ex "cd /home/mlevitsk/Kernel/vm-fedora/src" \
-ex "lx-symbols" \
-ex "cont"
Note that the debug must start after guest kernel was loaded.
Best regards,
Maxim Levitsky
>
> > The only remaing issue (that might be not easy to fix)
> > is that I still can't place breakpoints in __init module code.
> > That code is physically removed from the kernel once the module is done
> > loading,
> > and it seems that its debug info isn't correct to even make a hardware
> > breakpoint work on it.
> > (gdb shows very small addresses for these symbols).
> >
> > As for why this doesn't work for you I have 3 theories:
> >
> > 1. The whole 'reload symbols on breakpoint' is forbidden
> > by gdb in the manual, that is one of the reasons that I had
> > to disable software breakpoints prior to doing this.
> > There might be other things that break in different gdb versions.
> > I don't see a way to make it work without doing this.
>
> Maybe this can also be the reason why gdb 10 doesn't work for me.
> I should investigate further.
>
> > 2. Maybe you test that on Intel and something is broken there
> > on KVM level (I tested only AMD).
>
> Yes, I'm on Intel.
>
> > 3. Maybe you debug a nested guest? I haven't tested if guest debug
> > works fine in this configuration.
>
> Nope, I'm not debugging nested guest in that case.
>
> Thanks for your help,
> Stefano
>
From c0fe19355f5c99bd8d1e6c5345fb32f259be8131 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <[email protected]>
Date: Sun, 28 Feb 2021 23:52:08 +0200
Subject: [PATCH 1/2] gdb: rework gdb debug script
Signed-off-by: Maxim Levitsky <[email protected]>
---
scripts/gdb/linux/symbols.py | 96 ++++++++++++++++++++++++++----------
1 file changed, 71 insertions(+), 25 deletions(-)
diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
index 1be9763cf8bb2..13f21afc2059d 100644
--- a/scripts/gdb/linux/symbols.py
+++ b/scripts/gdb/linux/symbols.py
@@ -17,6 +17,24 @@ import re
from linux import modules, utils
+def save_state():
+ breakpoints = []
+ if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
+ for bp in gdb.breakpoints():
+ breakpoints.append({'breakpoint': bp, 'enabled': bp.enabled})
+ bp.enabled = False
+
+ show_pagination = gdb.execute("show pagination", to_string=True)
+ pagination = show_pagination.endswith("on.\n")
+ gdb.execute("set pagination off")
+
+ return {"breakpoints":breakpoints, "show_pagination": show_pagination}
+
+def load_state(state):
+ for breakpoint in state["breakpoints"]:
+ breakpoint['breakpoint'].enabled = breakpoint['enabled']
+ gdb.execute("set pagination %s" % ("on" if state["show_pagination"] else "off"))
+
if hasattr(gdb, 'Breakpoint'):
class LoadModuleBreakpoint(gdb.Breakpoint):
@@ -30,26 +48,38 @@ if hasattr(gdb, 'Breakpoint'):
module_name = module['name'].string()
cmd = self.gdb_command
+ # module already loaded, false alarm
+ if module_name in cmd.loaded_modules:
+ return False
+
# enforce update if object file is not found
cmd.module_files_updated = False
# Disable pagination while reporting symbol (re-)loading.
# The console input is blocked in this context so that we would
# get stuck waiting for the user to acknowledge paged output.
- show_pagination = gdb.execute("show pagination", to_string=True)
- pagination = show_pagination.endswith("on.\n")
- gdb.execute("set pagination off")
+ state = save_state()
+ cmd.load_module_symbols(module)
+ load_state(state)
+ return False
- if module_name in cmd.loaded_modules:
- gdb.write("refreshing all symbols to reload module "
- "'{0}'\n".format(module_name))
- cmd.load_all_symbols()
- else:
- cmd.load_module_symbols(module)
+ class UnLoadModuleBreakpoint(gdb.Breakpoint):
+ def __init__(self, spec, gdb_command):
+ super(UnLoadModuleBreakpoint, self).__init__(spec, internal=True)
+ self.silent = True
+ self.gdb_command = gdb_command
- # restore pagination state
- gdb.execute("set pagination %s" % ("on" if pagination else "off"))
+ def stop(self):
+ module = gdb.parse_and_eval("mod")
+ module_name = module['name'].string()
+ cmd = self.gdb_command
+ if not module_name in cmd.loaded_modules:
+ return False
+
+ state = save_state()
+ cmd.unload_module_symbols(module)
+ load_state(state)
return False
@@ -64,8 +94,9 @@ lx-symbols command."""
module_paths = []
module_files = []
module_files_updated = False
- loaded_modules = []
+ loaded_modules = {}
breakpoint = None
+ ubreakpoint = None
def __init__(self):
super(LxSymbols, self).__init__("lx-symbols", gdb.COMMAND_FILES,
@@ -129,21 +160,32 @@ lx-symbols command."""
filename=module_file,
addr=module_addr,
sections=self._section_arguments(module))
+
gdb.execute(cmdline, to_string=True)
- if module_name not in self.loaded_modules:
- self.loaded_modules.append(module_name)
+ self.loaded_modules[module_name] = {"module_file": module_file,
+ "module_addr": module_addr}
else:
gdb.write("no module object found for '{0}'\n".format(module_name))
+ def unload_module_symbols(self, module):
+ module_name = module['name'].string()
+
+ module_file = self.loaded_modules[module_name]["module_file"]
+ module_addr = self.loaded_modules[module_name]["module_addr"]
+
+ gdb.write("unloading @{addr}: {filename}\n".format(
+ addr=module_addr, filename=module_file))
+ cmdline = "remove-symbol-file {filename}".format(
+ filename=module_file)
+
+ gdb.execute(cmdline, to_string=True)
+ del self.loaded_modules[module_name]
+
+
def load_all_symbols(self):
gdb.write("loading vmlinux\n")
- # Dropping symbols will disable all breakpoints. So save their states
- # and restore them afterward.
- saved_states = []
- if hasattr(gdb, 'breakpoints') and not gdb.breakpoints() is None:
- for bp in gdb.breakpoints():
- saved_states.append({'breakpoint': bp, 'enabled': bp.enabled})
+ state = save_state()
# drop all current symbols and reload vmlinux
orig_vmlinux = 'vmlinux'
@@ -153,15 +195,14 @@ lx-symbols command."""
gdb.execute("symbol-file", to_string=True)
gdb.execute("symbol-file {0}".format(orig_vmlinux))
- self.loaded_modules = []
+ self.loaded_modules = {}
module_list = modules.module_list()
if not module_list:
gdb.write("no modules found\n")
else:
[self.load_module_symbols(module) for module in module_list]
- for saved_state in saved_states:
- saved_state['breakpoint'].enabled = saved_state['enabled']
+ load_state(state)
def invoke(self, arg, from_tty):
self.module_paths = [os.path.expanduser(p) for p in arg.split()]
@@ -177,8 +218,13 @@ lx-symbols command."""
if self.breakpoint is not None:
self.breakpoint.delete()
self.breakpoint = None
- self.breakpoint = LoadModuleBreakpoint(
- "kernel/module.c:do_init_module", self)
+ self.breakpoint = LoadModuleBreakpoint("kernel/module.c:do_init_module", self)
+
+ if self.ubreakpoint is not None:
+ self.ubreakpoint.delete()
+ self.ubreakpoint = None
+ self.ubreakpoint = UnLoadModuleBreakpoint("kernel/module.c:free_module", self)
+
else:
gdb.write("Note: symbol update on module loading not supported "
"with this gdb version\n")
--
2.26.2
From 8619f7c836156b5704c32aca7b7808ef344989c3 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <[email protected]>
Date: Wed, 3 Mar 2021 13:05:34 +0200
Subject: [PATCH 2/2] KVM: x86: Guest debug: don't inject interrupts while
single stepping
Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/x86.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b140fce829286..ab3a598520ff4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8542,6 +8542,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit
goto busy;
}
+ /*
+ * Don't inject interrupts while single stepping to make guest debug easier
+ */
+ if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+ return;
+
/*
* Finally, inject interrupt events. If an event cannot be injected
* due to architectural conditions (e.g. IF=0) a window-open exit
--
2.26.2