[Lldb-commits] [PATCH] D140358: [lldb-vscode] Add support for disassembly view

Enrico Loparco via Phabricator via lldb-commits Tue, 17 Jan 2023 01:15:44 -0800

eloparco added inline comments.


================
Comment at: lldb/tools/lldb-vscode/lldb-vscode.cpp:2177
+  const auto max_instruction_size = g_vsc.target.GetMaximumOpcodeByteSize();
+  const auto bytes_offset = -instruction_offset * max_instruction_size;
+  auto start_addr = base_addr - bytes_offset;
----------------
clayborg wrote:
> eloparco wrote:
> > clayborg wrote:
> > > Just checked out your changes, and you are still just subtracting a value 
> > > from the start address and attempting to disassemble from memory which is 
> > > the problem. We need to take that subtracted address, and look it up as 
> > > suggested in previous code examples I posted. If you find a function to 
> > > symbol, ask those objects for their instructions. and then try to use 
> > > those. 
> > > 
> > > But basically for _any_ disassembly this is what I would recommend doing:
> > > - first resolve the "start_address" (no matter how you come up the 
> > > address) that want to disassemble into a SBAddress
> > > - check its section. If the section is valid and contains instructions, 
> > > call a function that will disassemble the address range for the section 
> > > that starts at "start_address" and ends at the end of the section. We can 
> > > call this "disassemble_code" as a function. More details on this below
> > > - If the section does not contain instructions, just read the bytes and 
> > > emit a lines like:
> > > ```
> > > 0x1000 .byte 0x12
> > > 0x1000 .byte 0x34
> > > ...
> > > ```
> > > 
> > > Now for the disassemble_code function. We know the address range for this 
> > > is in code. We then need to resolve the address passed to 
> > > "disassemble_code" into a SBAddress and ask that address for a SBFunction 
> > > or SBSymbol as I mentioned. Then we ask the SBFunction or SBSymbol for 
> > > all instructions that they contain, and then use any instructions that 
> > > fall into the range we have. If there is no SBFunction or SBSymbol, then 
> > > disassemble an instruction at a time and then see if the new address will 
> > > resolve to a function or symbol.
> > Tried my changes on a linux x86 machine and the loop `for (unsigned i = 0; 
> > i < max_instruction_size; i++) {` (L2190) takes care of the `start_address` 
> > possibly being in the middle of an instruction, so that's not a problem.  
> > The problem I faced is that it tries to read too far from `base_addr` and 
> > the `ReadMemory()` operation returns few instructions (without reaching 
> > `base_addr`). That was not happening on my macOS M1 (arm) machine. 
> > 
> > To solve, I changed the loop at L2190 to
> > ```
> > for (unsigned i = 0; i < bytes_offset; i++) {
> >     auto sb_instructions =
> >         _get_instructions_from_memory(start_addr + i, disassemble_bytes);
> > ```
> > and if `start_addr` is in `sb_instructions` we're done and can exit the 
> > loop. That worked.
> > 
> > Another similar thing that can be done is to start from `start_sbaddr` as 
> > you were saying, increment the address until a valid section is found. Then 
> > call `_get_instructions_from_memory()` passing the section start.
> > What do you think? Delegating the disassembling to `ReadMemory()` + 
> > `GetInstructions()` looks simpler to me than to manually iterate over 
> > sections and get instructions from symbols and functions.
> > Is there any shortcoming I'm not seeing?
> so your 
> ```
> for (unsigned i = 0; i < max_instruction_size; i++) { 
> ```
> disassembles and tries to make sure you make it back to the original base 
> address from the original disassemble packet? That can work but could a a bit 
> time consuming?
> 
> The main issue, as you saw on x86, is you don't know what is in memory. You 
> could have unreadable areas of memory when trying to disassemble. Also if you 
> do have good memory that does contain instructions, there can be padding in 
> between functions or even data between functions that the function accesses 
> that can't be correctly disassembled and could throw things off again. 
> 
> The memory regions are the safest way to traverse memory to see what you have 
> and would help you deal with holes in the memory. You can ask about a memory 
> region with:
> ```
> lldb::SBError SBProcess::GetMemoryRegionInfo(lldb::addr_t load_addr, 
> lldb::SBMemoryRegionInfo &region_info);
> ```
> If you ask about an invalid address, like zero on most platforms, it will 
> return a valid "region_info" with the bounds of the unreadable address range 
> filled in no read/write/execute permissions:
> ```
> (lldb) script
> Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.
> >>> region = lldb.SBMemoryRegionInfo()
> >>> err = lldb.process.GetMemoryRegionInfo(0, region)
> [0x0000000000000000-0x0000000100000000 ---]
> ```
> So you could use the memory region info to your advantage here. If you have 
> execute permissions, disassemble as instructions, and if you don't emit 
> ".byte 0xXX" for each byte. If there are no permissions, you can emit some 
> other string like "0x00000000: <invalid memory>". 
> 
> That being said, even when you do find an executable section of memory, there 
> can be different stuff there even if a section _is_ executable. For instance, 
> if we ask about the next memory region on a mac M1:
> ```
> >>> err = lldb.process.GetMemoryRegionInfo(0x0000000100000000, region)
> >>> print(region)
> [0x0000000100000000-0x0000000100004000 R-X]
> ```
> Notice that this is read + execute (you can access these via:
> ```
> region.IsReadable()
> region.IsWritable()
> region.IsExecutable()
> ```
> But at this address, this is the mach-o header which doesn't make sense to 
> try and disassemble:
> ```
> (lldb) memory read 0x0000000100000000
> 0x100000000: cf fa ed fe 0c 00 00 01 00 00 00 00 02 00 00 00  ................
> 0x100000010: 11 00 00 00 18 03 00 00 85 00 20 00 00 00 00 00  .......... .....
> (lldb) memory read -fx -s4 -c 4 0x0000000100000000
> 0x100000000: 0xfeedfacf 0x0100000c 0x00000000 0x00000002
> ```
> 0xfeedfacf is the mach-o magic bytes for little endian 64 bit mach-o files.
> ```
> (lldb) disassemble --start-address 0x0000000100000000
> a.out`_mh_execute_header:
>     0x100000000 <+0>:  .long  0xfeedfacf                ; unknown opcode
>     0x100000004 <+4>:  .long  0x0100000c                ; unknown opcode
>     0x100000008 <+8>:  udf    #0x0
>     0x10000000c <+12>: udf    #0x2
>     0x100000010 <+16>: udf    #0x11
>     0x100000014 <+20>: udf    #0x318
>     0x100000018 <+24>: .long  0x00200085                ; unknown opcode
>     0x10000001c <+28>: udf    #0x0
> ```
> 
> So this is the main reason why I would suggest just disassembling using .byte 
> or .long when we aren't in a function or symbol.
> 
understood, so in that case we also have to do the same for 
`_handle_disassemble_positive_offset`, because even there, when we use 
`ReadInstructions()`, we could have some unreadable areas in the middle and 
`ReadInstructions()` would stop prematurely when running into one of them, 
resulting in less instructions returned than expected


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140358/new/

https://reviews.llvm.org/D140358

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [PATCH] D140358: [lldb-vscode] Add support for disassembly view

Reply via email to