[Lldb-commits] [PATCH] D71575: [LLDB] Add ObjectFileWasm plugin for WebAssembly debugging

Paolo Severini via Phabricator via lldb-commits Wed, 08 Jan 2020 11:09:29 -0800

paolosev added inline comments.


================
Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:49
+  lldb::offset_t initial_offset = *offset_ptr;
+  uint64_t value = section_header_data.GetULEB128(offset_ptr);
+  if (*offset_ptr == initial_offset || value > 127)
----------------
clayborg wrote:
> Is it ok if we consume more than 1 byte here?  What is the offset points to a 
> larger ULEB, are we ok with advancing the offset by multiple bytes or should 
> we back it up and return llvm::None?
> 
> This might be a good candidate to add to DataExtractor directly as:
> 
> ```
> /// Extract a ULEB128 number with a specified max value. If the extracted 
> value exceeds 
> /// "max_value" the offset will be left unchanged and llvm::None will be 
> returned.
> llvm::Optional<uint8_t> DataExtractor::GetULEB128(uint64_t *offset_ptr, 
> uint64_t max_value);
> ```
> 
> There are many places where we extract a uint64_t, but only need a uint16_t 
> (like in the DWARF parser where all DW_TAG_XXXX, DW_AT_XXX and DW_FORM_XXX 
> values must only be uint16_t values but are encoded as ULEB128 values. So 
> this could be used elsewhere if we do put it into DataExtractor.
Modified in DataExtractor, as suggested. However we need to return a 
`llvm::Optional<uint64_t>`, which can be not ideal because the result could 
need to be casted to another integer type.
We could also male `DataExtractor::GetULEB128` templatized on the integer type, 
getting `max_value` as `std::numeric_limits<int>::max()`, but I don't want to 
overly complicate the code.



================
Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:162-163
+  // - the actual contents.
+  llvm::Optional<uint8_t> section_id =
+      GetVaruint7(section_header_data, &offset);
+  if (!section_id)
----------------
clayborg wrote:
> Is this a one byte section ID or is it a ULEB? Not sure why it would be 
> encoded as a ULEB if it is always one byte? IF this really is just a one byte 
> value, then replace with:
> 
> ```
> uint8_t section_id = data.GetU8(&offset);
> ```
You are right, it is a one-bye section. I cannot use `data.GetU8(&offset);` 
though, because it returns 0 on failure. 


================
Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:342-343
+                        section_type,                  // Section type.
+                        sect_info.offset & 0xffffffff, // VM address.
+                        sect_info.size, // VM size in bytes of this section.
+                        sect_info.offset &
----------------
labath wrote:
> paolosev wrote:
> > labath wrote:
> > > Are the debug info sections actually loaded into memory for wasm? Should 
> > > these be zero (that's what they are for elf)?
> > Yes, I was thinking that the debug sections should be loaded into memory; 
> > not sure how this works for ELF, how does the debugger find the debug info 
> > in that case?
> Are you referring to the load-from-memory, or load-from-file scenario? 
> Normally, the debug info is not loaded into memory, and if lldb is loading 
> the module from a file, then the debug info is loaded from the file (and we 
> use the "file offset" field to locate them). Loading files from memory does 
> not work in general (because we can't find the whole file there -- e.g., 
> section headers are missing). The only case it does work is in case of jitted 
> files. I'm not 100% sure how that works, but given that there is no 
> `if(memory)` block in the section creation code, those sections must too get 
> `vm size = 0`.
> 
> In practice, I don't think it does not matter that much what you put here -- 
> I expect things will mostly work regardless. I am just trying to make this 
> consistent in some way. If these sections are not found in the memory at the 
> address which you set here (possibly adjusted by the load bias) then I think 
> it makes sense to set `vm_size=vm_addr=0`. If they are indeed present there, 
> then setting it to those values is perfectly fine.
Modified, but I am not sure I completely understood the difference of file 
addresses and vm addresses.

In the case of Wasm, we can have two cases:
a) Wasm module contains executable code and also DWARF data embedded in DWARF 
sections
b) Wasm module contains code and has a custom section that points to a 
separated Wasm module that only contains DWARF sections

The file of Wasm modules that contains code should never be loaded directly by 
LLDB; LLDB should use GDB-remote to connect to the Wasm engine that loaded the 
module and retrieve the module content from the engine.
But when the DWARF data has been stripped into a separate Wasm file, that file 
should be directly loaded by LLDB in order to load the debug sections.
So, if I am not wrong, only in the first case we should assume that the debug 
info is loaded into memory, and have vm_size != 0?


================
Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:382
+
+  DecodeSections(load_address);
+
----------------
labath wrote:
> paolosev wrote:
> > labath wrote:
> > > This is strange.. I wouldn't expect that the section decoding logic 
> > > should depend on the actual address that the object is loaded in memory. 
> > > Can you explain the reasoning here?
> > There is a reason for this: DecodeNextSection() calls:
> > ```ReadImageData(offset, size)```
> > and when the debug info sections are embedded in the wasm module (not 
> > loaded from a separated symbol file), ReadImageData() calls 
> > Process::ReadMemory() that, for a GDB remote connection, goes to 
> > ProcessGDBRemote::DoReadMemory(). Here I pass the offset because it also 
> > represents the specific id of the module loaded in the target debuggee 
> > process, as explained in the comment above.
> What you say about ReadMemory makes sense, but it's not clear to me why you 
> couldn't use the value in `m_memory_addr` for this. For in-memory object file 
> this value should contain the actual address the object file was loaded from 
> (which, I would expect, will include the `module_id` business), and so you 
> wouldn't need the dynamic loader address in order to locate it. I believe 
> this is how the other object file plugins do their in-memory loading..
Good point! Changed.


================
Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:34
+/// Checks whether the data buffer starts with a valid Wasm module header.
+static bool ValidateModuleHeader(const DataBufferSP &data_sp) {
+  if (!data_sp || data_sp->GetByteSize() < kWasmHeaderSize)
----------------
clayborg wrote:
> I typically would put "data_sp" into a DataExtractor and extract a uint32_t 
> and then check the decoded value with something like:
> 
> ```
> static bool ValidateModuleHeader(DataExtractor &data, uint64_t *offset_ptr) {
>   auto magic = data.GetU32(offset_ptr);
>   if (magic == WASM_MAGIC)
>     return true;
>   if (magic == WASM_CIGAM) {
>     // Set byte order in DataExtractor
>     data.SetByteOrder(data.GetByteOrder() == eByteOrderBig ? eByteOrderLittle 
> : eByteOrderBig);
>     return true;
>   }
>   return false;
> }
> ```
> 
> This function expects a DataExtractor to be passed in that has "data_sp" 
> inside of it with the host endian set as the byte order. It will set the byte 
> order correctly. It also expects to have two uint32_t macros defined: 
> WASM_MAGIC and WASM_CIGAM. These contain the non byte swapped and the byte 
> swapped magic values. Easy to replace those with real definitions from else 
> where (llvm::file_magic::wasm_object? Not sure of the type of this though, 
> seemed like a StringRef).
Not sure about this... WebAssembly is always little-endian and a DataExtractor 
is not really needed here. 
I made sure to set the byte order where the DataExtractor is created in 
`ReadImageData`.


================
Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:59
+/// Reads a LEB128 variable-length unsigned integer, limited to 32 bits.
+static llvm::Optional<uint32_t> GetVaruint32(DataExtractor 
&section_header_data,
+                                             lldb::offset_t *offset_ptr) {
----------------
clayborg wrote:
> remove if we add:
> ```
> llvm::Optional<uint8_t> DataExtractor::GetULEB128(uint64_t *offset_ptr, 
> uint64_t max_value);
> ```
Actually, section_id is encoded as a byte not as a varuint7, so I modified the 
code accordingly, with `llvm::Optional<uint8_t> GetByte(DataExtractor &, 
lldb::offset_t *)`.
Even this could be moved to DataExtractor, maybe?


================
Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:187-195
+    const uint8_t *name_bytes = section_header_data.PeekData(offset, 
*name_len);
+    // If a custom section has a name longer than the allocated buffer or 
longer
+    // than the data left in the image, ignore this section.
+    if (!name_bytes)
+      return false;
+
+    llvm::StringRef sect_name(reinterpret_cast<const char *>(name_bytes),
----------------
clayborg wrote:
> All these lines can use the GetCStr with a length:
> ```
> ConstString sect_name(data.GetCStr(&offset, *name_len));
> if (!sect_name)
>   return false;
> ```
I had tried to use `DataExtractor::GetCStr()` but it doesn't work because it 
wants null-terminated strings, and this is not the case in Wasm, where strings 
are encoded as len (varuint32) followed by an array of len bytes that represent 
UTF8 chars (https://webassembly.github.io/spec/core/binary/values.html#names).


================
Comment at: lldb/test/Shell/ObjectFile/wasm/basic.yaml:21-79
+Sections:
+  - Type:            TYPE
+    Signatures:
+      - Index:           0
+        ParamTypes:
+          - I32
+        ReturnTypes:
----------------
labath wrote:
> Could you remove the sections which are not relevant for this test? This 
> makes it easier to see the correspondence between the yaml and the test 
> expectations..
Ok. What can be confusing is that Wasm modules (almost) always have at least a 
few standard sections (MEMORY, GLOBAL, CODE, ...) but these sections can be 
ignored by LLDB, so also by these tests.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71575/new/

https://reviews.llvm.org/D71575



_______________________________________________
lldb-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [PATCH] D71575: [LLDB] Add ObjectFileWasm plugin for WebAssembly debugging

Reply via email to