paolosev added inline comments.
================
Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:49
+ lldb::offset_t initial_offset = *offset_ptr;
+ uint64_t value = section_header_data.GetULEB128(offset_ptr);
+ if (*offset_ptr == initial_offset || value > 127)
----------------
clayborg wrote:
> Is it ok if we consume more than 1 byte here? What is the offset points to a
> larger ULEB, are we ok with advancing the offset by multiple bytes or should
> we back it up and return llvm::None?
>
> This might be a good candidate to add to DataExtractor directly as:
>
> ```
> /// Extract a ULEB128 number with a specified max value. If the extracted
> value exceeds
> /// "max_value" the offset will be left unchanged and llvm::None will be
> returned.
> llvm::Optional<uint8_t> DataExtractor::GetULEB128(uint64_t *offset_ptr,
> uint64_t max_value);
> ```
>
> There are many places where we extract a uint64_t, but only need a uint16_t
> (like in the DWARF parser where all DW_TAG_XXXX, DW_AT_XXX and DW_FORM_XXX
> values must only be uint16_t values but are encoded as ULEB128 values. So
> this could be used elsewhere if we do put it into DataExtractor.
Modified in DataExtractor, as suggested. However we need to return a
`llvm::Optional<uint64_t>`, which can be not ideal because the result could
need to be casted to another integer type.
We could also male `DataExtractor::GetULEB128` templatized on the integer type,
getting `max_value` as `std::numeric_limits<int>::max()`, but I don't want to
overly complicate the code.
================
Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:162-163
+ // - the actual contents.
+ llvm::Optional<uint8_t> section_id =
+ GetVaruint7(section_header_data, &offset);
+ if (!section_id)
----------------
clayborg wrote:
> Is this a one byte section ID or is it a ULEB? Not sure why it would be
> encoded as a ULEB if it is always one byte? IF this really is just a one byte
> value, then replace with:
>
> ```
> uint8_t section_id = data.GetU8(&offset);
> ```
You are right, it is a one-bye section. I cannot use `data.GetU8(&offset);`
though, because it returns 0 on failure.
================
Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:342-343
+ section_type, // Section type.
+ sect_info.offset & 0xffffffff, // VM address.
+ sect_info.size, // VM size in bytes of this section.
+ sect_info.offset &
----------------
labath wrote:
> paolosev wrote:
> > labath wrote:
> > > Are the debug info sections actually loaded into memory for wasm? Should
> > > these be zero (that's what they are for elf)?
> > Yes, I was thinking that the debug sections should be loaded into memory;
> > not sure how this works for ELF, how does the debugger find the debug info
> > in that case?
> Are you referring to the load-from-memory, or load-from-file scenario?
> Normally, the debug info is not loaded into memory, and if lldb is loading
> the module from a file, then the debug info is loaded from the file (and we
> use the "file offset" field to locate them). Loading files from memory does
> not work in general (because we can't find the whole file there -- e.g.,
> section headers are missing). The only case it does work is in case of jitted
> files. I'm not 100% sure how that works, but given that there is no
> `if(memory)` block in the section creation code, those sections must too get
> `vm size = 0`.
>
> In practice, I don't think it does not matter that much what you put here --
> I expect things will mostly work regardless. I am just trying to make this
> consistent in some way. If these sections are not found in the memory at the
> address which you set here (possibly adjusted by the load bias) then I think
> it makes sense to set `vm_size=vm_addr=0`. If they are indeed present there,
> then setting it to those values is perfectly fine.
Modified, but I am not sure I completely understood the difference of file
addresses and vm addresses.
In the case of Wasm, we can have two cases:
a) Wasm module contains executable code and also DWARF data embedded in DWARF
sections
b) Wasm module contains code and has a custom section that points to a
separated Wasm module that only contains DWARF sections
The file of Wasm modules that contains code should never be loaded directly by
LLDB; LLDB should use GDB-remote to connect to the Wasm engine that loaded the
module and retrieve the module content from the engine.
But when the DWARF data has been stripped into a separate Wasm file, that file
should be directly loaded by LLDB in order to load the debug sections.
So, if I am not wrong, only in the first case we should assume that the debug
info is loaded into memory, and have vm_size != 0?
================
Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:382
+
+ DecodeSections(load_address);
+
----------------
labath wrote:
> paolosev wrote:
> > labath wrote:
> > > This is strange.. I wouldn't expect that the section decoding logic
> > > should depend on the actual address that the object is loaded in memory.
> > > Can you explain the reasoning here?
> > There is a reason for this: DecodeNextSection() calls:
> > ```ReadImageData(offset, size)```
> > and when the debug info sections are embedded in the wasm module (not
> > loaded from a separated symbol file), ReadImageData() calls
> > Process::ReadMemory() that, for a GDB remote connection, goes to
> > ProcessGDBRemote::DoReadMemory(). Here I pass the offset because it also
> > represents the specific id of the module loaded in the target debuggee
> > process, as explained in the comment above.
> What you say about ReadMemory makes sense, but it's not clear to me why you
> couldn't use the value in `m_memory_addr` for this. For in-memory object file
> this value should contain the actual address the object file was loaded from
> (which, I would expect, will include the `module_id` business), and so you
> wouldn't need the dynamic loader address in order to locate it. I believe
> this is how the other object file plugins do their in-memory loading..
Good point! Changed.
================
Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:34
+/// Checks whether the data buffer starts with a valid Wasm module header.
+static bool ValidateModuleHeader(const DataBufferSP &data_sp) {
+ if (!data_sp || data_sp->GetByteSize() < kWasmHeaderSize)
----------------
clayborg wrote:
> I typically would put "data_sp" into a DataExtractor and extract a uint32_t
> and then check the decoded value with something like:
>
> ```
> static bool ValidateModuleHeader(DataExtractor &data, uint64_t *offset_ptr) {
> auto magic = data.GetU32(offset_ptr);
> if (magic == WASM_MAGIC)
> return true;
> if (magic == WASM_CIGAM) {
> // Set byte order in DataExtractor
> data.SetByteOrder(data.GetByteOrder() == eByteOrderBig ? eByteOrderLittle
> : eByteOrderBig);
> return true;
> }
> return false;
> }
> ```
>
> This function expects a DataExtractor to be passed in that has "data_sp"
> inside of it with the host endian set as the byte order. It will set the byte
> order correctly. It also expects to have two uint32_t macros defined:
> WASM_MAGIC and WASM_CIGAM. These contain the non byte swapped and the byte
> swapped magic values. Easy to replace those with real definitions from else
> where (llvm::file_magic::wasm_object? Not sure of the type of this though,
> seemed like a StringRef).
Not sure about this... WebAssembly is always little-endian and a DataExtractor
is not really needed here.
I made sure to set the byte order where the DataExtractor is created in
`ReadImageData`.
================
Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:59
+/// Reads a LEB128 variable-length unsigned integer, limited to 32 bits.
+static llvm::Optional<uint32_t> GetVaruint32(DataExtractor
§ion_header_data,
+ lldb::offset_t *offset_ptr) {
----------------
clayborg wrote:
> remove if we add:
> ```
> llvm::Optional<uint8_t> DataExtractor::GetULEB128(uint64_t *offset_ptr,
> uint64_t max_value);
> ```
Actually, section_id is encoded as a byte not as a varuint7, so I modified the
code accordingly, with `llvm::Optional<uint8_t> GetByte(DataExtractor &,
lldb::offset_t *)`.
Even this could be moved to DataExtractor, maybe?
================
Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:187-195
+ const uint8_t *name_bytes = section_header_data.PeekData(offset,
*name_len);
+ // If a custom section has a name longer than the allocated buffer or
longer
+ // than the data left in the image, ignore this section.
+ if (!name_bytes)
+ return false;
+
+ llvm::StringRef sect_name(reinterpret_cast<const char *>(name_bytes),
----------------
clayborg wrote:
> All these lines can use the GetCStr with a length:
> ```
> ConstString sect_name(data.GetCStr(&offset, *name_len));
> if (!sect_name)
> return false;
> ```
I had tried to use `DataExtractor::GetCStr()` but it doesn't work because it
wants null-terminated strings, and this is not the case in Wasm, where strings
are encoded as len (varuint32) followed by an array of len bytes that represent
UTF8 chars (https://webassembly.github.io/spec/core/binary/values.html#names).
================
Comment at: lldb/test/Shell/ObjectFile/wasm/basic.yaml:21-79
+Sections:
+ - Type: TYPE
+ Signatures:
+ - Index: 0
+ ParamTypes:
+ - I32
+ ReturnTypes:
----------------
labath wrote:
> Could you remove the sections which are not relevant for this test? This
> makes it easier to see the correspondence between the yaml and the test
> expectations..
Ok. What can be confusing is that Wasm modules (almost) always have at least a
few standard sections (MEMORY, GLOBAL, CODE, ...) but these sections can be
ignored by LLDB, so also by these tests.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D71575/new/
https://reviews.llvm.org/D71575
_______________________________________________
lldb-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits