[Lldb-commits] [PATCH] D113098: [lldb] (Partially) enable formatting of utf strings before the program is started

Jim Ingham via Phabricator via lldb-commits Mon, 15 Nov 2021 17:00:49 -0800

jingham requested changes to this revision.
jingham added a comment.
This revision now requires changes to proceed.


It makes sense for the string formatters to try reading from the target if 
there's no process.  So the general idea is fine.

But one of the other major differences between the Target before a process has 
been launched and attached to; and its state after attachment; is that before 
running there isn't a unique mapping from a scalar address (lldb::addr_t) to a 
specific byte in a specific Object File.  For instance, on the systems where 
lldb can figure it out the dependent libraries before execution, lldb will 
determine & load into the Target not just the binary handed to it by the "file" 
command, but also the closure of that binary's direct dependents.  That's 
handy, it means that you can inspect disassembly, set breakpoints, etc. seeing 
all the libraries your binary depends on before launch.  However, on many 
systems libraries aren't given unique load addresses.  They generally just get 
set to load at 0x0, and the loader is the one that gives them all unique slots 
in the memory space when the process runs.  So Modules in the Target might and 
often do overlap each other in the "before running" address space.

When you are asking a process for memory, there's only one value at any given 
address, so all those API's can take an lldb::addr_t, the Process will always 
be able to unambiguously turn that back into either a plain address (if the 
memory is on the heap or stack) or a section relative address if the address is 
in TEXT/DATA, etc. of some loaded binary.  But when you start asking the Target 
questions, you can't rely on this fact.

The lldb_private::Address class handles that by recording the address as 
section in a Module + offset, inherently picking the right binary.  So using 
Addresses we can always record the user's intentions correctly.  For instance, 
if you get the Address from a ValueObject representing a global variable which 
is in the DATA section of a target, that Address will record both the data 
section, and the variable's offset in that sections.   But for that to work, 
you have to be careful to preserve the Address throughout the process, and 
never go back & forth from lldb_private::Address to lldb::addr_t until you 
actually have to request bytes of something (process, file, etc).  Otherwise 
you might start out trying to read a string in library A, but end up reading 
the memory for it from library B...

Preserving the Address as long as is practicable is actually a general good 
practice in lldb.  Even if you have a live process, when you go from 
section-relative Address to addr_t and back to Address, we have to re-lookup 
that address in the section load list, which is inefficient.

You make this mistake a few places in this patch, which I've pointed out in the 
detailed comments.



================
Comment at: lldb/include/lldb/Target/Target.h:1028
+  ///
+  /// This function will read a cache page at a time until a NULL string
+  /// terminator is found. It will stop reading if an aligned sequence of NULL
----------------
ReadCStringFromMemory (both flavors) set "force_live_memory" to be true.  Note 
that doesn't mean "don't read from the target" it means "always read from the 
process first."  It does fall back to the Target if there's no process, AND if 
your Address is section relative.

OTOH, you make it a default argument and then set it to "false" - the opposite 
behavior.  Is there a reason for that?  That seems confusing.

force_live_memory is really an efficiency thing.  It will only make a 
difference if the data to be read is in a read-only section of a binary, in 
which case it will read from there.  lldb always used to prefer reading locally 
if it can, but if someone has been able to flip permissions and overwrite 
read-only DATA through some kind of attack, you wouldn't be able to see that if 
this is the default.  So in actual fact, we pass force_live_memory -> true most 
of the places we use these Target Memory Reading API's.

So it seems better to follow ReadCStringFromMemory and just force this to true.


================
Comment at: lldb/source/Plugins/Language/CPlusPlus/CxxStringTypes.cpp:58
   StringPrinter::ReadStringAndDumpToStreamOptions options(valobj);
   options.SetLocation(valobj_addr);
+  options.SetTargetSP(valobj.GetTargetSP());
----------------
This should take some form of lldb_private::Address not an lldb::addr_t or it 
won't work pre-run.  See the overall comment for details.


================
Comment at: lldb/source/Target/Target.cpp:1911
+  size_t total_bytes_read = 0;
+  if (dst && max_bytes && type_width && max_bytes >= type_width) {
+    // Ensure a null terminator independent of the number of bytes that is
----------------
Process::ReadStringFromMemory is a pretty old function, and hasn't been fully 
converted to llvm style, which prefers to check things like `dst != nullptr` up 
front and do an early return if anything fails.  That reduces indentation, and 
spaces are a precious commodity if you restrict yourself to 80 characters on a 
line.

Also, the error message was originally pretty useless, but there's no reason to 
propagate that now that you've touched it ;-)


================
Comment at: lldb/source/Target/Target.cpp:1922
+
+    addr_t curr_addr = addr.GetLoadAddress(this);
+    Address address = addr;
----------------
This is wrong.  If the incoming address was section relative, you've thrown 
that information away.  Just pass Target::ReadMemory the full Address (addr), 
it will know what to do with it.  And you can use:

addr.Slide(bytes_read);

instead of lines 1951-1952 to update the address.


================
Comment at: lldb/source/Target/Target.cpp:1924
+    Address address = addr;
+    const size_t cache_line_size = 512;
+    char *curr_dst = dst;
----------------
This is a seemingly arbitrary choice.  

From what I can tell you are copying this from GetCStringFromMemory, so the 
decision to do this isn't yours...

IIUC the whole point of this paging so we don't  read a big chunk of memory and 
then find the terminator a couple of characters in.  The optimal size for that 
doesn't have to line up with the process cache size, but I don't see why you 
wouldn't use that if it was handy.

 Anyway, having this here with no comment is confusing.  Either repeat the 
original justification or get the process cache line size if there's a process.


================
Comment at: lldb/test/API/lang/cpp/char8_t/TestCxxChar8_t.py:25
+        # Make sure the variables can be printed without a running process.
+        self.expect("target variable a", substrs=["char8_t", "0x61 u8'a'"])
+        self.expect("target variable ab",
----------------
It's a shame we don't have SBTarget.GetValueForVariablePath() - the equivalent 
of SBFrame.GetValueForVariablePath.  If we did we could make a global variable 
version of lldbtest.expect_var_path, which would make this test more precise.  
But we don't...


================
Comment at: lldb/test/API/lang/cpp/char8_t/TestCxxChar8_t.py:29
+        self.expect("target variable abc", substrs=["char8_t[9]", 'u8"你好"'])
+        # TODO: Make this work through expression evaluation as well
+
----------------
Rather than a TODO, it's better to put in a test here, then expect it to fail.  
That way if someone comes upon this comment later on, they won't have to guess 
what didn't work for you.  And if somebody fixes this while doing something 
else, the unexpected pass will let them know.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113098/new/

https://reviews.llvm.org/D113098

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [PATCH] D113098: [lldb] (Partially) enable formatting of utf strings before the program is started

Reply via email to