[Lldb-commits] [lldb] [LLDB][NFC] Calculate the region size of an in memory image if size isn't specified (PR #123148)

Pavel Labath via lldb-commits Thu, 23 Jan 2025 02:31:24 -0800

labath wrote:

> > The reason this code looks the way it does is that we've had crashes when 
> > trying to read corrupted elf files from memory, where we load the size of 
> > the elf file from the process memory, find that out that it's size is 
> > 935872395 GB, try to allocate a host buffer of that size, and then explode 
> > when the allocation failed.
> 
> How do you load the size of the ELF file from process memory? Are you saying 
> the memory region information was corrupted in such cases?


You can't load the ELF file size from process memory -- but that didn't stop 
the code from trying. :P (I.e., it was wrong/buggy)

IIRC (this was ~ten years ago), what happened was that the code looked at 
section header members in the ELF header (e_shoff, e_shentsize, e_shnum, which 
are present and filled in even when  the headers themselves are not mapped in), 
and then tried to allocate enough memory to hold everything from the beginning 
of the file to the end. It didn't use the memory region information to validate 
this data (if it did, it would have found out that the memory is not there), it 
just took the data at face value. If those fields contained garbage, it would 
crash when attempting to allocate exabytes of memory.



> They are not present in memory and never have been.

Not entirely true (but I get the idea). Section headers *don't have to be* 
present in memory. They also *usually* not present there. The vdso 
pseudo-module, however, always has sections headers (technically I guess it 
also doesn't need them, but every vdso I've seen has them).
```
(lldb) image dump objfile [vdso]
Dumping headers for 1 module(s).
0x5592a46ea680:   ObjectFileELF, file = '', arch = x86_64
ELF Header
e_ident[EI_MAG0   ] = 0x7f
e_ident[EI_MAG1   ] = 0x45 'E'
e_ident[EI_MAG2   ] = 0x4c 'L'
e_ident[EI_MAG3   ] = 0x46 'F'

<snip>

Program Headers
IDX  p_type          p_offset p_vaddr  p_paddr  p_filesz p_memsz  p_flags       
            p_align
==== --------------- -------- -------- -------- -------- -------- 
------------------------- --------
[ 0] PT_LOAD         00000000 00000000 00000000 000008df 000008df 00000005 
(PF_X      PF_R) 00001000
[ 1] PT_DYNAMIC      000003a0 000003a0 000003a0 00000120 00000120 00000004 (    
      PF_R) 00000008
[ 2] PT_NOTE         000004c0 000004c0 000004c0 00000054 00000054 00000004 (    
      PF_R) 00000004
[ 3] PT_GNU_EH_FRAME 00000514 00000514 00000514 00000034 00000034 00000004 (    
      PF_R) 00000004

Section Headers
IDX  name     type         flags                            addr     offset   
size     link     info     addralgn entsize  Name
==== -------- ------------ -------------------------------- -------- -------- 
-------- -------- -------- -------- -------- ====================
[ 0] 00000000 SHT_NULL     00000000 (                     ) 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 
[ 1] 0000000f SHT_HASH     00000002 (      ALLOC          ) 00000120 00000120 
00000044 00000003 00000000 00000008 00000004 .hash
[ 2] 0000000b 0x6ffffff6   00000002 (      ALLOC          ) 00000168 00000168 
00000050 00000003 00000000 00000008 00000000 .gnu.hash
[ 3] 00000015 SHT_DYNSYM   00000002 (      ALLOC          ) 000001b8 000001b8 
00000120 00000004 00000001 00000008 00000018 .dynsym

etc.
```

>  So I changed the ELF code to not rely on requiring sections when the 
> PT_DYNAMIC info or PT_NOTE info points to the same data without requiring 
> section info.

:+1: 

> The first memory region is typically a page or two and contains all that we 
> want.

That depends on how the file is linked. By changing the linker flags, I can 
easily make it so that the entire code section of the binary ends up in the 
first memory region. That means the region can be arbitrarily (~gigabytes) 
large.

Also, there's really no reason why the program headers have to come directly 
after the elf header, or even be present in the first memory region. Producing 
this is harder, but
[here's](https://github.com/user-attachments/files/18518554/a.zip) a perfectly 
valid elf file (it just prints "Hello world when executed), whose program 
headers are in the second memory region. (That said, you might not be always 
able to load this kind of a file from memory because you can't always find the 
program headers from the elf headers -- this is why the dynamic linker also 
stores a pointer to the dynamic section.)

> But I am currently working with some Android folks and we do need to worry 
> about loading ELF headers from the middle of an APK file. I am not sure how 
> the memory regions look for shared libraries that are uncompressed inside of 
> the APK files. If the memory regions are ok, then this patch would work, but 
> if there is one memory region for the entire APK it wouldn't. Though I don't 
> see how this can work if there are not valid memory regions for each shared 
> library within the APK because they need different permissions.

They will have multiple memory regions, exactly as described by PT_LOAD 
segments. Basically, the only difference is in the `offset` argument to the 
`mmap` call that  the loader makes. Everything I said before is still true 
though -- the first of those PT_LOAD segments could still be extremely large.

> Let me know if you think I should try to intelligently resize within 
> ObjectFileELF, or if anything I have said above made you change your mind.

Not really (but thanks for the background/explanation). I still believe the 
best way to achieve what you want is inside ObjectFileELF. Basically, to 
increase the size of the loaded memory if the program headers don't fit into 
the original, just like you described. The reason for that is that here we can 
know what is the size of the program headers we expect, and so we can read just 
that piece of memory, instead of a potentially very large chunk. (Using the 
memory region info to confirm the data from the elf header is still a good idea 
though, so that we avoid that problem with corrupted headers).

I'd also be fine with increasing the initial chunk of memory being loaded to 
say 4kb, so that we don't need to perform this second load in the most common 
case.

https://github.com/llvm/llvm-project/pull/123148
_______________________________________________
lldb-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [lldb] [LLDB][NFC] Calculate the region size of an in memory image if size isn't specified (PR #123148)

Reply via email to