https://github.com/jasonmolenda created 
https://github.com/llvm/llvm-project/pull/177309

We have many places where an ObjectFile subclass takes the DataExtractor
representing the entire binary and creates a subsection of it in a new
DataExtractor for processing.  For instance, an object file might have symbol
table entries with offsets into the string table.  A common code pattern is to
create a DataExtractor representing the string table, and then pull out the
c-strings at those offsets from the string table DataExtractor.

When code does this, it creates a new DataExtractor, copies the Endianness and
Wordsize from the original, copies the DataBufferSP from the original, and
specifies a new start offset and length into the DataBuffer.
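
The hand-written pattern described above can be sketched as follows. This is a minimal standalone model, not lldb code: `MiniExtractor`, `BufferSP`, `MakeSubsetByHand` and `MakeFakeBinary` are hypothetical names invented for illustration, and the real DataExtractor API differs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for lldb's DataBufferSP and DataExtractor.
using BufferSP = std::shared_ptr<const std::vector<uint8_t>>;
enum class ByteOrder { Little, Big };

struct MiniExtractor {
  BufferSP buffer;      // shared bytes of the whole binary
  ByteOrder byte_order; // copied unchanged into every subset
  uint32_t addr_size;   // copied unchanged into every subset
  size_t start;         // offset of this view within *buffer
  size_t length;        // number of bytes visible through this view

  // Read a NUL-terminated string at `offset` within this view.
  std::string GetCStr(size_t offset) const {
    std::string out;
    for (size_t i = start + offset;
         i < start + length && (*buffer)[i] != 0; ++i)
      out.push_back(static_cast<char>((*buffer)[i]));
    return out;
  }
};

// The manual pattern: copy byte order, address size and buffer, then
// pick a new window into the same buffer.
MiniExtractor MakeSubsetByHand(const MiniExtractor &whole, size_t offset,
                               size_t length) {
  return MiniExtractor{whole.buffer, whole.byte_order, whole.addr_size,
                       whole.start + offset, length};
}

// A fake "binary": a 4-byte header followed by the string table
// "\0main\0foo\0".
MiniExtractor MakeFakeBinary() {
  auto bytes = std::make_shared<std::vector<uint8_t>>(std::vector<uint8_t>{
      0xde, 0xad, 0xbe, 0xef, 0, 'm', 'a', 'i', 'n', 0, 'f', 'o', 'o', 0});
  return MiniExtractor{bytes, ByteOrder::Little, 8, 0, bytes->size()};
}
```

In this model, `MakeSubsetByHand(whole, 4, 10)` yields a string table view in which a symbol's string offset of 1 resolves to "main".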

However, if the binary is actually stored in a VirtualDataExtractor, this code
pattern loses the virtual-to-physical table translation and will not
work correctly.  This new method simplifies this common pattern, and correctly
takes a subset of a VirtualDataExtractor.
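
To illustrate why the manual pattern breaks, here is a small standalone sketch, assuming a lookup table whose entries map a virtual base address to a physical offset in the buffer. The `Entry` field names follow the comments in the patch, but `FindEntry`, `NaiveSubset` and `TranslatedSubset` are hypothetical helpers written only for this example, and a `std::string` stands in for the data buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of one LookupTable entry.
struct Entry {
  uint64_t base; // virtual address where this region starts
  uint64_t size; // number of bytes in the region
  uint64_t data; // offset of the region's bytes in the physical buffer
};

// Find the entry containing `virt`, or nullptr if the address is unmapped.
const Entry *FindEntry(const std::vector<Entry> &table, uint64_t virt) {
  for (const Entry &e : table)
    if (virt >= e.base && virt - e.base < e.size)
      return &e;
  return nullptr;
}

// Naive pattern: treats the virtual offset as an offset into the buffer.
std::string NaiveSubset(const std::string &buffer, uint64_t virt,
                        uint64_t len) {
  return buffer.substr(virt, len);
}

// Translating pattern: maps the virtual offset to a physical one first.
// Returns false if the virtual offset is not covered by any entry.
bool TranslatedSubset(const std::string &buffer,
                      const std::vector<Entry> &table, uint64_t virt,
                      uint64_t len, std::string &out) {
  const Entry *e = FindEntry(table, virt);
  if (!e)
    return false;
  out = buffer.substr(e->data + (virt - e->base), len);
  return true;
}
```

With a buffer of "AAAABBBB" and a table that maps virtual 0..3 to physical offset 4, the naive subset at virtual offset 0 reads "AAAA", while the translated one reads "BBBB".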

The current implementation only allows a subset of a VirtualDataExtractor that
is contained within a single virtual entry (LookupTable entry) and returns a
DataExtractor with the correct offsets calculated from the LookupTable.  If we
need a VirtualDataExtractor to create a Subset DataExtractor representing
multiple separate virtual ranges of data, we'll need to copy over the
LookupTable entries that cover all the bytes, and update them to be relative to
the new VirtualDataExtractor.  It's a bit of work, and it's not needed right
now, so I'm not tackling that.
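
The single-entry restriction can be sketched as a range check. This is a standalone illustration, not the patch itself: `PhysicalRange` and `SubsetWithinSingleEntry` are hypothetical names, and where the patch asserts, this version returns false so the failure cases are easy to see.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one LookupTable entry.
struct Entry {
  uint64_t base; // virtual address where this region starts
  uint64_t size; // number of bytes in the region
  uint64_t data; // offset of the region's bytes in the physical buffer
};

// Physical window that a subset extractor would be pointed at.
struct PhysicalRange {
  uint64_t start;
  uint64_t length;
};

// Returns true and fills `out` only when [virt, virt + len) lies entirely
// inside one entry; a request that is unmapped or runs past the end of its
// entry is rejected.
bool SubsetWithinSingleEntry(const std::vector<Entry> &table, uint64_t virt,
                             uint64_t len, PhysicalRange &out) {
  for (const Entry &e : table) {
    if (virt < e.base || virt - e.base >= e.size)
      continue; // virt is not inside this entry
    const uint64_t offset_into_entry = virt - e.base;
    if (len > e.size - offset_into_entry)
      return false; // would span into another entry
    out.start = e.data + offset_into_entry;
    out.length = len;
    return true;
  }
  return false; // no entry covers virt
}
```

For a table with entries {base 0x1000, size 0x10, data 0x40} and {base 0x1010, size 0x10, data 0x00}, a request for 8 bytes at 0x1004 maps to physical offset 0x44, while 8 bytes at 0x100c is rejected because it crosses into the second entry.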

I am working on a larger PR which needs this new method.  This PR contains a 
unit test that uses it.

rdar://148939795

From 553606915b982eaf029a1200a9c74e734e69add8 Mon Sep 17 00:00:00 2001
From: Jason Molenda <[email protected]>
Date: Wed, 21 Jan 2026 21:46:37 -0800
Subject: [PATCH] [lldb] Add a GetSubsetExtractorSP method to DataExtractor

We have many places where an ObjectFile subclass takes the
DataExtractor representing the entire binary and creates a subsection
of it in a new DataExtractor for processing.  For instance, an
object file might have symbol table entries with offsets into the
string table.  A common code pattern is to create a DataExtractor
representing the string table, and then pull out the c-strings
at those offsets from the string table DataExtractor.

When code does this, it creates a new DataExtractor, copies the
Endianness and Wordsize from the original, copies the DataBufferSP
from the original, and specifies a new start offset and length into
the DataBuffer.

However, if the binary is actually stored in a VirtualDataExtractor,
this code pattern loses the virtual-to-physical table
translation and will not work correctly.  This new method simplifies
this common pattern, and correctly takes a subset of a
VirtualDataExtractor.

The current implementation only allows a subset of a VirtualDataExtractor
that is contained within a single virtual entry (LookupTable entry)
and returns a DataExtractor with the correct offsets calculated from
the LookupTable.  If we need a VirtualDataExtractor to
create a Subset DataExtractor representing multiple separate virtual
ranges of data, we'll need to copy over the LookupTable entries that
cover all the bytes, and update them to be relative to the new
VirtualDataExtractor.  It's a bit of work, and it's not needed right
now, so I'm not tackling that.

rdar://148939795
---
 lldb/include/lldb/Utility/DataExtractor.h     | 14 ++++++++
 .../lldb/Utility/VirtualDataExtractor.h       |  6 ++++
 lldb/source/Utility/DataExtractor.cpp         |  9 ++++++
 lldb/source/Utility/VirtualDataExtractor.cpp  | 32 +++++++++++++++++++
 4 files changed, 61 insertions(+)

diff --git a/lldb/include/lldb/Utility/DataExtractor.h b/lldb/include/lldb/Utility/DataExtractor.h
index f8473aedb4f7a..7a9a453c551be 100644
--- a/lldb/include/lldb/Utility/DataExtractor.h
+++ b/lldb/include/lldb/Utility/DataExtractor.h
@@ -818,6 +818,20 @@ class DataExtractor {
   ///     The extracted unsigned integer value.
   uint64_t GetULEB128(lldb::offset_t *offset_ptr) const;
 
+  /// Return a new DataExtractor which represents a subset of an existing
+  /// data extractor's bytes, copying all other fields from the existing
+  /// data extractor.
+  ///
+  /// \param[in] offset
+  ///     The starting byte offset, relative to the start of this extractor.
+  /// \param[in] length
+  ///     The number of bytes that the new extractor can operate on.
+  ///
+  /// \return
+  ///     A shared pointer to a new DataExtractor.
+  virtual lldb::DataExtractorSP GetSubsetExtractorSP(lldb::offset_t offset,
+                                                     lldb::offset_t length);
+
   lldb::DataBufferSP &GetSharedDataBuffer() { return m_data_sp; }
 
   bool HasData() { return m_start && m_end && m_end - m_start > 0; }
diff --git a/lldb/include/lldb/Utility/VirtualDataExtractor.h b/lldb/include/lldb/Utility/VirtualDataExtractor.h
index e430dd8628b5f..5f5d3905a67e7 100644
--- a/lldb/include/lldb/Utility/VirtualDataExtractor.h
+++ b/lldb/include/lldb/Utility/VirtualDataExtractor.h
@@ -43,12 +43,18 @@ class VirtualDataExtractor : public DataExtractor {
                        lldb::ByteOrder byte_order, uint32_t addr_size,
                        LookupTable lookup_table);
 
+  VirtualDataExtractor(const lldb::DataBufferSP &data_sp,
+                       LookupTable lookup_table);
+
   const void *GetData(lldb::offset_t *offset_ptr,
                       lldb::offset_t length) const override;
 
   const uint8_t *PeekData(lldb::offset_t offset,
                           lldb::offset_t length) const override;
 
+  lldb::DataExtractorSP GetSubsetExtractorSP(lldb::offset_t offset,
+                                             lldb::offset_t length) override;
+
   /// Unchecked overrides
   /// @{
   uint8_t GetU8_unchecked(lldb::offset_t *offset_ptr) const override;
diff --git a/lldb/source/Utility/DataExtractor.cpp b/lldb/source/Utility/DataExtractor.cpp
index 9acad470ded2f..a55bd5040036e 100644
--- a/lldb/source/Utility/DataExtractor.cpp
+++ b/lldb/source/Utility/DataExtractor.cpp
@@ -1050,3 +1050,12 @@ void DataExtractor::Checksum(llvm::SmallVectorImpl<uint8_t> &dest,
   dest.clear();
   dest.append(result.begin(), result.end());
 }
+
+DataExtractorSP DataExtractor::GetSubsetExtractorSP(offset_t offset,
+                                                    offset_t length) {
+  DataExtractorSP new_sp = std::make_shared<DataExtractor>(
+      GetSharedDataBuffer(), GetByteOrder(), GetAddressByteSize());
+  new_sp->SetData(GetSharedDataBuffer(), GetSharedDataOffset() + offset,
+                  length);
+  return new_sp;
+}
diff --git a/lldb/source/Utility/VirtualDataExtractor.cpp b/lldb/source/Utility/VirtualDataExtractor.cpp
index a23e43b383d25..dd99399b9a577 100644
--- a/lldb/source/Utility/VirtualDataExtractor.cpp
+++ b/lldb/source/Utility/VirtualDataExtractor.cpp
@@ -31,6 +31,12 @@ VirtualDataExtractor::VirtualDataExtractor(const DataBufferSP &data_sp,
   m_lookup_table.Sort();
 }
 
+VirtualDataExtractor::VirtualDataExtractor(const DataBufferSP &data_sp,
+                                           LookupTable lookup_table)
+    : DataExtractor(data_sp), m_lookup_table(std::move(lookup_table)) {
+  m_lookup_table.Sort();
+}
+
 const VirtualDataExtractor::LookupTable::Entry *
 VirtualDataExtractor::FindEntry(offset_t virtual_addr) const {
   // Use RangeDataVector's binary search instead of linear search.
@@ -137,3 +143,29 @@ uint64_t VirtualDataExtractor::GetU64_unchecked(offset_t *offset_ptr) const {
   *offset_ptr += 8;
   return result;
 }
+
+DataExtractorSP
+VirtualDataExtractor::GetSubsetExtractorSP(offset_t virtual_offset,
+                                           offset_t virtual_length) {
+  const LookupTable::Entry *entry = FindEntry(virtual_offset);
+  assert(entry && "GetSubsetExtractorSP requires a valid virtual address");
+
+  // Entry::data is the offset into the DataBuffer's actual start/end range
+  // Entry::base is the virtual address at the start of this region of data
+  offset_t offset_into_entry_range = virtual_offset - entry->base;
+  assert(
+      offset_into_entry_range + virtual_length <= entry->size &&
+      "VirtualDataExtractor subset may not span multiple LookupTable entries");
+
+  // We could support a Subset VirtualDataExtractor which covered
+  // multiple LookupTable virtual entries, but we'd need to mutate
+  // all of the LookupTable entries that were properly included in
+  // the Subset, a bit tricky.  So we won't implement that until it's
+  // needed.
+
+  offset_t physical_start = entry->data + offset_into_entry_range;
+  std::shared_ptr<DataExtractor> new_sp = std::make_shared<DataExtractor>(
+      GetSharedDataBuffer(), GetByteOrder(), GetAddressByteSize());
+  new_sp->SetData(GetSharedDataBuffer(), physical_start, virtual_length);
+  return new_sp;
+}
