Re: [lldb-dev] UnicodeDecodeError for serialize SBValue description

Enrico Granata via lldb-dev Wed, 06 Apr 2016 11:49:47 -0700

> On Apr 5, 2016, at 2:42 PM, Jeffrey Tan <jeffrey.fu...@gmail.com> wrote:
> 
> Hi Enrico,
> 
> Any suggestions or examples on how to add a data formatter for our own STL 
> string? From the output below I can see we are using our own "fbstring_core", 
> which I assume I need to write a type summary for:
> 
> frame variable corpus -T
> (const string &const) corpus = error: summary string parsing error: {
>   (std::fbstring_core<char>) store_ = {
>     (std::fbstring_core<char>::(anonymous union))  = {
>       (char [24]) small_ = "www"
>       (std::fbstring_core<char>::MediumLarge) ml_ = {
>         (char *) data_ = 0x0000000000777777 
> "H\x89U\xa8H\x89M\xa0L\x89E\x98H\x8bE\xa8H\x89��_U��D\x88e�H\x8bE\xa0H\x89��]U��H\x89�H\x8dE�H\x89�H\x89�����L\x8dm�H\x8bE\x98H\x89��IU��\x88]�L\x8be\xb0L\x89��
>         (std::size_t) size_ = 0
>         (std::size_t) capacity_ = 1441151880758558720
>       }
>     }
>   }
> }
>


Admittedly, this is going to be a little vague since I haven’t really seen your 
code and I am only working off of one sample.

There are going to be two parts to getting this to work:

Part 1 - Formatting fbstring_core

At a glance, an fbstring_core<char> can be backed by two representations: a 
“small” representation (a char array), and a “medium/large” representation (a 
char* + a size).
I assume that the way you tell one from the other is

if (size == 0) small
else medium-large

If my assumption is not correct, you’ll need to discover what the correct 
discriminator logic is - the class has to know, and so do you :-)
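
To make the shape of that decision concrete, here is a minimal Python sketch. 
It is not LLDB code: locate_string_data() and the read_memory callback are 
hypothetical names, the field names are borrowed from the dump above, and the 
size_ == 0 rule is only my assumption from above, not something confirmed 
against the real class.

```python
def locate_string_data(small, ml_data, ml_size, read_memory):
    """Return the string bytes under the ASSUMED rule: size_ == 0 means small.

    small       -- raw bytes of the inline char array (small_)
    ml_data     -- integer value of the data_ pointer
    ml_size     -- integer value of size_
    read_memory -- hypothetical callback: (address, length) -> bytes
    """
    if ml_size == 0:
        # Small: characters live inline in the array, up to the first NUL.
        return small.split(b"\0", 1)[0]
    # Medium/large: characters live behind the pointer, size_ bytes long.
    return read_memory(ml_data, ml_size)
```

If the real discriminator turns out to be different, only the condition in the 
if statement needs to change; the rest of the flow stays the same.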

Armed with that knowledge, look in the lldb sources at 
source/Plugins/Language/CPlusPlus/Formatters/LibCxx.cpp
There’s a bunch of code that deals with formatting llvm’s libc++ std::string - 
which follows very similar logic to your class.

ExtractLibcxxStringInfo() is the function that handles discovering which layout 
the string uses - where the data lives - and how much data there is

Once you know how much data there is (the size) and where it lives (array or 
pointer), LibcxxStringSummaryProvider() has the easy task - it sets up a 
StringPrinter, tells it how much data to print and where to get it from, and 
then lets the StringPrinter do the grunt work.

StringPrinter is a nifty little tool - it can handle generating summaries for 
different kinds of strings (UTF8? UTF16? we got it - is a \0 a terminator? what 
quote character would you like? …). You point it at some data, set up a few 
options, and it will generate a printable representation for you. If your 
string type is doing anything out of the ordinary, let’s talk - I am definitely 
open to extending StringPrinter to handle even more magic.
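
As a rough illustration of the kind of job StringPrinter does, here is a small 
Python analogy. It is not LLDB's StringPrinter and does not mirror its API: 
printable_summary() and its parameters are made up for this sketch, and it 
only handles single-byte data.

```python
def printable_summary(data, length=None, quote='"', stop_at_nul=True):
    """Render raw bytes as a quoted, escaped ASCII string (illustration only)."""
    if length is not None:
        # Print at most `length` bytes, like telling the printer the size.
        data = data[:length]
    if stop_at_nul:
        # Optionally treat \0 as a terminator.
        data = data.split(b"\0", 1)[0]
    out = []
    for byte in data:
        ch = chr(byte)
        if ch == quote or ch == "\\":
            # Escape the quote character and the backslash itself.
            out.append("\\" + ch)
        elif 0x20 <= byte < 0x7F:
            # Printable ASCII goes through unchanged.
            out.append(ch)
        else:
            # Everything else becomes a \xNN escape.
            out.append("\\x%02x" % byte)
    return quote + "".join(out) + quote
```

The result is plain ASCII no matter what bytes went in, which is also why an 
escaped form like this is convenient when the summary later has to be 
serialized.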

Part 2 - Teaching std::string that it can be backed by an fbstring_core

At the end of part 1, you’ll probably end up with a 
FBStringCoreSummaryProvider() - now you need to teach LLDB about it.
The obvious thing you could do would be to go into 
CPlusPlusLanguage::GetFormatters(), add a LoadFBStringFormatter(g_category) 
call to it, and then imitate - say - LoadLibCxxFormatters():

    AddCXXSummary(cpp_category_sp,
                  lldb_private::formatters::FBStringCoreSummaryProvider,
                  "fbstringcore summary provider",
                  ConstString("std::fbstring_core<.+>"),
                  stl_summary_flags, true);

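That will work - but what you would see is:

> (const string &const) corpus = error: summary string parsing error: {
>   (std::fbstring_core<char>) store_ = "www"

The store_ member now gets a summary, but the outer string type still does not. 
To find out which typename LLDB is matching for it, you wanna do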

(lldb) log enable lldb formatters
(lldb) frame variable -T corpus

It will list one or more typenames - the most specific one is the one you want 
(e.g. for libc++ we get std::__1::string - this is how we tell ourselves this 
is the std::string from libc++).
Once you find that typename, you’ll make a new formatter - 
FBStringSummaryProvider() - and register that formatter with that very specific 
typename.

All that FBStringSummaryProvider() has to do is get the “store_” member 
(ValueObject::GetChildMemberWithName() is your friend) - and pass it down to 
FBStringCoreSummaryProvider()
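
If you want to prototype the two-layer structure before touching the C++ 
sources, the same delegation can be sketched as LLDB Python summary functions. 
This swaps the built-in C++ formatter for a Python one, so treat it as a 
prototype only. The function names are hypothetical, the core function is a 
stand-in that only looks at small_, and I am assuming the member lookup 
reaches through the anonymous union.

```python
def fbstring_core_summary(store, internal_dict):
    """Stand-in for the Part 1 formatter: only handles the small_ array."""
    # ASSUMPTION: the by-name lookup sees through the anonymous union.
    # If it does not, walk the children of `store` by index instead.
    small = store.GetChildMemberWithName("small_")
    return small.GetSummary()


def fbstring_summary(valobj, internal_dict):
    """Outer formatter: fetch store_ and delegate to the core formatter."""
    store = valobj.GetChildMemberWithName("store_")
    if not store.IsValid():
        # Layout is not what we expected; say so instead of guessing.
        return "<no store_ member>"
    return fbstring_core_summary(store, internal_dict)
```

Assuming the file is saved as fbstring.py, it can be loaded with "command 
script import fbstring.py" and attached with "type summary add -F 
fbstring.fbstring_summary" followed by the typename that the formatters log 
reported.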


I understand this may seem a little convoluted and arcane at first - but feel 
free to ask more questions, and I’ll try to help out!

> Thanks.
> Jeffrey
> 
> On Mon, Mar 28, 2016 at 11:38 AM, Enrico Granata <egran...@apple.com> wrote:
> This is kind of orthogonal to your problem, but the reason why you are not 
> seeing the kind of simplified printing Greg is suggesting, is because your 
> std::string doesn’t look like any of the kinds we recognize
> 
> Specifically, LLDB data formatters work by matching against type names, and 
> once they recognize a typename, then they try to inspect the variable in 
> order to grab a summary
> In your example, your std::string exposes a layout that we are not handling - 
> hence we bail out of the formatter and we fall back to the raw view
> 
> If you want pretty printing to work, you’ll need to write a data formatter
> 
> There are a few avenues. The obvious easy one is to extend the existing 
> std::string formatter to recognize your type’s internal layout.
> If one were signing up for more infrastructure work, they could decide to try 
> and detect shared library loads and load formatters that match with whatever 
> libraries are being loaded.
> 
>> On Mar 28, 2016, at 9:47 AM, Greg Clayton via lldb-dev 
>> <lldb-dev@lists.llvm.org> wrote:
>> 
>> So you need to be prepared to escape any text that can have special 
>> characters. A "std::string" or any container can contain special characters. 
>> If you are encoding stuff into JSON, you will either need to escape any 
>> special characters, or hex encode the string into ASCII hex bytes. 
>> 
>> In debuggers we often get bogus data because variables are not initialized, 
>> but the compiler tells us that a variable is valid in address range 
>> [0x1000-0x2000), but it actually is [0x1200-0x2000). If we read a variable 
>> in this case, a std::string might contain bogus data and the bytes might not 
>> make sense. So you always have to be prepared for bad data.
>> 
>> If we look at:
>> 
>>  store_ = {
>>     = {
>>      small_ = "www"
>>      ml_ = (data_ =
>> "��UH\x89�H�}�H\x8bE�]ÐUH\x89�H��H\x89}�H\x8bE�H\x89��~\xb4��\x90��UH\x89�SH\x83�H\x89}�H�u�H�E�H���\x9e���H\x8b\x18H\x8bE�H���O\xb4��H\x89ƿ\b",
>> size_ = 0, capacity_ = 1441151880758558720)
>>    }
>>  }
>> }
>> 
>> We can see the "size_" is zero, and capacity_ is 1441151880758558720 (which 
>> is 0x1400000000000000). "data_" seems to be some random pointer. 
>> 
>> On MacOSX, we have special formatting code in CPlusPlusLanguage.cpp that 
>> displays std::string. It gets installed in the LoadLibCxxFormatters() or 
>> LoadLibStdcppFormatters() functions with code like:
>> 
>>    lldb::TypeSummaryImplSP std_string_summary_sp(
>>        new CXXFunctionSummaryFormat(
>>            stl_summary_flags,
>>            lldb_private::formatters::LibcxxStringSummaryProvider,
>>            "std::string summary provider"));
>>    cpp_category_sp->GetTypeSummariesContainer()->Add(
>>        ConstString("std::__1::string"), std_string_summary_sp);
>> 
>> Special flags are set on std::string to say "don't show children of this, 
>> just show a summary". So for the following code:
>> 
>> std::string h ("hello");
>> 
>> You should just see:
>> 
>> (lldb) fr var h
>> (std::__1::string) h = "hello"
>> 
>> If you take a look at the normal value in the raw we see:
>> 
>> (lldb) fr var --raw h
>> (std::__1::string) h = {
>>  __r_ = {
>>    std::__1::__libcpp_compressed_pair_imp<std::__1::basic_string<char, 
>> std::__1::char_traits<char>, std::__1::allocator<char> >::__rep, 
>> std::__1::allocator<char>, 2> = {
>>      __first_ = {
>>         = {
>>          __l = {
>>            __cap_ = 122511465736202
>>            __size_ = 0
>>            __data_ = 0x0000000000000000
>>          }
>>          __s = {
>>             = {
>>              __size_ = '\n'
>>              __lx = '\n'
>>            }
>>            __data_ = {
>>              [0] = 'h'
>>              [1] = 'e'
>>              [2] = 'l'
>>              [3] = 'l'
>>              [4] = 'o'
>>              [5] = '\0'
>>              [6] = '\0'
>>              [7] = '\0'
>>              [8] = '\0'
>>              [9] = '\0'
>>              [10] = '\0'
>>              [11] = '\0'
>>              [12] = '\0'
>>              [13] = '\0'
>>              [14] = '\0'
>>              [15] = '\0'
>>              [16] = '\0'
>>              [17] = '\0'
>>              [18] = '\0'
>>              [19] = '\0'
>>              [20] = '\0'
>>              [21] = '\0'
>>              [22] = '\0'
>>            }
>>          }
>>          __r = {
>>            __words = {
>>              [0] = 122511465736202
>>              [1] = 0
>>              [2] = 0
>>            }
>>          }
>>        }
>>      }
>>    }
>>  }
>> }
>> 
>> So the main question is why are our "std::string" formatters not kicking in 
>> for you. That comes down to a typename match, or the format of the string 
>> isn't what the formatter is expecting.
>> 
>> But again, since your std::string can contain anything, you will need to 
>> escape any and all text that is encoded into JSON to ensure it doesn't 
>> contain anything JSON can't deal with.
>> 
>>> On Mar 27, 2016, at 9:20 PM, Jeffrey Tan via lldb-dev 
>>> <lldb-dev@lists.llvm.org> wrote:
>>> 
>>> Thanks Siva. All the DW_TAG_member related errors seem to go away after 
>>> patching with your fix. The current problem is handling the decoding. 
>>> 
>>> Here is the correct decoding from gdb which might be useful:
>>> (gdb) p corpus
>>> $3 = (const std::string &) @0x7fd133cfb888: {
>>>  static npos = 18446744073709551615, store_ = {
>>>    static kIsLittleEndian = <optimized out>,
>>>    static kIsBigEndian = <optimized out>, {
>>>      small_ = "www", '\000' <repeats 20 times>, "\024", ml_ = {
>>>        data_ = 0x777777 <std::_Any_data::_M_access<void 
>>> folly::fibers::Baton::waitFiber<folly::fibers::FirstArgOf<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1},
>>>  void>::type::value_type 
>>> folly::fibers::await<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}>(folly::fibers::FirstArgOf&&)::{lambda()#1}>(folly::fibers::FiberManager&,
>>>  
>>> folly::fibers::FirstArgOf<folly::fibers::FirstArgOf<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1},
>>>  void>::type::value_type 
>>> folly::fibers::await<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}>(folly::fibers::FirstArgOf&&)::{lambda()#1},
>>>  void>::type::value_type)::{lambda(folly::fibers::Fiber&)#1}*>() const+25> 
>>> "\311\303UH\211\345H\211}\370H\213E\370]ÐUH\211\345H\203\354\020H\211}\370H\213E\370H\211\307\350~\264\312\377\220\311\303UH\211\345SH\203\354\030H\211}\350H\211u\340H\213E\340H\211\307\350\236\377\377\377H\213\030H\213E\350H\211\307\350O\264\312\377H\211ƿ\b",
>>>  size_ = 0,
>>>        capacity_ = 1441151880758558720}}}}
>>> 
>>> Utf-16 does not seem to decode it, while 'latin-1' does:
>>> >>> '\xc9'.decode('utf-16')
>>> Traceback (most recent call last):
>>>  File "<stdin>", line 1, in <module>
>>>  File 
>>> "/mnt/gvfs/third-party2/python/55c1fd79d91c77c95932db31a4769919611c12bb/2.7.8/centos6-native/da39a3e/lib/python2.7/encodings/utf_16.py",
>>>  line 16, in decode
>>>    return codecs.utf_16_decode(input, errors, True)
>>> UnicodeDecodeError: 'utf16' codec can't decode byte 0xc9 in position 0: 
>>> truncated data
>>> >>> '\xc9'.decode('latin-1')
>>> u'\xc9'
>>> 
>>> Instead of guessing what kind of decoding I should use, I would use 
>>> 'ensure_ascii=False' to prevent the crash for now.
>>> 
>>> I tried to reproduce this crash, but it seems that the crash might be 
>>> related to some internal STL implementation we are using. I will see if I 
>>> can narrow it down to a small repro later. 
>>> 
>>> Thanks
>>> Jeffrey
>>> 
>>> On Sun, Mar 27, 2016 at 2:49 PM, Siva Chandra <sivachan...@gmail.com> wrote:
>>> On Sat, Mar 26, 2016 at 11:58 PM, Jeffrey Tan <jeffrey.fu...@gmail.com> wrote:
>>>> Btw: after patching with Siva's fix http://reviews.llvm.org/D18008, the
>>>> first field 'small_' is fixed, however the second field 'ml_' still emits
>>>> garbage:
>>>> 
>>>> (lldb) fr v corpus
>>>> (const string &const) corpus = error: summary string parsing error: {
>>>>  store_ = {
>>>>     = {
>>>>      small_ = "www"
>>>>      ml_ = (data_ =
>>>> "��UH\x89�H�}�H\x8bE�]ÐUH\x89�H��H\x89}�H\x8bE�H\x89��~\xb4��\x90��UH\x89�SH\x83�H\x89}�H�u�H�E�H���\x9e���H\x8b\x18H\x8bE�H���O\xb4��H\x89ƿ\b",
>>>> size_ = 0, capacity_ = 1441151880758558720)
>>>>    }
>>>>  }
>>>> }
>>> 
>>> Do you still see the DW_TAG_member related error?
>>> 
>>> A wild (and really wild at that) guess: Is it utf16 data that is being
>>> decoded as utf8?
>>> 
>>> As David Blaikie mentioned on the other thread, it would really help
>>> if you provide us with a minimal example to repro this. At least, repro
>>> instructions.
>>> 
>>> _______________________________________________
>>> lldb-dev mailing list
>>> lldb-dev@lists.llvm.org <mailto:lldb-dev@lists.llvm.org>
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev 
>>> <http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev>
>> 
> 
> 
> Thanks,
> - Enrico
> 📩 egranata@.com ☎️ 27683
> 
> 


Thanks,
- Enrico
📩 egranata@.com ☎️ 27683

