Re: [lldb-dev] UnicodeDecodeError for serialize SBValue description

Enrico Granata via lldb-dev Mon, 28 Mar 2016 11:39:18 -0700

This is somewhat orthogonal to your problem, but the reason you are not 
seeing the simplified printing Greg describes is that your std::string 
doesn’t look like any of the layouts we recognize.


Specifically, LLDB data formatters work by matching against type names. Once 
they recognize a type name, they inspect the variable in order to produce a 
summary.
In your example, your std::string exposes a layout that we are not handling, 
so we bail out of the formatter and fall back to the raw view.

If you want pretty printing to work, you’ll need to write a data formatter.

There are a few avenues. The obvious, easy one is to extend the existing 
std::string formatter to recognize your type’s internal layout.
Someone willing to sign up for more infrastructure work could instead detect 
shared library loads and load the formatters that match whatever libraries 
are being loaded.
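
As a rough illustration of the first avenue, here is a sketch of what a 
Python summary provider could look like. It is not the formatter LLDB ships. 
It assumes the layout visible in the output quoted below: a store_ member 
holding an anonymous union whose small_ array is 24 bytes long, with the last 
byte holding 23 minus the length when the string is stored in place. The 
function names and the TYPE_NAME placeholder are hypothetical, and only 
in-place strings are handled.

```python
# Sketch only: assumes a 24-byte small_ buffer whose last byte holds
# (23 - length) for strings stored in place, as the gdb dump suggests.
SMALL_BUFFER_SIZE = 24  # assumption taken from the output in this thread


def decode_small_string(raw):
    """Return the in-place string held in raw, or None if the bytes do
    not look like a small string under the assumed layout."""
    raw = bytearray(raw)
    if len(raw) != SMALL_BUFFER_SIZE:
        return None
    spare = raw[-1]  # assumed to be (23 - length) for small strings
    if spare > SMALL_BUFFER_SIZE - 1:
        return None  # larger values are assumed to mean a heap-backed string
    length = SMALL_BUFFER_SIZE - 1 - spare
    # latin-1 maps every byte to a code point, so this decode cannot raise.
    return bytes(raw[:length]).decode("latin-1")


def small_string_summary(valobj, internal_dict):
    """Summary callback in the shape LLDB expects for Python formatters."""
    # store_ contains one unnamed union; take its first child.
    union = valobj.GetChildMemberWithName("store_").GetChildAtIndex(0)
    small = union.GetChildMemberWithName("small_")
    raw = [small.GetChildAtIndex(i).GetValueAsUnsigned(0) & 0xFF
           for i in range(small.GetNumChildren())]
    text = decode_small_string(raw)
    if text is None:
        return "<layout not handled>"
    return '"%s"' % text


def __lldb_init_module(debugger, internal_dict):
    # TYPE_NAME is a placeholder: use the type name LLDB reports for
    # your string type.
    debugger.HandleCommand(
        "type summary add -F %s.small_string_summary TYPE_NAME" % __name__)
```

Loading the file with "command script import" would run the registration 
hook. Heap-backed strings would need a second code path that reads size_ 
bytes from data_, which this sketch leaves out.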

> On Mar 28, 2016, at 9:47 AM, Greg Clayton via lldb-dev 
> <[email protected]> wrote:
> 
> So you need to be prepared to escape any text that can have special 
> characters. A "std::string" or any container can contain special characters. 
> If you are encoding stuff into JSON, you will either need to escape any 
> special characters, or hex encode the string into ASCII hex bytes. 
> 
> In debuggers we often get bogus data because variables are not initialized, 
> but the compiler tells us that a variable is valid in address range 
> [0x1000-0x2000), but it actually is [0x1200-0x2000). If we read a variable in 
> this case, a std::string might contain bogus data and the bytes might not 
> make sense. So you always have to be prepared for bad data.
> 
> If we look at:
> 
>  store_ = {
>     = {
>      small_ = "www"
>      ml_ = (data_ =
> "��UH\x89�H�}�H\x8bE�]ÐUH\x89�H��H\x89}�H\x8bE�H\x89��~\xb4��\x90��UH\x89�SH\x83�H\x89}�H�u�H�E�H���\x9e���H\x8b\x18H\x8bE�H���O\xb4��H\x89ƿ\b",
> size_ = 0, capacity_ = 1441151880758558720)
>    }
>  }
> }
> 
> We can see the "size_" is zero, and capacity_ is 1441151880758558720 (which 
> is 0x1400000000000000). "data_" seems to be some random pointer. 
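
To make the aliasing concrete, here is a small Python sketch (illustrative 
only). It reinterprets the 24 bytes that gdb prints for small_ as three 
little-endian 64-bit words, which is what the ml_ view of the same union 
reads. The ml_ layout of pointer, size, capacity at 8 bytes each is an 
assumption based on the field order in the dump.

```python
import struct

# The 24 bytes gdb shows for small_: "www", twenty NULs, then "\024".
small = b"www" + b"\x00" * 20 + b"\x14"

# Assumed ml_ layout: data_ pointer, size_, capacity_ (8 bytes each).
data_, size_, capacity_ = struct.unpack("<QQQ", small)

print(hex(data_))      # 0x777777 under Python 3
print(size_)           # 0
print(capacity_)       # 1441151880758558720
print(hex(capacity_))  # 0x1400000000000000 under Python 3
```

Under that assumption data_ is not a real pointer: it is the characters "www" 
read as an address, and the bytes that happen to live there get printed as the 
string.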
> 
> On MacOSX, we have a special formatting code that displays std::string in 
> CPlusPlusLanguage.cpp that gets installed in the LoadLibCxxFormatters() or 
> LoadLibStdcppFormatters() functions with code like:
> 
>    lldb::TypeSummaryImplSP std_string_summary_sp(
>        new CXXFunctionSummaryFormat(
>            stl_summary_flags,
>            lldb_private::formatters::LibcxxStringSummaryProvider,
>            "std::string summary provider"));
>    cpp_category_sp->GetTypeSummariesContainer()->Add(
>        ConstString("std::__1::string"), std_string_summary_sp);
> 
> Special flags are set on std::string to say "don't show children of this and 
> just show a summary". So for the following code:
> 
> std::string h ("hello");
> 
> You should just see:
> 
> (lldb) fr var h
> (std::__1::string) h = "hello"
> 
> If you take a look at the normal value in the raw we see:
> 
> (lldb) fr var --raw h
> (std::__1::string) h = {
>  __r_ = {
>    std::__1::__libcpp_compressed_pair_imp<std::__1::basic_string<char, 
> std::__1::char_traits<char>, std::__1::allocator<char> >::__rep, 
> std::__1::allocator<char>, 2> = {
>      __first_ = {
>         = {
>          __l = {
>            __cap_ = 122511465736202
>            __size_ = 0
>            __data_ = 0x0000000000000000
>          }
>          __s = {
>             = {
>              __size_ = '\n'
>              __lx = '\n'
>            }
>            __data_ = {
>              [0] = 'h'
>              [1] = 'e'
>              [2] = 'l'
>              [3] = 'l'
>              [4] = 'o'
>              [5] = '\0'
>              [6] = '\0'
>              [7] = '\0'
>              [8] = '\0'
>              [9] = '\0'
>              [10] = '\0'
>              [11] = '\0'
>              [12] = '\0'
>              [13] = '\0'
>              [14] = '\0'
>              [15] = '\0'
>              [16] = '\0'
>              [17] = '\0'
>              [18] = '\0'
>              [19] = '\0'
>              [20] = '\0'
>              [21] = '\0'
>              [22] = '\0'
>            }
>          }
>          __r = {
>            __words = {
>              [0] = 122511465736202
>              [1] = 0
>              [2] = 0
>            }
>          }
>        }
>      }
>    }
>  }
> }
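
The raw dump above can be read back by hand. The Python sketch below 
(illustrative only) unpacks __r.__words[0] into its bytes and recovers the 
string, assuming a little-endian target and the short-string encoding the 
dump suggests, where the first byte holds the length shifted left by one.

```python
import struct

word0 = 122511465736202  # __r.__words[0] from the raw dump above

# The first 8 bytes of the string object, in memory order.
raw = bytearray(struct.pack("<Q", word0))

# __s.__size_ prints as '\n' (10); assumed to store length << 1.
length = raw[0] >> 1
text = bytes(raw[1:1 + length]).decode("latin-1")

print(length)  # 5
print(text)    # hello
```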
> 
> So the main question is why our "std::string" formatters are not kicking in 
> for you. That comes down to either a type name that doesn't match or a 
> string layout that isn't what the formatter expects.
> 
> But again, since your std::string can contain anything, you will need to 
> escape any and all text that is encoded into JSON to ensure it doesn't 
> contain anything JSON can't deal with.
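
A minimal Python sketch of the two options mentioned above, escaping and hex 
encoding. The function names and dictionary keys are made up for 
illustration. Decoding as latin-1 is only a way to get the bytes through JSON 
without raising; it says nothing about the real encoding of the data.

```python
import binascii
import json


def json_safe(raw):
    """Return a JSON document for arbitrary bytes. latin-1 maps every
    byte to a code point, so the decode cannot fail."""
    return json.dumps({"description": bytes(raw).decode("latin-1")})


def json_hex(raw):
    """Alternative: hex encode the bytes so only ASCII reaches JSON."""
    return json.dumps({"hex": binascii.hexlify(bytes(raw)).decode("ascii")})


# First bytes of the garbage string from the gdb dump in this thread.
garbage = b"\xc9\xc3UH\x89\xe5"

print(json_safe(garbage))
print(json_hex(garbage))
```

Both forms round-trip: the reader can call encode("latin-1") on the first or 
unhexlify the second to get the original bytes back.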
> 
>> On Mar 27, 2016, at 9:20 PM, Jeffrey Tan via lldb-dev 
>> <[email protected]> wrote:
>> 
>> Thanks Siva. All the DW_TAG_member related errors seem to go away after 
>> patching with your fix. The current problem is handling the decoding. 
>> 
>> Here is the correct decoding from gdb, which might be useful:
>> (gdb) p corpus
>> $3 = (const std::string &) @0x7fd133cfb888: {
>>  static npos = 18446744073709551615, store_ = {
>>    static kIsLittleEndian = <optimized out>,
>>    static kIsBigEndian = <optimized out>, {
>>      small_ = "www", '\000' <repeats 20 times>, "\024", ml_ = {
>>        data_ = 0x777777 <std::_Any_data::_M_access<void 
>> folly::fibers::Baton::waitFiber<folly::fibers::FirstArgOf<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1},
>>  void>::type::value_type 
>> folly::fibers::await<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}>(folly::fibers::FirstArgOf&&)::{lambda()#1}>(folly::fibers::FiberManager&,
>>  
>> folly::fibers::FirstArgOf<folly::fibers::FirstArgOf<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1},
>>  void>::type::value_type 
>> folly::fibers::await<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}>(folly::fibers::FirstArgOf&&)::{lambda()#1},
>>  void>::type::value_type)::{lambda(folly::fibers::Fiber&)#1}*>() const+25> 
>> "\311\303UH\211\345H\211}\370H\213E\370]ÐUH\211\345H\203\354\020H\211}\370H\213E\370H\211\307\350~\264\312\377\220\311\303UH\211\345SH\203\354\030H\211}\350H\211u\340H\213E\340H\211\307\350\236\377\377\377H\213\030H\213E\350H\211\307\350O\264\312\377H\211ƿ\b",
>>  size_ = 0,
>>        capacity_ = 1441151880758558720}}}}
>> 
>> Utf-16 does not seem to decode it, while 'latin-1' does:
>> >>> '\xc9'.decode('utf-16')
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>>  File 
>> "/mnt/gvfs/third-party2/python/55c1fd79d91c77c95932db31a4769919611c12bb/2.7.8/centos6-native/da39a3e/lib/python2.7/encodings/utf_16.py",
>>  line 16, in decode
>>    return codecs.utf_16_decode(input, errors, True)
>> UnicodeDecodeError: 'utf16' codec can't decode byte 0xc9 in position 0: 
>> truncated data
>> >>> '\xc9'.decode('latin-1')
>> u'\xc9'
>> 
>> Instead of guessing what kind of decoding I should use, I would use 
>> 'ensure_ascii=False' to prevent the crash for now.
>> 
>> I tried to reproduce this crash, but it seems that the crash might be 
>> related to some internal STL implementation we are using. I will see if I 
>> can narrow it down to a small repro later. 
>> 
>> Thanks
>> Jeffrey
>> 
>> On Sun, Mar 27, 2016 at 2:49 PM, Siva Chandra <[email protected]> wrote:
>> On Sat, Mar 26, 2016 at 11:58 PM, Jeffrey Tan <[email protected]> 
>> wrote:
>>> Btw: after patching with Siva's fix http://reviews.llvm.org/D18008, the
>>> first field 'small_' is fixed, however the second field 'ml_' still emits
>>> garbage:
>>> 
>>> (lldb) fr v corpus
>>> (const string &const) corpus = error: summary string parsing error: {
>>>  store_ = {
>>>     = {
>>>      small_ = "www"
>>>      ml_ = (data_ =
>>> "��UH\x89�H�}�H\x8bE�]ÐUH\x89�H��H\x89}�H\x8bE�H\x89��~\xb4��\x90��UH\x89�SH\x83�H\x89}�H�u�H�E�H���\x9e���H\x8b\x18H\x8bE�H���O\xb4��H\x89ƿ\b",
>>> size_ = 0, capacity_ = 1441151880758558720)
>>>    }
>>>  }
>>> }
>> 
>> Do you still see the DW_TAG_member related error?
>> 
>> A wild (and really wild at that) guess: Is it utf16 data that is being
>> decoded as utf8?
>> 
>> As David Blaikie mentioned on the other thread, it would really help
>> if you provided us with a minimal example to repro this, or at least
>> repro instructions.
>> 
>> _______________________________________________
>> lldb-dev mailing list
>> [email protected]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
> 


Thanks,
- Enrico
📩 egranata@.com ☎️ 27683

