Re: [lldb-dev] UnicodeDecodeError for serialize SBValue description

Enrico Granata via lldb-dev Thu, 07 Apr 2016 10:30:12 -0700

> On Apr 7, 2016, at 9:51 AM, Jim Ingham <jing...@apple.com> wrote:
> 
> I don't think Enrico was suggesting that we maintain a bunch of third party 
> data formatters in the lldb source base.


That depends - if this std::string implementation is part of a publicly 
available STL implementation, it might make sense for us to “know about it” out 
of the box in the same way we know about libstdc++ and libc++.
If it is an internal-only string class, then, yes, I would definitely not 
suggest putting this inside the LLDB core.

> He was giving C++ examples (using the lldb_private API's) because the STL 
> formatters are in C++, so that's what he had on hand to demonstrate the kinds 
> of algorithms you would use to dig into these complex structures.  For the 
> most part the lldb_private API's used in Enrico's examples are mirrored in 
> the SB API's pretty directly, so this isn't a terrible source for examples.
> 
> Note, it used to be possible to write C++ based data formatters, build them 
> in a shared library and load them with the "plugin load" command.  These have 
> the advantage of working on systems that don't support Python.  Not sure what 
> the state of that is these days, however.

It might or might not work. If it didn’t work and somebody wanted to fix that, 
I suspect we would gladly accept their patches.

>  But even if you were going to write C++ formatters you'd be better off using 
> the SB API's not the lldb_private API's since then your plugins would have a 
> longer useful life-cycle.
> 
> Jim
> 
> 
>> On Apr 7, 2016, at 2:45 AM, Tamas Berghammer via lldb-dev 
>> <lldb-dev@lists.llvm.org> wrote:
>> 
>> LLDB supports adding data formatters without modifying the source code and I 
>> would strongly prefer to go that way, as we don't want each user of LLDB to 
>> start adding data formatters for their own custom types to the LLDB source. 
>> We have a pretty detailed (but possibly a bit outdated) description of how 
>> they work and how you can add a new one here: 
>> http://lldb.llvm.org/varformats.html
>> 
>> Enrico: Is there any reason you suggested the data formatters written inside 
>> LLDB over the python based ones?
>> 
>> On Thu, Apr 7, 2016 at 3:31 AM Jeffrey Tan via lldb-dev 
>> <lldb-dev@lists.llvm.org> wrote:
>> Thanks Enrico. This is very detailed! I will take a look. 
>> Btw: originally, I was hoping that a data formatter could be added without 
>> changing the source code, e.g. by giving lldb an xml/json file describing 
>> the memory layout/structure of the data structure, so that lldb can parse 
>> the xml/json and deduce the formatting. This is the approach used by the 
>> data visualizers in the VS debugger: 
>> https://msdn.microsoft.com/en-us/library/jj620914.aspx
>> This would make adding data formatters more extensible/flexible. Any reason 
>> we did not take this approach? 
>> 
>> Jeffrey
>> 
>> On Wed, Apr 6, 2016 at 11:49 AM, Enrico Granata <egran...@apple.com> wrote:
>> 
>>> On Apr 5, 2016, at 2:42 PM, Jeffrey Tan <jeffrey.fu...@gmail.com> wrote:
>>> 
>>> Hi Enrico,
>>> 
>>> Any suggestion/example how to add a data formatter for our own STL string? 
>>> From the output below I can see we are using our own "fbstring_core" which 
>>> I assume I need to write a type summary for this type:
>>> 
>>> frame variable corpus -T
>>> (const string &const) corpus = error: summary string parsing error: {
>>>  (std::fbstring_core<char>) store_ = {
>>>    (std::fbstring_core<char>::(anonymous union))  = {
>>>      (char [24]) small_ = "www"
>>>      (std::fbstring_core<char>::MediumLarge) ml_ = {
>>>        (char *) data_ = 0x0000000000777777 
>>> "H\x89U\xa8H\x89M\xa0L\x89E\x98H\x8bE\xa8H\x89��_U��D\x88e�H\x8bE\xa0H\x89��]U��H\x89�H\x8dE�H\x89�H\x89���
>>>  ��L\x8dm�H\x8bE\x98H\x89��IU��\x88]�L\x8be\xb0L\x89��
>>>        (std::size_t) size_ = 0
>>>        (std::size_t) capacity_ = 1441151880758558720
>>>      }
>>>    }
>>>  }
>>> }
>>> 
>> 
>> Admittedly, this is going to be a little vague since I haven’t really seen 
>> your code and I am only working off of one sample
>> 
>> There’s going to be two parts to getting this to work:
>> 
>> Part 1 - Formatting fbstring_core
>> 
>> At a glance, an fbstring_core<char> can be backed by two representations: a 
>> “small” representation (a char array), and a “medium/large” representation 
>> (a char* + a size)
>> I assume that the way you tell one from the other is
>> 
>> if (size == 0) small
>> else medium-large
>> 
>> If my assumption is not correct, you’ll need to discover what the correct 
>> discriminator logic is - the class has to know, and so do you :-)
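
The discriminator described above can be tried out on the bytes from this thread. What follows is only a sketch in Python 3: decode_fbstring_core is a hypothetical helper, the "size_ == 0 means small" rule is the assumption stated in the mail (not confirmed against the real class), and a little-endian 64-bit target is assumed. The input is the small_ array exactly as gdb printed it further down: "www", twenty NULs, then "\024".

```python
import struct

SMALL_CAPACITY = 24  # size of the small_ char array in the dump


def decode_fbstring_core(raw):
    """Interpret the 24-byte union and pick a layout (hypothetical helper)."""
    if len(raw) != SMALL_CAPACITY:
        raise ValueError("expected %d bytes" % SMALL_CAPACITY)
    # MediumLarge view: char* data_, size_t size_, size_t capacity_
    # (assumes a little-endian 64-bit target).
    data_ptr, size, capacity = struct.unpack("<QQQ", raw)
    if size == 0:
        # Assumed discriminator from the mail: size_ == 0 means "small".
        # The characters live inline, up to the first NUL.
        return {"layout": "small", "bytes": raw.split(b"\0", 1)[0]}
    # Otherwise the characters live at data_ptr in the debuggee and would
    # have to be read from process memory; that is not modelled here.
    return {"layout": "medium-large", "data_": data_ptr,
            "size_": size, "capacity_": capacity}


# The bytes gdb printed for small_: "www", twenty NULs, then "\024" (0x14).
raw = b"www" + b"\0" * 20 + b"\x14"
data_ptr, size, capacity = struct.unpack("<QQQ", raw)
print(hex(data_ptr), size, capacity)  # 0x777777 0 1441151880758558720
print(decode_fbstring_core(raw))      # {'layout': 'small', 'bytes': b'www'}
```

Read through the MediumLarge view, the same 24 bytes give exactly the data_ = 0x777777, size_ = 0 and capacity_ = 1441151880758558720 shown in the dumps, so the "garbage" in ml_ is just the small string seen through the other union member. Note that under this rule an empty medium/large string would be indistinguishable from a small one, which is one more reason to check the real discriminator.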
>> 
>> Armed with that knowledge, look in lldb 
>> source/Plugins/Language/CPlusPlus/Formatters/LibCxx.cpp
>> There’s a bunch of code that deals with formatting llvm’s libc++ std::string 
>> - which follows a very similar logic to your class
>> 
>> ExtractLibcxxStringInfo() is the function that handles discovering which 
>> layout the string uses - where the data lives - and how much data there is
>> 
>> Once you have told yourself how much data there is (the size) and where it 
>> lives (array or pointer), LibcxxStringSummaryProvider() has the easy task - 
>> it sets up a StringPrinter, tells it how much data to print and where to get 
>> it from, and then delegates the grunt work to the StringPrinter
>> StringPrinter is a nifty little tool - it can handle generating summaries 
>> for different kinds of strings (UTF8? UTF16? we got it - is a \0 a 
>> terminator? what quote character would you like? …) - you point it at some 
>> data, set up a few options, and it will generate a printable representation 
>> for you - if your string type is doing anything out of the ordinary, let’s 
>> talk - I am definitely open to extending StringPrinter to handle even more 
>> magic
>> 
>> Part 2 - Teaching std::string that it can be backed by an fbstring_core
>> 
>> At the end of part 1, you’ll probably end up with a 
>> FBStringCoreSummaryProvider() - now you need to teach LLDB about it
>> The obvious thing you could do would be to go into 
>> CPlusPlusLanguage::GetFormatters() and add a LoadFBStringFormatter(g_category) 
>> call to it - and then imitate - say - LoadLibCxxFormatters()
>> 
>>    AddCXXSummary(cpp_category_sp,
>>                  lldb_private::formatters::FBStringCoreSummaryProvider,
>>                  "fbstringcore summary provider",
>>                  ConstString("std::fbstring_core<.+>"),
>>                  stl_summary_flags, true);
>> 
>> That will work - but what you would see is:
>> 
>>> (const string &const) corpus = error: summary string parsing error: {
>>>  (std::fbstring_core<char>) store_ = "www"
>> 
>> The store_ member now has a summary, but the enclosing std::string still 
>> does not. To find out which typename LLDB is trying to match for it, you 
>> wanna do
>> 
>> (lldb) log enable lldb formatters
>> (lldb) frame variable -T corpus
>> 
>> It will list one or more typenames - the most specific one is the one you 
>> want (e.g. for libc++ we get std::__1::string - this is how we tell 
>> ourselves this is the std::string from libc++)
>> Once you find that typename, you’ll make a new formatter - 
>> FBStringSummaryProvider() - and register that formatter with that very 
>> specific typename
>> 
>> All that FBStringSummaryProvider() has to do is get the “store_” member 
>> (ValueObject::GetChildMemberWithName() is your friend) - and pass it down to 
>> FBStringCoreSummaryProvider()
>> 
>> 
>> I understand this may seem a little convoluted and arcane at first - but 
>> feel free to ask more questions, and I’ll try to help out!
>> 
>>> Thanks.
>>> Jeffrey
>>> 
>>> On Mon, Mar 28, 2016 at 11:38 AM, Enrico Granata <egran...@apple.com> wrote:
>>> This is kind of orthogonal to your problem, but the reason why you are not 
>>> seeing the kind of simplified printing Greg is suggesting, is because your 
>>> std::string doesn’t look like any of the kinds we recognize
>>> 
>>> Specifically, LLDB data formatters work by matching against type names, and 
>>> once they recognize a typename, then they try to inspect the variable in 
>>> order to grab a summary
>>> In your example, your std::string exposes a layout that we are not handling 
>>> - hence we bail out of the formatter and we fall back to the raw view
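
The two-step behaviour described here (match on the typename first, then inspect the value, and fall back to the raw view if the layout is not what the formatter expects) can be illustrated with a toy model. This is not LLDB's implementation: the registry, the dictionaries standing in for values, and the use of re.search are all assumptions made for illustration.

```python
import re


def raw_view(value):
    """Stand-in for the raw, children-expanded view."""
    return "{ raw children: %s }" % ", ".join(sorted(value))


def libcxx_like_summary(value):
    """Toy formatter: only understands a value with a "__r_" member."""
    if "__r_" not in value:
        return None  # unknown layout: bail out
    return '"%s"' % value["__r_"]


def describe(type_name, value, registry):
    """Pick a formatter by typename; fall back to the raw view on failure."""
    for pattern, formatter in registry:
        if re.search(pattern, type_name):
            summary = formatter(value)
            if summary is not None:
                return summary
    return raw_view(value)


registry = [(r"^std::(__1::)?string$", libcxx_like_summary)]
print(describe("std::__1::string", {"__r_": "hello"}, registry))
print(describe("std::string", {"store_": "..."}, registry))
```

The second call is the situation in this thread: the name matches, but the formatter does not recognize the layout, so the raw view is shown.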
>>> 
>>> If you want pretty printing to work, you’ll need to write a data formatter
>>> 
>>> There are a few avenues. The obvious easy one is to extend the existing 
>>> std::string formatter to recognize your type’s internal layout.
>>> If one were signing up for more infrastructure work, they could decide to 
>>> try and detect shared library loads and load formatters that match with 
>>> whatever libraries are being loaded.
>>> 
>>>> On Mar 28, 2016, at 9:47 AM, Greg Clayton via lldb-dev 
>>>> <lldb-dev@lists.llvm.org> wrote:
>>>> 
>>>> So you need to be prepared to escape any text that can have special 
>>>> characters. A "std::string" or any container can contain special 
>>>> characters. If you are encoding stuff into JSON, you will either need to 
>>>> escape any special characters, or hex encode the string into ASCII hex 
>>>> bytes. 
>>>> 
>>>> In debuggers we often get bogus data because variables are not 
>>>> initialized, but the compiler tells us that a variable is valid in address 
>>>> range [0x1000-0x2000), but it actually is [0x1200-0x2000). If we read a 
>>>> variable in this case, a std::string might contain bogus data and the 
>>>> bytes might not make sense. So you always have to be prepared for bad data.
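
A minimal sketch of the "escape or hex encode" advice, in Python 3 (the thread itself uses Python 2.7). json_safe_summary and its field names are hypothetical; the idea is simply that json.dumps takes care of escaping for text that decodes cleanly, and anything that does not decode is shipped as ASCII hex instead of raising.

```python
import json


def json_safe_summary(name, raw):
    """Serialize raw summary bytes without ever raising on bad data."""
    try:
        # Valid UTF-8: json.dumps escapes quotes, backslashes and
        # control characters for us.
        value, encoding = raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Bogus or binary data: fall back to ASCII hex bytes.
        value, encoding = raw.hex(), "hex"
    return json.dumps({"name": name, "encoding": encoding, "value": value})


print(json_safe_summary("h", b'say "hi"\n'))
print(json_safe_summary("data_", b"\xc9\xc3UH\x89"))
```

The receiving side can look at the encoding field and call bytes.fromhex() on the value when it is "hex", so no information is lost even for uninitialized variables.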
>>>> 
>>>> If we look at:
>>>> 
>>>> store_ = {
>>>>    = {
>>>>     small_ = "www"
>>>>     ml_ = (data_ =
>>>> "��UH\x89�H�}�H\x8bE�]ÐUH\x89�H��H\x89}�H\x8bE�H\x89��~\xb4��\x90��UH\x89�SH\x83�H\x89}�H�u�H�E�H���\x9e���H\x8b\x18H\x8bE�H���O\xb4��H\x89ƿ\b",
>>>> size_ = 0, capacity_ = 1441151880758558720)
>>>>   }
>>>> }
>>>> }
>>>> 
>>>> We can see the "size_" is zero, and capacity_ is 1441151880758558720 
>>>> (which is 0x1400000000000000). "data_" seems to be some random pointer. 
>>>> 
>>>> On MacOSX, we have special formatting code that displays std::string in 
>>>> CPlusPlusLanguage.cpp that gets installed in the LoadLibCxxFormatters() or 
>>>> LoadLibStdcppFormatters() functions with code like:
>>>> 
>>>>   lldb::TypeSummaryImplSP std_string_summary_sp(
>>>>       new CXXFunctionSummaryFormat(
>>>>           stl_summary_flags,
>>>>           lldb_private::formatters::LibcxxStringSummaryProvider,
>>>>           "std::string summary provider"));
>>>>   cpp_category_sp->GetTypeSummariesContainer()->Add(
>>>>       ConstString("std::__1::string"), std_string_summary_sp);
>>>> 
>>>> Special flags are set on std::string to say "don't show children of this 
>>>> and just show a summary". So for the following code, where a std::string 
>>>> contains "hello":
>>>> 
>>>> std::string h ("hello");
>>>> 
>>>> You should just see:
>>>> 
>>>> (lldb) fr var h
>>>> (std::__1::string) h = "hello"
>>>> 
>>>> If you take a look at the normal value in the raw we see:
>>>> 
>>>> (lldb) fr var --raw h
>>>> (std::__1::string) h = {
>>>> __r_ = {
>>>>   std::__1::__libcpp_compressed_pair_imp<std::__1::basic_string<char, 
>>>> std::__1::char_traits<char>, std::__1::allocator<char> >::__rep, 
>>>> std::__1::allocator<char>, 2> = {
>>>>     __first_ = {
>>>>        = {
>>>>         __l = {
>>>>           __cap_ = 122511465736202
>>>>           __size_ = 0
>>>>           __data_ = 0x0000000000000000
>>>>         }
>>>>         __s = {
>>>>            = {
>>>>             __size_ = '\n'
>>>>             __lx = '\n'
>>>>           }
>>>>           __data_ = {
>>>>             [0] = 'h'
>>>>             [1] = 'e'
>>>>             [2] = 'l'
>>>>             [3] = 'l'
>>>>             [4] = 'o'
>>>>             [5] = '\0'
>>>>             [6] = '\0'
>>>>             [7] = '\0'
>>>>             [8] = '\0'
>>>>             [9] = '\0'
>>>>             [10] = '\0'
>>>>             [11] = '\0'
>>>>             [12] = '\0'
>>>>             [13] = '\0'
>>>>             [14] = '\0'
>>>>             [15] = '\0'
>>>>             [16] = '\0'
>>>>             [17] = '\0'
>>>>             [18] = '\0'
>>>>             [19] = '\0'
>>>>             [20] = '\0'
>>>>             [21] = '\0'
>>>>             [22] = '\0'
>>>>           }
>>>>         }
>>>>         __r = {
>>>>           __words = {
>>>>             [0] = 122511465736202
>>>>             [1] = 0
>>>>             [2] = 0
>>>>           }
>>>>         }
>>>>       }
>>>>     }
>>>>   }
>>>> }
>>>> }
>>>> 
>>>> So the main question is why our "std::string" formatters are not kicking 
>>>> in for you. Either the typename doesn't match, or the layout of the 
>>>> string isn't what the formatter is expecting.
>>>> 
>>>> But again, since your std::string can contain anything, you will need to 
>>>> escape any and all text that is encoded into JSON to ensure it doesn't 
>>>> contain anything JSON can't deal with.
>>>> 
>>>>> On Mar 27, 2016, at 9:20 PM, Jeffrey Tan via lldb-dev 
>>>>> <lldb-dev@lists.llvm.org> wrote:
>>>>> 
>>>>> Thanks Siva. All the DW_TAG_member related errors seem to go away after 
>>>>> patching with your fix. The current problem is handling the decoding. 
>>>>> 
>>>>> Here is the correct decoding from gdb which might be useful:
>>>>> (gdb) p corpus
>>>>> $3 = (const std::string &) @0x7fd133cfb888: {
>>>>> static npos = 18446744073709551615, store_ = {
>>>>>   static kIsLittleEndian = <optimized out>,
>>>>>   static kIsBigEndian = <optimized out>, {
>>>>>     small_ = "www", '\000' <repeats 20 times>, "\024", ml_ = {
>>>>>       data_ = 0x777777 <std::_Any_data::_M_access<void 
>>>>> folly::fibers::Baton::waitFiber<folly::fibers::FirstArgOf<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1},
>>>>>  void>::type::value_type 
>>>>> folly::fibers::await<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}>(folly::fibers::FirstArgOf&&)::{lambda()#1}>(folly::fibers::FiberManager&,
>>>>>  
>>>>> folly::fibers::FirstArgOf<folly::fibers::FirstArgOf<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1},
>>>>>  void>::type::value_type 
>>>>> folly::fibers::await<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}>(folly::fibers::FirstArgOf&&)::{lambda()#1},
>>>>>  void>::type::value_type)::{lambda(folly::fibers::Fiber&)#1}*>() 
>>>>> const+25> 
>>>>> "\311\303UH\211\345H\211}\370H\213E\370]ÐUH\211\345H\203\354\020H\211}\370H\213E\370H\211\307\350~\264\312\377\220\311\303UH\211\345SH\203\354\030H\211}\350H\211u\340H\213E\340H\211\307\350\236\377\377\377H\213\030H\213E\350H\211\307\350O\264\312\377H\211ƿ\b",
>>>>>  size_ = 0,
>>>>>       capacity_ = 1441151880758558720}}}}
>>>>> 
>>>>> Utf-16 does not seem to decode it, while 'latin-1' does:
>>>>> >>> '\xc9'.decode('utf-16')
>>>>> Traceback (most recent call last):
>>>>>   File "<stdin>", line 1, in <module>
>>>>>   File 
>>>>> "/mnt/gvfs/third-party2/python/55c1fd79d91c77c95932db31a4769919611c12bb/2.7.8/centos6-native/da39a3e/lib/python2.7/encodings/utf_16.py", line 16, in decode
>>>>>     return codecs.utf_16_decode(input, errors, True)
>>>>> UnicodeDecodeError: 'utf16' codec can't decode byte 0xc9 in position 0: truncated data
>>>>> >>> '\xc9'.decode('latin-1')
>>>>> u'\xc9'
>>>>> 
>>>>> Instead of guessing what kind of decoding I should use, I would use 
>>>>> 'ensure_ascii=False' to prevent the crash for now.
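
The difference between the codecs can be checked in isolation. The sketch below is Python 3 (the session above is Python 2.7), and try_decode is a hypothetical helper. It shows why 'latin-1' never fails: it maps each of the 256 byte values to one code point, so it "decodes" anything, which says nothing about whether the result is the intended text. UTF-16 fails on a lone byte because it needs two-byte code units.

```python
def try_decode(raw, codec, errors="strict"):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return raw.decode(codec, errors)
    except UnicodeDecodeError:
        return None


raw = b"\xc9"
print(try_decode(raw, "utf-16"))    # None: one byte is a truncated code unit
print(try_decode(raw, "utf-8"))     # None: 0xc9 needs a continuation byte
print(repr(try_decode(raw, "latin-1")))
print(repr(try_decode(raw, "utf-8", "backslashreplace")))

# latin-1 round-trips every possible byte value without loss.
all_bytes = bytes(range(256))
print(all_bytes.decode("latin-1").encode("latin-1") == all_bytes)  # True
```

If the goal is only to avoid the crash while keeping the bytes recoverable, decoding as latin-1 or using an error handler such as "backslashreplace" are both options; neither makes the garbage in ml_ meaningful.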
>>>>> 
>>>>> I tried to reproduce this crash, but it seems that the crash might be 
>>>>> related to some internal stl implementation we are using. I will see if 
>>>>> I can narrow it down to a small repro later. 
>>>>> 
>>>>> Thanks
>>>>> Jeffrey
>>>>> 
>>>>> On Sun, Mar 27, 2016 at 2:49 PM, Siva Chandra <sivachan...@gmail.com> 
>>>>> wrote:
>>>>> On Sat, Mar 26, 2016 at 11:58 PM, Jeffrey Tan <jeffrey.fu...@gmail.com> 
>>>>> wrote:
>>>>>> Btw: after patching with Siva's fix http://reviews.llvm.org/D18008, the
>>>>>> first field 'small_' is fixed, however the second field 'ml_' still emits
>>>>>> garbage:
>>>>>> 
>>>>>> (lldb) fr v corpus
>>>>>> (const string &const) corpus = error: summary string parsing error: {
>>>>>> store_ = {
>>>>>>    = {
>>>>>>     small_ = "www"
>>>>>>     ml_ = (data_ =
>>>>>> "��UH\x89�H�}�H\x8bE�]ÐUH\x89�H��H\x89}�H\x8bE�H\x89��~\xb4��\x90��UH\x89�SH\x83�H\x89}�H�u�H�E�H���\x9e���H\x8b\x18H\x8bE�H���O\xb4��H\x89ƿ\b",
>>>>>> size_ = 0, capacity_ = 1441151880758558720)
>>>>>>   }
>>>>>> }
>>>>>> }
>>>>> 
>>>>> Do you still see the DW_TAG_member related error?
>>>>> 
>>>>> A wild (and really wild at that) guess: Is it utf16 data that is being
>>>>> decoded as utf8?
>>>>> 
>>>>> As David Blaikie mentioned on the other thread, it would really help
>>>>> if you provided us with a minimal example to repro this. At least, repro
>>>>> instructions.
>>>>> 
>>>>> _______________________________________________
>>>>> lldb-dev mailing list
>>>>> lldb-dev@lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>> 
>>> 
>>> 
>>> Thanks,
>>> - Enrico
>>> 📩 egranata@.com ☎️ 27683
>>> 
>>> 
>> 
>> 
>> Thanks,
>> - Enrico
>> 📩 egranata@.com ☎️ 27683
>> 
>> 
> 


Thanks,
- Enrico
📩 egranata@.com ☎️ 27683

