Thanks, Enrico. This is very detailed! I will take a look.

Btw: originally, I was hoping that a data formatter could be added without changing the lldb source code. For example, given an XML/JSON file that describes the memory layout/structure of a data structure, lldb could parse that file and deduce the formatting. This is the approach used by the data visualizer in the VS debugger:
https://msdn.microsoft.com/en-us/library/jj620914.aspx

This would make adding data formatters more extensible/flexible. Is there any reason we did not take this approach?

Jeffrey

On Wed, Apr 6, 2016 at 11:49 AM, Enrico Granata <egran...@apple.com> wrote:

>
> On Apr 5, 2016, at 2:42 PM, Jeffrey Tan <jeffrey.fu...@gmail.com> wrote:
>
> Hi Enrico,
>
> Any suggestion/example how to add a data formatter for our own STL string?
> From the output below I can see we are using our own "*fbstring_core*"
> which I assume I need to write a type summary for this type:
>
> frame variable corpus -T
> (const string &const) corpus = error: summary string parsing error: {
>   (std::*fbstring_core*<char>) store_ = {
>     (std::*fbstring_core*<char>::(anonymous union)) = {
>       (char [24]) small_ = "www"
>       (std::fbstring_core<char>::MediumLarge) ml_ = {
>         (char *) data_ = 0x0000000000777777
> "H\x89U\xa8H\x89M\xa0L\x89E\x98H\x8bE\xa8H\x89��_U��D\x88e�H\x8bE\xa0H\x89��]U��H\x89�H\x8dE�H\x89�H\x89���
> ��L\x8dm�H\x8bE\x98H\x89��IU��\x88]�L\x8be\xb0L\x89��
>         (std::size_t) size_ = 0
>         (std::size_t) capacity_ = 1441151880758558720
>       }
>     }
>   }
> }
>
>
> Admittedly, this is going to be a little vague since I haven't really seen
> your code and I am only working off of one sample
>
> There's going to be two parts to getting this to work:
>
> *Part 1 - Formatting fbstring_core*
>
> At a glance, an fbstring_core<char> can be backed by two representations.
> A "small" representation (a char array), and a "medium/large"
> representation (a char* + a size)
> I assume that the way you tell one from the other is
>
> if (size == 0) small
> else medium-large
>
> If my assumption is not correct, you'll need to discover what the correct
> discriminator logic is - the class has to know, and so do you :-)
>
> Armed with that knowledge, look in lldb
> source/Plugins/Language/CPlusPlus/Formatters/LibCxx.cpp
> There's a bunch of code that deals with formatting llvm's libc++
> std::string - which follows a very similar logic to your class
>
> ExtractLibcxxStringInfo() is the function that handles discovering which
> layout the string uses - where the data lives - and how much data there is
>
> Once you have told yourself how much data there is (the size) and where it
> lives (array or pointer), LibcxxStringSummaryProvider() has the easy task
> - it sets up a StringPrinter, tells it how much data to print and where to
> get it from, and then delegates the grunt work to the StringPrinter
> StringPrinter is a nifty little tool - it can handle generating summaries
> for different kinds of strings (UTF8? UTF16? we got it - is a \0 a
> terminator? what quote character would you like? …) - you point it at some
> data, set up a few options, and it will generate a printable representation
> for you - if your string type is doing anything out of the ordinary, let's
> talk - I am definitely open to extending StringPrinter to handle even more
> magic
>
> *Part 2 - Teaching std::string that it can be backed by an fbstring_core*
>
> At the end of part 1, you'll probably end up with a
> FBStringCoreSummaryProvider() - now you need to teach LLDB about it
> The obvious thing you could do would be to go in
> CPlusPlusLanguage::GetFormatters(), add a LoadFBStringFormatter(g_category)
> to it - and then imitate - say - LoadLibCxxFormatters()
>
> AddCXXSummary(cpp_category_sp,
>     lldb_private::formatters::FBStringCoreSummaryProvider,
>     "fbstringcore summary provider",
>     ConstString("std::fbstring_core<.+>"), stl_summary_flags, true);
>
> That will work - but what you would see is:
>
> (const string &const) corpus = error: summary string parsing error: {
>   (std::*fbstring_core*<char>) store_ = "www"
>
>
> You wanna do
>
> (lldb) log enable lldb formatters
> (lldb) frame variable -T corpus
>
> It will list one or more typenames - the most specific one is the one you
> like (e.g. for libc++ we get std::__1::string - this is how we tell
> ourselves this is the std::string from libc++)
> Once you find that typename, you'll make a new formatter -
> FBStringSummaryProvider() - and register that formatter with that very
> specific typename
>
> All that FBStringSummaryProvider() has to do is get the "store_" member
> (ValueObject::GetChildMemberWithName() is your friend) - and pass it down
> to FBStringCoreSummaryProvider()
>
>
> I understand this may seem a little convoluted and arcane at first - but
> feel free to ask more questions, and I'll try to help out!
>
> Thanks.
> Jeffrey > > On Mon, Mar 28, 2016 at 11:38 AM, Enrico Granata <egran...@apple.com> > wrote: > >> This is kind of orthogonal to your problem, but the reason why you are >> not seeing the kind of simplified printing Greg is suggesting, is because >> your std::string doesn’t look like any of the kinds we recognize >> >> Specifically, LLDB data formatters work by matching against type names, >> and once they recognize a typename, then they try to inspect the variable >> in order to grab a summary >> In your example, your std::string exposes a layout that we are not >> handling - hence we bail out of the formatter and we fall back to the raw >> view >> >> If you want pretty printing to work, you’ll need to write a data formatter >> >> There are a few avenues. The obvious easy one is to extend the existing >> std::string formatter to recognize your type’s internal layout. >> If one were signing up for more infrastructure work, they could decide to >> try and detect shared library loads and load formatters that match with >> whatever libraries are being loaded. >> >> On Mar 28, 2016, at 9:47 AM, Greg Clayton via lldb-dev < >> lldb-dev@lists.llvm.org> wrote: >> >> So you need to be prepared to escape any text that can have special >> characters. A "std::string" or any container can contain special >> characters. If you are encoding stuff into JSON, you will either need to >> escape any special characters, or hex encode the string into ASCII hex >> bytes. >> >> In debuggers we often get bogus data because variables are not >> initialized, but the compiler tells us that a variable is valid in address >> range [0x1000-0x2000), but it actually is [0x1200-0x2000). If we read a >> variable in this case, a std::string might contain bogus data and the bytes >> might not make sense. So you always have to be prepared for bad data. 
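
[Inline note] Greg's two options just above (escape the special characters, or hex-encode the bytes) can be sketched in a few lines of Python 3. This is only a sketch: the helper names and the JSON keys are made up for illustration, and the sample bytes are a handful taken from the gdb octal dump further down the thread, not real program data.

```python
import binascii
import json


def to_json_escaped(raw: bytes) -> str:
    """Option 1: map each byte to one code point, then let json escape it."""
    # latin-1 cannot fail: bytes 0x00-0xFF map 1:1 onto U+0000-U+00FF.
    text = raw.decode("latin-1")
    # ensure_ascii=True (the default) writes \uXXXX for every non-ASCII char.
    return json.dumps({"value": text})


def to_json_hex(raw: bytes) -> str:
    """Option 2: hex-encode the bytes so the payload is plain ASCII."""
    return json.dumps({"value_hex": binascii.hexlify(raw).decode("ascii")})


def from_json_escaped(doc: str) -> bytes:
    """Inverse of to_json_escaped()."""
    return json.loads(doc)["value"].encode("latin-1")


def from_json_hex(doc: str) -> bytes:
    """Inverse of to_json_hex()."""
    return binascii.unhexlify(json.loads(doc)["value_hex"])


# A few bytes of the kind an uninitialized std::string can hand back.
sample = b"\xc9\xc3UH\x89\xe5\x08"
print(to_json_escaped(sample))
print(to_json_hex(sample))
```

Both forms round-trip arbitrary bytes. The latin-1 form keeps printable ASCII readable in the JSON, while the hex form makes it obvious to the consumer that the value is raw bytes rather than text.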
>> >> If we look at: >> >> store_ = { >> = { >> small_ = "www" >> ml_ = (data_ = >> >> "��UH\x89�H�}�H\x8bE�]ÐUH\x89�H��H\x89}�H\x8bE�H\x89��~\xb4��\x90��UH\x89�SH\x83�H\x89}�H�u�H�E�H���\x9e���H\x8b\x18H\x8bE�H���O\xb4��H\x89ƿ\b", >> size_ = 0, capacity_ = 1441151880758558720) >> } >> } >> } >> >> We can see the "size_" is zero, and capacity_ is 1441151880758558720 >> (which is 0x1400000000000000). "data_" seems to be some random pointer. >> >> On MacOSX, we have a special formatting code that displays std::string in >> CPlusPlusLanguage.cpp that gets installed in the LoadLibCxxFormatters() or >> LoadLibStdcppFormatters() functions with code like: >> >> lldb::TypeSummaryImplSP std_string_summary_sp(new >> CXXFunctionSummaryFormat(stl_summary_flags, >> lldb_private::formatters::LibcxxStringSummaryProvider, "std::string summary >> provider")); >> >> cpp_category_sp->GetTypeSummariesContainer()->Add(ConstString("std::__1::string"), >> std_string_summary_sp); >> >> Special flags are set on std::string to say "don't show children of this >> and just show a summary" So if a std::string contained "hello". 
So for the >> following code: >> >> std::string h ("hello"); >> >> You should just see: >> >> (lldb) fr var h >> (std::__1::string) h = "hello" >> >> If you take a look at the normal value in the raw we see: >> >> (lldb) fr var --raw h >> (std::__1::string) h = { >> __r_ = { >> std::__1::__libcpp_compressed_pair_imp<std::__1::basic_string<char, >> std::__1::char_traits<char>, std::__1::allocator<char> >::__rep, >> std::__1::allocator<char>, 2> = { >> __first_ = { >> = { >> __l = { >> __cap_ = 122511465736202 >> __size_ = 0 >> __data_ = 0x0000000000000000 >> } >> __s = { >> = { >> __size_ = '\n' >> __lx = '\n' >> } >> __data_ = { >> [0] = 'h' >> [1] = 'e' >> [2] = 'l' >> [3] = 'l' >> [4] = 'o' >> [5] = '\0' >> [6] = '\0' >> [7] = '\0' >> [8] = '\0' >> [9] = '\0' >> [10] = '\0' >> [11] = '\0' >> [12] = '\0' >> [13] = '\0' >> [14] = '\0' >> [15] = '\0' >> [16] = '\0' >> [17] = '\0' >> [18] = '\0' >> [19] = '\0' >> [20] = '\0' >> [21] = '\0' >> [22] = '\0' >> } >> } >> __r = { >> __words = { >> [0] = 122511465736202 >> [1] = 0 >> [2] = 0 >> } >> } >> } >> } >> } >> } >> } >> >> So the main question is why are our "std::string" formatters not kicking >> in for you. That comes down to a typename match, or the format of the >> string isn't what the formatter is expecting. >> >> But again, since you std::string can contain anything, you will need to >> escape any and all text that is encoded into JSON to ensure it doesn't >> contain anything JSON can't deal with. >> >> On Mar 27, 2016, at 9:20 PM, Jeffrey Tan via lldb-dev < >> lldb-dev@lists.llvm.org> wrote: >> >> Thanks Siva. All the DW_TAG_member related errors seems to go away after >> patching with your fix. The current problem is handling the decoding. 
>> >> Here is the correct decoding from gdb whic might be useful: >> (gdb) p corpus >> $3 = (const std::string &) @0x7fd133cfb888: { >> static npos = 18446744073709551615, store_ = { >> static kIsLittleEndian = <optimized out>, >> static kIsBigEndian = <optimized out>, { >> small_ = "www", '\000' <repeats 20 times>, "\024", ml_ = { >> data_ = 0x777777 <std::_Any_data::_M_access<void >> folly::fibers::Baton::waitFiber<folly::fibers::FirstArgOf<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}, >> void>::type::value_type >> folly::fibers::await<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}>(folly::fibers::FirstArgOf&&)::{lambda()#1}>(folly::fibers::FiberManager&, >> folly::fibers::FirstArgOf<folly::fibers::FirstArgOf<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}, >> void>::type::value_type >> 
folly::fibers::await<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::{lambda(folly::fibers::Promise<facebook::servicerouter::RequestDispatcherBase<facebook::servicerouter::ThriftDispatcher>::prepareForSelection(facebook::servicerouter::DispatchContext&)::SelectionResult>)#1}>(folly::fibers::FirstArgOf&&)::{lambda()#1}, >> void>::type::value_type)::{lambda(folly::fibers::Fiber&)#1}*>() const+25> >> "\311\303UH\211\345H\211}\370H\213E\370]ÐUH\211\345H\203\354\020H\211}\370H\213E\370H\211\307\350~\264\312\377\220\311\303UH\211\345SH\203\354\030H\211}\350H\211u\340H\213E\340H\211\307\350\236\377\377\377H\213\030H\213E\350H\211\307\350O\264\312\377H\211ƿ\b", >> size_ = 0, >> capacity_ = 1441151880758558720}}}} >> >> Utf-16 does not seem to decode it, while 'latin-1' does: >> >> '\xc9'.decode('utf-16') >> >> Traceback (most recent call last): >> File "<stdin>", line 1, in <module> >> File >> "/mnt/gvfs/third-party2/python/55c1fd79d91c77c95932db31a4769919611c12bb/2.7.8/centos6-native/da39a3e/lib/python2.7/encodings/utf_16.py", >> line 16, in decode >> return codecs.utf_16_decode(input, errors, True) >> UnicodeDecodeError: 'utf16' codec can't decode byte 0xc9 in position 0: >> truncated data >> >> '\xc9'.decode('latin-1') >> >> u'\xc9' >> >> Instead of guessing what kind of decoding I should use, I would use >> 'ensure_ascii=False' to prevent the crash for now. >> >> I tried to reproduce this crash, but it seems that the crash might be >> related with some internal stl implementation we are using. I will see if I >> can narrow down to a small repro later. 
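
[Inline note] The gdb dump above is consistent with small_ and ml_ being two views of the same 24 bytes: "www" followed by zeros reads back as data_ = 0x777777 and size_ = 0, and the trailing "\024" byte (0x14 = 20 = 23 - 3) becomes the top byte of capacity_. Below is a minimal Python 3 sketch of a discriminator built on that observation. The constants are assumptions inferred from this one dump, not taken from the class source: that the last byte of a small string stores 23 minus its length, that its top two bits are clear only for small strings, and that the target is 64-bit little-endian. The function name is hypothetical.

```python
import struct

MAX_SMALL_SIZE = 23    # assumption: sizeof(MediumLarge) - 1 on a 64-bit target
CATEGORY_MASK = 0xC0   # assumption: top two bits of the last byte pick the layout


def describe_store(raw: bytes) -> dict:
    """Interpret the 24 raw bytes of store_ (little-endian, 64-bit)."""
    if len(raw) != 24:
        raise ValueError("expected the 24 bytes of store_")
    last_byte = raw[MAX_SMALL_SIZE]
    # The same bytes viewed as MediumLarge { data_, size_, capacity_ }.
    data_ptr, size, capacity = struct.unpack("<QQQ", raw)
    if (last_byte & CATEGORY_MASK) == 0:
        # Small layout: characters live inline; last byte holds the spare room.
        small_size = MAX_SMALL_SIZE - last_byte
        return {"kind": "small", "size": small_size, "text": raw[:small_size]}
    # Medium/large layout: the characters live behind data_ptr.
    return {"kind": "medium/large", "size": size, "data_ptr": data_ptr}


# The bytes gdb printed: "www", '\000' <repeats 20 times>, "\024"
dump = b"www" + b"\x00" * 20 + b"\x14"
print(describe_store(dump))
print([hex(v) for v in struct.unpack("<QQQ", dump)])
```

If these assumptions hold, a "size_ == 0" test alone would not be a reliable discriminator, since size_ overlaps characters 8..15 of the inline buffer. The real rule has to come from the fbstring_core source before it is ported into a C++ summary provider.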
>> >> Thanks >> Jeffrey >> >> On Sun, Mar 27, 2016 at 2:49 PM, Siva Chandra <sivachan...@gmail.com> >> wrote: >> On Sat, Mar 26, 2016 at 11:58 PM, Jeffrey Tan <jeffrey.fu...@gmail.com> >> wrote: >> >> Btw: after patching with Siva's fix http://reviews.llvm.org/D18008, the >> first field 'small_' is fixed, however the second field 'ml_' still emits >> garbage: >> >> (lldb) fr v corpus >> (const string &const) corpus = error: summary string parsing error: { >> store_ = { >> = { >> small_ = "www" >> ml_ = (data_ = >> >> "��UH\x89�H�}�H\x8bE�]ÐUH\x89�H��H\x89}�H\x8bE�H\x89��~\xb4��\x90��UH\x89�SH\x83�H\x89}�H�u�H�E�H���\x9e���H\x8b\x18H\x8bE�H���O\xb4��H\x89ƿ\b", >> size_ = 0, capacity_ = 1441151880758558720) >> } >> } >> } >> >> >> Do you still see the DW_TAG_member related error? >> >> A wild (and really wild at that) guess: Is it utf16 data that is being >> decoded as utf8? >> >> As David Blaikie mentioned on the other thread, it would really help >> if you provide us with a minimal example to repro this. Atleast, repro >> instructions. >> >> _______________________________________________ >> lldb-dev mailing list >> lldb-dev@lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev >> >> >> _______________________________________________ >> lldb-dev mailing list >> lldb-dev@lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev >> >> >> >> Thanks, >> *- Enrico* >> 📩 egranata@.com ☎️ 27683 >> >> > > > Thanks, > *- Enrico* > 📩 egranata@.com ☎️ 27683 > >