rsmith added inline comments.

================
Comment at: clang/lib/Sema/SemaChecking.cpp:433
+
+  llvm::StringRef getFormatSpecifier(QualType T) {
+    if (auto *BT = T->getAs<BuiltinType>()) {
----------------
yihanaa wrote:
> rsmith wrote:
> > rsmith wrote:
> > > aaron.ballman wrote:
> > > > yihanaa wrote:
> > > > > I think this is better maintained in "clang/AST/FormatString.h". For 
> > > > > example analyze_printf::PrintfSpecifier can get format specifier, 
> > > > > what do you all think about?
> > > > +1 to this suggestion -- my hope is that we could generalize it more 
> > > > then as I notice there are missing specifiers for things like intmax_t, 
> > > > size_t, ptrdiff_t, _Decimal types, etc. Plus, that will hopefully make 
> > > > it easier for __builtin_dump_struct to benefit when new format 
> > > > specifiers are added, such as ones for printing a _BitInt.
> > > I am somewhat uncertain: every one of these is making arbitrary choices 
> > > about how to format the value, so it's not clear to me that this is 
> > > general logic rather than something specific to `__builtin_dump_struct`. 
> > > For example, using `%f` rather than one of the myriad other ways of 
> > > formatting `double` is somewhat arbitrary. Using `%s` for any `const 
> > > char*` is *extremely* arbitrary and will be wrong and will cause crashes 
> > > in some cases, but it may be the pragmatically correct choice for a 
> > > dumping utility. A general-purpose mechanism would use `%p` for all kinds 
> > > of pointer.
> > > 
> > > We could perhaps factor out the formatting for cases where there is a 
> > > clear canonical default formatting, such as for integer types and 
> > > probably `%p` for pointers, then call that from here with some 
> > > special-casing, but without a second consumer for that functionality it's 
> > > really not clear to me what form it should take.
> > I went ahead and did this, mostly to match concurrent changes to the old 
> > implementation. There are a few cases where our existing "guess a format 
> > specifier" logic does the wrong thing for dumping purposes, which I've 
> > explicitly handled -- things like wanting to dump a `char` / `signed char` 
> > / `unsigned char` member as a number rather than as a (potentially 
> > non-printable or whitespace) character.
>  When I was patching that old implementation, I found that for uint8_t, 
> int8_t, Clang's existing "guess a format specifier" logic would treat the 
> value as an integer, but for unsigned char, signed char, char types, it would 
> Treat it as a character, please look at this example ( 
> https://godbolt.org/z/ooqn4468T ), I guess this existing logic may have made 
> some special judgment.
Yeah. I think in the case where we see some random call to `printf`, `%c` is 
probably the right thing to guess here, but it doesn't seem appropriate to me 
to use this in a dumping routine. If we could dump as `'x'` for printable 
characters and as `'\xAB'` for everything else, that'd be great, but `printf` 
can't do that itself and I'm not sure we want to be injecting calls to 
`isprint` or whatever to make that work. Dumping as an integer always seems 
like it's probably the least-bad option.

Somewhat related: I wonder if we should use `"\"%s\""` instead of simply `"%s"` 
when dumping a `const char*`. That's not ideal but probably clearer than the 
current dump output.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124221/new/

https://reviews.llvm.org/D124221

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to