Re: [PATCH 1/3] add -Wstruct-not-pod, -Wclass-is-pod, -Wmismatched-tags (PR 61339)

Jason Merrill Mon, 05 Aug 2019 15:31:29 -0700

On Mon, Aug 5, 2019 at 5:50 PM Martin Sebor <mse...@gmail.com> wrote:
>
> On 8/5/19 1:25 PM, Jason Merrill wrote:
> > On 8/1/19 7:35 PM, Martin Sebor wrote:
> >> On 8/1/19 12:09 PM, Jason Merrill wrote:
> >>> On 7/22/19 12:34 PM, Martin Sebor wrote:
> >>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00622.html
> >>>>
> >>>> On 7/8/19 3:58 PM, Martin Sebor wrote:
> >>>>> The attached patch implements three new warnings:
> >>>>>
> >>>>>   *  -Wstruct-not-pod triggers for struct definitions that are not
> >>>>>      POD structs,
> >>>>>   *  -Wclass-is-pod triggers for class definitions that satisfy
> >>>>>      the requirements on POD structs, and
> >>>>>   *  -Wmismatched-tags that triggers for class and struct declarations
> >>>>>      with class-key that doesn't match either their definition or
> >>>>>      the first declaration (if no definition is provided).
> >>>>>
> >>>>> The implementation of -Wclass-is-pod and -Wstruct-not-pod is fairly
> >>>>> straightforward but the -Wmismatched-tags solution is slightly
> >>>>> unusual.
> >>>>> It collects struct and class declarations first and diagnoses
> >>>>> mismatches
> >>>>> only after the whole tramslation unit has been processed.  This is so
> >>>>> that the definition of a class can guide which declarations to
> >>>>> diagnose
> >>>>> no matter which come first.
> >>>
> >>> As Jonathan and I were saying in the connected discussion, the *pod
> >>> warnings enforce questionable coding guidelines, so I'd rather not
> >>> have them in the compiler.
> >>
> >> What specifically do you consider questionable?
> >
> >> When defining a new class one has to decide whether to use 'struct'
> >> and when to use 'class'.  Since there's no difference, adopting any
> >> convention to help make a consistent choice seems like a natural
> >> thing to do.  You could say (as I think someone in this thread did):
> >> "make it 'class' when it has a ctor or dtor."  Someone else might
> >> add: "or assignment."  Someone else still might go even further
> >> and include "private members or bases or virtual functions."  Since
> >> all the type definitions end up equivalent regardless of the class-key,
> >> all these choices seem equally reasonable.  But all of them are also
> >> arbitrary.  The only difference is that by being based on a common
> >> (and at least historically, widely relied on) property, the POD
> >> convention is less arbitrary.  That's probably why it's apparently
> >> quite popular (as evident from the Stack Exchange thread).
> >
> > Yes, all of them are somewhat arbitrary.  This one seems like a poor
> > choice, for the reasons Jon and I gave in the other thread.  IMO
> > 'aggregate' would be a better property to use.
> >
> > Which Stack Exchange thread do you mean?
>
> The one I mentioned here:
>    https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01472.html
>
> i.e. the most popular answer to the question
>
>    When should you use a class vs a struct in C++?
>
> is
>
>    I would recommend using structs as plain-old-data structures
>    without any class-like features, and using classes as aggregate
>    data structures with private data and member functions.
>
>    https://stackoverflow.com/questions/54585
>
> My goal with the warning is to provide a way to help enforce
> the convention we know is in use, both in GCC (at least until
> it's removed) and in other projects.  I'm not interested in
> inventing something new even if it's "better" (in some sense)
> than the convention we know is in use.


We don't know that this is in use as a strict division, anywhere.  All
the answers at that URL are using the term pretty loosely.  The top
answer says, "I would recommend using structs as plain-old-data
structures without any class-like features, and using classes as
aggregate data structures with private data and member functions."

A struct containing a std::string is not using any class-like
features, but is not a POD.  I oppose a coding convention that
requires the user to change that to a class with public: at the top.
And so I oppose introducing warnings to enforce such a convention.

> That said, as the C++ standard itself says, PODs (or in recent
> revisions, trivial standard-layout types) are most commonly
> associated with interoperability with C (and other languages).
> Aggregates can have data members of types that don't exist in
> C (references, pointers to members).  So a convention that uses
> the keyword 'struct' to reflect the common interoperability
> aspect or that wants to convey that a C++ struct looks and feels
> like one would expect of a C struct, will want to be based on
> the concept of POD, not that of an aggregate.

Note that the mention of pointers to members was removed from the POD
definition in C++03.  And then the term itself was deemed not a useful
category.

To me, the important C-like look and feel of a 'struct' is that it is
intended for users to access the data directly, rather than through
member functions.  A struct with a string qualifies.  It is possible
for a class to be intended for use that way but still not qualify as
an aggregate, e.g. because it has a constructor only to allow
parenthesized initialization (as I've seen in GCC), but aggregate is a
better approximation than POD for this distinction.

> >>> -Wmismatched-tags is useful to have, given the MSVC/clang situation,
> >>> but I wonder about memory consumption from all the record keeping.
> >>> Do you have any overhead measurements?
> >>
> >> I did think about the overhead but not knowing if the patch would
> >> even be considered I didn't spend time optimizing it.
> >>
> >> In an instrumented build of GCC with the patch I just did I have
> >> collected the following stats for the number of elements in
> >> the rec2loc hash table (for 998 files):
> >>
> >>    rec2loc elements   locvec elements
> >>    min:           0                 0
> >>    max:       2,785             3,303
> >>    mean:        537             1,043
> >>    median:      526             1,080
> >>
> >> The locvec data are based on totals for the whole hash table, so
> >> in the worst case the hash_map has 2,785 elements, and the sum of
> >> all locvec lengths in all those elements is 3,303.  Each rec2loc
> >> element takes up 16 bytes, plus the size of the locvec data.
> >>
> >> If my math is right (which doesn't happen too often) in the worst
> >> case the bookkeeping overhead is 43KB for the hash_map plus on
> >> the order of 140KB for the vectors (just doubling the number of
> >> elements to account for capacity = 2 X size, times 24 for
> >> the flag_func_loc_t element type).  So 183K in the worst case
> >> in GCC.
> >
> > 183k cumulative over all the GCC sources doesn't sound excessive, but...
> >
> >> There are a few ways to reduce that if it seems excessive.
> >>
> >> One is by avoiding some waste in flag_func_loc_t which is
> >>
> >>    pair<tag_types, pair<bool, pair<tree, location_t>>>
> >>
> >> tree could come first and tag_types and the bool could share
> >> space.  That should bring it down to 16 in LP64, for about
> >> 30% off the 183K.
> >>
> >> Beyond that, we could change the algorithm to discard records
> >> for prior declarations after the first definition has been seen
> >> (and any mismatches diagnosed).
> >>
> >> We could also only collect one record for each definition
> >> in system headers rather than one for every declaration and
> >> reference.
> >
> > ...these all sound worthwhile.
>
> Okay, I'll look into it.
>
> To be clear: it was 183K maximum for a single compilation, not
> aggregate for all of them.

Aha.  Thanks.

Jason

Re: [PATCH 1/3] add -Wstruct-not-pod, -Wclass-is-pod, -Wmismatched-tags (PR 61339)

Reply via email to