On Mon, Aug 5, 2019 at 5:50 PM Martin Sebor <mse...@gmail.com> wrote: > > On 8/5/19 1:25 PM, Jason Merrill wrote: > > On 8/1/19 7:35 PM, Martin Sebor wrote: > >> On 8/1/19 12:09 PM, Jason Merrill wrote: > >>> On 7/22/19 12:34 PM, Martin Sebor wrote: > >>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00622.html > >>>> > >>>> On 7/8/19 3:58 PM, Martin Sebor wrote: > >>>>> The attached patch implements three new warnings: > >>>>> > >>>>> * -Wstruct-not-pod triggers for struct definitions that are not > >>>>> POD structs, > >>>>> * -Wclass-is-pod triggers for class definitions that satisfy > >>>>> the requirements on POD structs, and > >>>>> * -Wmismatched-tags that triggers for class and struct declarations > >>>>> with class-key that doesn't match either their definition or > >>>>> the first declaration (if no definition is provided). > >>>>> > >>>>> The implementation of -Wclass-is-pod and -Wstruct-not-pod is fairly > >>>>> straightforward but the -Wmismatched-tags solution is slightly > >>>>> unusual. > >>>>> It collects struct and class declarations first and diagnoses > >>>>> mismatches > >>>>> only after the whole tramslation unit has been processed. This is so > >>>>> that the definition of a class can guide which declarations to > >>>>> diagnose > >>>>> no matter which come first. > >>> > >>> As Jonathan and I were saying in the connected discussion, the *pod > >>> warnings enforce questionable coding guidelines, so I'd rather not > >>> have them in the compiler. > >> > >> What specifically do you consider questionable? > > > >> When defining a new class one has to decide whether to use 'struct' > >> and when to use 'class'. Since there's no difference, adopting any > >> convention to help make a consistent choice seems like a natural > >> thing to do. You could say (as I think someone in this thread did): > >> "make it 'class' when it has a ctor or dtor." Someone else might > >> add: "or assignment." Someone else still might go even further > >> and include "private members or bases or virtual functions." Since > >> all the type definitions end up equivalent regardless of the class-key, > >> all these choices seem equally reasonable. But all of them are also > >> arbitrary. The only difference is that by being based on a common > >> (and at least historically, widely relied on) property, the POD > >> convention is less arbitrary. That's probably why it's apparently > >> quite popular (as evident from the Stack Exchange thread). > > > > Yes, all of them are somewhat arbitrary. This one seems like a poor > > choice, for the reasons Jon and I gave in the other thread. IMO > > 'aggregate' would be a better property to use. > > > > Which Stack Exchange thread do you mean? > > The one I mentioned here: > https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01472.html > > i.e. the most popular answer to the question > > When should you use a class vs a struct in C++? > > is > > I would recommend using structs as plain-old-data structures > without any class-like features, and using classes as aggregate > data structures with private data and member functions. > > https://stackoverflow.com/questions/54585 > > My goal with the warning is to provide a way to help enforce > the convention we know is in use, both in GCC (at least until > it's removed) and in other projects. I'm not interested in > inventing something new even if it's "better" (in some sense) > than the convention we know is in use.
We don't know that this is in use as a strict division, anywhere. All the answers at that URL are using the term pretty loosely. The top answer says, "I would recommend using structs as plain-old-data structures without any class-like features, and using classes as aggregate data structures with private data and member functions." A struct containing a std::string is not using any class-like features, but is not a POD. I oppose a coding convention that requires the user to change that to a class with public: at the top. And so I oppose introducing warnings to enforce such a convention. > That said, as the C++ standard itself says, PODs (or in recent > revisions, trivial standard-layout types) are most commonly > associated with interoperability with C (and other languages). > Aggregates can have data members of types that don't exist in > C (references, pointers to members). So a convention that uses > the keyword 'struct' to reflect the common interoperability > aspect or that wants to convey that a C++ struct looks and feels > like one would expect of a C struct, will want to be based on > the concept of POD, not that of an aggregate. Note that the mention of pointers to members was removed from the POD definition in C++03. And then the term itself was deemed not a useful category. To me, the important C-like look and feel of a 'struct' is that it is intended for users to access the data directly, rather than through member functions. A struct with a string qualifies. It is possible for a class to be intended for use that way but still not qualify as an aggregate, e.g. because it has a constructor only to allow parenthesized initialization (as I've seen in GCC), but aggregate is a better approximation than POD for this distinction. > >>> -Wmismatched-tags is useful to have, given the MSVC/clang situation, > >>> but I wonder about memory consumption from all the record keeping. > >>> Do you have any overhead measurements? > >> > >> I did think about the overhead but not knowing if the patch would > >> even be considered I didn't spend time optimizing it. > >> > >> In an instrumented build of GCC with the patch I just did I have > >> collected the following stats for the number of elements in > >> the rec2loc hash table (for 998 files): > >> > >> rec2loc elements locvec elements > >> min: 0 0 > >> max: 2,785 3,303 > >> mean: 537 1,043 > >> median: 526 1,080 > >> > >> The locvec data are based on totals for the whole hash table, so > >> in the worst case the hash_map has 2,785 elements, and the sum of > >> all locvec lengths in all those elements is 3,303. Each rec2loc > >> element takes up 16 bytes, plus the size of the locvec data. > >> > >> If my math is right (which doesn't happen too often) in the worst > >> case the bookkeeping overhead is 43KB for the hash_map plus on > >> the order of 140KB for the vectors (just doubling the number of > >> elements to account for capacity = 2 X size, times 24 for > >> the flag_func_loc_t element type). So 183K in the worst case > >> in GCC. > > > > 183k cumulative over all the GCC sources doesn't sound excessive, but... > > > >> There are a few ways to reduce that if it seems excessive. > >> > >> One is by avoiding some waste in flag_func_loc_t which is > >> > >> pair<tag_types, pair<bool, pair<tree, location_t>>> > >> > >> tree could come first and tag_types and the bool could share > >> space. That should bring it down to 16 in LP64, for about > >> 30% off the 183K. > >> > >> Beyond that, we could change the algorithm to discard records > >> for prior declarations after the first definition has been seen > >> (and any mismatches diagnosed). > >> > >> We could also only collect one record for each definition > >> in system headers rather than one for every declaration and > >> reference. > > > > ...these all sound worthwhile. > > Okay, I'll look into it. > > To be clear: it was 183K maximum for a single compilation, not > aggregate for all of them. Aha. Thanks. Jason