On Tue, Sep 27, 2016 at 06:51:31AM -0500, Segher Boessenkool wrote:
> On Tue, Sep 27, 2016 at 01:31:15PM +0200, Jakub Jelinek wrote:
> > I think it is important to think in terms of what regexps we still want to
> > match, even when the matching is actually implemented in C, not using
> > regexps. And yes, you list one reason why arbitrary text with fall and
> > through somewhere in it is not a good idea. Another:
> > /* XXX Really fallthru? */
> > (what we have in pch.c).
> > So, if you want to allow ... fall through ... and else fall through, and
> > perhaps // fall through - some explanation
> > then it might be e.g.
> > //-fallthrough$
> > //@fallthrough@$
> > /\*-fallthrough\*/
> > /\*@fallthrough@\*/
> > //[ \t.]*(ELSE )?FALL(S | |-)?THR(OUGH|U)[ \t.]*(-.*)?$
> > //[ \t.]*(Else )?Fall(s | |-)?[Tt]hr(ough|u)[ \t.]*(-.*)?$
> > //[ \t.]*(else )?fall(s | |-)?thr(ough|u)[ \t.]*(-.*)?$
> > /\*[ \t.]*(ELSE )?FALL(S | |-)?THR(OUGH|U)[ \t.]*(-.*)?\*/
> > /\*[ \t.]*(Else )?Fall(s | |-)?[Tt]hr(ough|u)[ \t.]*(-.*)?\*/
> > /\*[ \t.]*(else )?fall(s | |-)?thr(ough|u)[ \t.]*(-.*)?\*/
> > where . would match even newlines in the last 3,
> > but $ would always match just end of line?
>
> Any comment with text
>
> ^[^_[:alnum:]]*(else )?fall(s | |-)?thr(ough|u)[^_[:alnum:]]*$
>
> perhaps? Case-insensitive. Or allow any amount of space, or even any
> interpunction. Just don't allow any alphanumerics except for those
> exact words, and there won't be many false hits at all.
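To make the point about "regexps implemented in C" concrete, here is a minimal sketch, not GCC's actual code, of how the lowercase `//` pattern from the list above, `//[ \t.]*(else )?fall(s | |-)?thr(ough|u)[ \t.]*(-.*)?$`, could be matched by hand without `<regex.h>`. The names `eat` and `fallthrough_comment_p` are hypothetical, and the sketch assumes the caller has already stripped the leading `//` and the trailing newline. Because it only compares bytes, the result does not depend on the locale.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: if *P starts with PREFIX, advance *P past it and
   return true; otherwise leave *P unchanged and return false.  */
static bool
eat (const char **p, const char *prefix)
{
  size_t n = strlen (prefix);
  if (strncmp (*p, prefix, n) != 0)
    return false;
  *p += n;
  return true;
}

/* Skip the [ \t.]* parts of the pattern.  */
static void
skip_blanks_and_dots (const char **p)
{
  while (**p == ' ' || **p == '\t' || **p == '.')
    ++*p;
}

/* Hypothetical matcher for the body of a // comment (text after "//",
   newline already stripped), implementing
     [ \t.]*(else )?fall(s | |-)?thr(ough|u)[ \t.]*(-.*)?$  */
bool
fallthrough_comment_p (const char *text)
{
  const char *p = text;

  skip_blanks_and_dots (&p);
  eat (&p, "else ");                    /* (else )? is optional.  */
  if (!eat (&p, "fall"))
    return false;
  /* (s | |-)?: each alternative starts with a different character, and
     the empty alternative must be followed by 't', so trying them in
     order never needs backtracking.  */
  if (!eat (&p, "s ") && !eat (&p, " "))
    eat (&p, "-");
  if (!eat (&p, "thr"))
    return false;
  if (!eat (&p, "ough") && !eat (&p, "u"))
    return false;
  skip_blanks_and_dots (&p);
  /* Either $ right away, or (-.*)$: the rest starts with '-'.  */
  return *p == '\0' || *p == '-';
}
```

With this, `" ... fall through ..."`, `" else falls thru"` and `" fallthrough - some explanation"` are accepted, while `" XXX Really fallthru?"` and `" Fall through"` are not; the capitalized variants would need their own (equally simple) branches.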
Not sure we want to match FaLlS THrouGH, and [^_[:alnum:]]* isn't without problems either: what if there is Hebrew, Chinese, etc. text in there? The matching shouldn't depend on the current locale IMHO, and figuring out which Unicode code points are letters and which are not really isn't easy without one.

IMO, before changing anything further, we want to gather some statistics on which styles are actually used in the wild and how often each is used, and then, for the more common ones, decide what is really supportable.

	Jakub
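The locale concern can be illustrated with a minimal sketch, again hypothetical and not GCC's code. In `[^_[:alnum:]]`, the `[:alnum:]` class (like `isalnum` from `<ctype.h>`) depends on the `LC_CTYPE` locale. One locale-independent option, assumed here purely for illustration, is to classify bytes explicitly: ASCII letters, digits and `_` are word bytes, and every byte >= 0x80 is conservatively treated as a word byte too, since without a locale we cannot tell whether a UTF-8 sequence encodes a letter. The sketch also matches lowercase only rather than case-insensitively; `word_byte_p`, `eat` and `fallthrough_text_p` are illustrative names.

```c
#include <stdbool.h>
#include <string.h>

/* Locale-independent stand-in for [_[:alnum:]].  Bytes >= 0x80 count as
   word bytes because we cannot tell which Unicode code points they
   belong to without a locale (assumption of this sketch).  */
static bool
word_byte_p (unsigned char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
         || (c >= '0' && c <= '9') || c == '_' || c >= 0x80;
}

/* If *P starts with PREFIX, advance past it and return true.  */
static bool
eat (const char **p, const char *prefix)
{
  size_t n = strlen (prefix);
  if (strncmp (*p, prefix, n) != 0)
    return false;
  *p += n;
  return true;
}

/* Hypothetical matcher for comment text against
     ^[^_[:alnum:]]*(else )?fall(s | |-)?thr(ough|u)[^_[:alnum:]]*$
   with the character class evaluated by word_byte_p, lowercase only.  */
bool
fallthrough_text_p (const char *text)
{
  const char *p = text;

  /* Leading [^_[:alnum:]]*: stops at the first word byte.  */
  while (*p && !word_byte_p ((unsigned char) *p))
    p++;
  eat (&p, "else ");
  if (!eat (&p, "fall"))
    return false;
  if (!eat (&p, "s ") && !eat (&p, " "))
    eat (&p, "-");
  if (!eat (&p, "thr"))
    return false;
  if (!eat (&p, "ough") && !eat (&p, "u"))
    return false;
  /* Trailing [^_[:alnum:]]*$: no word bytes may follow.  */
  for (; *p; p++)
    if (word_byte_p ((unsigned char) *p))
      return false;
  return true;
}
```

The trade-off of treating high bytes as letters is that `fall through` followed by, say, a Chinese explanation is not recognized, so the warning would still be emitted; the opposite choice would accept such comments but could also accept text that merely contains the words.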