On Tue, Sep 27, 2016 at 01:58:54PM +0200, Jakub Jelinek wrote:
> > Any comment with text
> >
> > ^[^_[:alnum:]]*(else )?fall(s | |-)?thr(ough|u)[^_[:alnum:]]*$
> >
> > perhaps?  Case-insensitive.  Or allow any amount of space, or even any
> > interpunction.  Just don't allow any alphanumerics except for those
> > exact words, and there won't be many false hits at all.
>
> Not sure we want to match FaLlS THrouGH,

Yes it's silly, but would it ever match the wrong thing?

> and [^_[:alnum:]]* isn't without a
> problem either, what if there is hebrew, or chinese, etc. text in there?

I meant in LANG=C, but it would work otherwise, too.  Nasty, of course.

> The matching shouldn't depend on the current locale IMHO, and figuring out
> what unicode entry points are letters and which are not really isn't easy
> without that.

Right.

> IMO before changing anything further, we want to gather some statistics what
> styles are actually used in the wild together with how often they are used,
> and then for the more common ones decide what is really supportable.

If you do not allow a lot then there will be many false negatives.


Segher
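
For anyone who wants to try the proposed pattern against sample comment
texts, here is a minimal Python sketch.  It is only an illustration, not
what GCC does.  Python's re module has no POSIX [:alnum:] class, so the
sketch substitutes [^_0-9A-Za-z], which assumes the LANG=C reading; the
names FALLTHROUGH_RE and is_fallthrough_comment are made up for this
example.

```python
import re

# ASCII-only translation of the proposed pattern.  [^_0-9A-Za-z] stands in
# for [^_[:alnum:]] as it would behave under LANG=C (an assumption of this
# sketch).  re.ASCII keeps case-insensitive matching limited to ASCII letters.
FALLTHROUGH_RE = re.compile(
    r"^[^_0-9A-Za-z]*(else )?fall(s | |-)?thr(ough|u)[^_0-9A-Za-z]*$",
    re.IGNORECASE | re.ASCII,
)


def is_fallthrough_comment(text: str) -> bool:
    """Return True if the comment text matches the proposed pattern."""
    return FALLTHROUGH_RE.match(text) is not None


if __name__ == "__main__":
    samples = [
        "/* FALLTHRU */",
        "falls through",
        "else fall-thru",
        "FaLlS THrouGH",
        "fall through to the default case",
        "\u05d0\u05d1 fall through",  # Hebrew letters before the words
    ]
    for s in samples:
        print(f"{s!r}: {is_fallthrough_comment(s)}")
```

In this ASCII-only reading, non-ASCII letters are not alphanumerics, so a
comment with Hebrew or Chinese text around the exact words still matches.
That is the "nasty" part mentioned above.  Any extra ASCII word, as in
"fall through to the default case", makes the match fail.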