On Tue, Sep 27, 2016 at 01:58:54PM +0200, Jakub Jelinek wrote:
> > Any comment with text
> > 
> > ^[^_[:alnum:]]*(else )?fall(s | |-)?thr(ough|u)[^_[:alnum:]]*$
> > 
> > perhaps?  Case-insensitive.  Or allow any amount of space, or even any
> > interpunction.  Just don't allow any alphanumerics except for those
> > exact words, and there won't be many false hits at all.
> 
> Not sure we want to match FaLlS THrouGH,

Yes it's silly, but would it ever match the wrong thing?

> and [^_[:alnum:]]* isn't without a
> problem either, what if there is hebrew, or chinese, etc. text in there?

I meant in LANG=C, but it would work otherwise, too.  Nasty, of course.
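
A minimal sketch of how the proposed pattern behaves, in Python rather than
anything from the GCC sources. Assumptions: the POSIX class [:alnum:] is
spelled out as ASCII ranges to mimic LANG=C, re.ASCII keeps the
case-insensitive matching from depending on Unicode case folding, and the
names FALLTHROUGH_RE and is_fallthrough_comment are made up for illustration.

```python
import re

# ASCII-only stand-in for the POSIX class [^_[:alnum:]]* under LANG=C.
NON_WORD = r"[^_A-Za-z0-9]*"

# The pattern from the thread; fullmatch() below supplies the ^...$ anchoring.
FALLTHROUGH_RE = re.compile(
    NON_WORD + r"(else )?fall(s | |-)?thr(ough|u)" + NON_WORD,
    re.IGNORECASE | re.ASCII,
)

def is_fallthrough_comment(text):
    """Return True if the whole comment text matches the proposed pattern."""
    return FALLTHROUGH_RE.fullmatch(text) is not None

if __name__ == "__main__":
    samples = [
        "/* FALLTHRU */",
        "// else fall-through",
        "FaLlS THrouGH",
        "fall through to the next case",
        "\u05dc\u05d0 fall through",  # Hebrew letters before the keywords
    ]
    for s in samples:
        print(repr(s), is_fallthrough_comment(s))
```

In this sketch any non-ASCII text counts as punctuation, so the Hebrew sample
is accepted, while a comment containing any other ASCII word is rejected.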

> The matching shouldn't depend on the current locale IMHO, and figuring out
> what unicode entry points are letters and which are not really isn't easy
> without that.

Right.

> IMO before changing anything further, we want to gather some statistics what
> styles are actually used in the wild together with how often they are used,
> and then for the more common ones decide what is really supportable.

If you do not allow a lot then there will be many false negatives.


Segher
