I think PCRE does not have this problem.
What about implementing '-regextype pcre'?
Do you think it's a good idea?
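
To show what I mean, here is a rough Python sketch. It is only an analogy, not find or gnulib code, and the file name app.exe is made up (the thread only shows the directory 'Zielona G'$'\363''ra'). It checks that the name is not valid UTF-8, and that a matcher working on raw bytes still matches it with the same kind of pattern:

```python
import re

# Hypothetical path: the directory name from the thread, stored as
# ISO-8859-2 bytes (octal \363 == 0xF3), plus an invented file name.
raw = b"Zielona G\xf3ra/app.exe"


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def matches_bytewise(pattern: bytes, name: bytes) -> bool:
    """Match the whole name byte by byte, ignoring any locale encoding."""
    return re.fullmatch(pattern, name) is not None


# 0xF3 starts a multibyte UTF-8 sequence, but 'r' is not a continuation
# byte, so a UTF-8 locale sees an encoding error here.
valid = is_valid_utf8(raw)

# With a bytes pattern, '.' matches any single byte except newline,
# so the stray 0xF3 does not prevent a match.
bytewise = matches_bytewise(rb".*\.exe", raw)

# Decoded with the encoding it was written in, the name is readable.
readable = raw.decode("iso-8859-2")
```

A regextype that matches bytes instead of locale characters would behave like matches_bytewise above; whether PCRE inside find would really work this way is my assumption.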

Regards,

2017-12-19 23:01 GMT+01:00 Piotr Gackiewicz <p.gackiew...@gmail.com>:

> So, if it's by design, users should be explicitly warned in the man page:
>
> Do not rely on -regex; you could miss some badly encoded filenames.
>
> Or perhaps find could be enhanced with another regextype, non-posix and
> matching those ;-).
>
> Regards,
>
>
> 2017-12-19 22:48 GMT+01:00 Eric Blake <ebl...@redhat.com>:
>
>> On 12/19/2017 03:31 PM, Bernhard Voelker wrote:
>>
>>
>>> The test case in your attachment is a bit different, but also shows
>>> the problem.  It seems that gnulib's regex does not find a match for
>>> the pattern '.*\.exe$' for the files in the following directory:
>>>
>>>    $ LC_ALL=C /usr/bin/ls -log htdocs
>>>    ...
>>>    drwxr-xr-x 2 4096 Dec 18 20:45 'Zielona G'$'\363''ra'
>>>    ...
>>>
>>> I'm not an expert on UTF and regex, but it seems that the $'\363'
>>> character is not matched by the dot '.' meta character in your
>>> locale.
>>>
>>
>> POSIX says that regex only has to match characters (in particular, the
>> regex '.' matches characters, not encoding errors).  If you pick a locale
>> with multibyte characters that are subject to encoding errors when
>> processing random bytes (as is the case when using a UTF-8 locale to
>> process single-byte ISO filenames), then POSIX says regex behavior is
>> undefined.  So while it is indeed annoying that find can't match files with
>> encoding errors, it is somewhat expected behavior, because there's no sane
>> way to make regex well-specified on encoding errors.
>>
>> --
>> Eric Blake, Principal Software Engineer
>> Red Hat, Inc.           +1-919-301-3266
>> Virtualization:  qemu.org | libvirt.org
>>
>
>
>
> --
> Piotr Gackiewicz
>

-- 
Piotr Gackiewicz
