I left out the version information. I have tested it on:

  find 4.7.0-git, libc6-2.23-0ubuntu9 (Ubuntu 16.04, x86_64)
  find 4.4.2, glibc-2.12-1.209.el6_9.2.x86_64 (CentOS 6, x86_64)

Same results.

Regards,

2017-12-18 16:06 GMT+01:00 Piotr Gackiewicz <p.gackiew...@gmail.com>:

>
> Hello,
>
> I have spotted a bizarre bug in GNU find.
> In some circumstances, the result of a '-regex' search depends heavily
> on the locale settings.
>
> I have attached a zip file with an example file tree. It contains two
> directories: one whose name is encoded in UTF-8 and the other in
> ISO-8859-2.
>
> Now we run find, trying to find files matching the regex '.*\.exe':
>
> $ LANG=pl_PL.iso-8859-2 find htdocs -type f -regex '.*\.exe$' -ls
> 12845718 12 -rw-rw-r-- 1 gacek gacek 2 Dec 18 15:00 htdocs/Zielona\ G\363ra/hidden_malware.exe
> 12845721 12 -rw-rw-r-- 1 gacek gacek 2 Dec 18 15:00 htdocs/Zielona\ G\303\263ra/malware.exe
>
> Never mind the output encoding, it's expected. Luckily, we have found
> both .exe files.
>
> But now, let's change the locale to something more modern:
>
> $ LANG=pl_PL.utf-8 find htdocs -type f -regex '.*\.exe$' -ls
> 12845721 12 -rw-rw-r-- 1 gacek gacek 2 gru 18 15:00 htdocs/Zielona\ G\303\263ra/malware.exe
>
> We have found only one of these files. The one with the ISO-encoded
> filename is hidden!
> If one relies on -regex to search for suspicious files (apparently
> regardless of which -regextype is used), some of them could be missed
> and remain lurking in the filesystem.
> Find is one of the basic and best system tools to use in such a scenario.
>
> BTW, there is no such problem with '-name' matching:
>
> $ LANG=pl_PL.utf-8 find htdocs -type f -name '*.exe' -ls
> 12845718 12 -rw-rw-r-- 1 gacek gacek 2 gru 18 15:00 htdocs/Zielona\ G\363ra/hidden_malware.exe
> 12845721 12 -rw-rw-r-- 1 gacek gacek 2 gru 18 15:00 htdocs/Zielona\ G\303\263ra/malware.exe
>
> Regards,
>
> --
> Piotr Gackiewicz
>

--
Piotr Gackiewicz
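
P.S. For anyone without the attachment, here is a minimal sketch that rebuilds a similar tree. The temporary directory and the variable names are made up for illustration; only the file and directory names follow the report. It also tries the search under LC_ALL=C, on the assumption (not confirmed in this thread) that a single-byte locale lets '.*' span bytes that are not valid UTF-8.

```shell
#!/bin/sh
# Rebuild a tree like the one in the report inside a throwaway directory.
tmp=$(mktemp -d)

# "Zielona Góra" with the "ó" encoded two ways (octal escapes as in the -ls output).
iso_dir=$(printf 'Zielona G\363ra')      # ISO-8859-2: single byte 0xF3
utf_dir=$(printf 'Zielona G\303\263ra')  # UTF-8: bytes 0xC3 0xB3

mkdir -p "$tmp/htdocs/$iso_dir" "$tmp/htdocs/$utf_dir"
: > "$tmp/htdocs/$iso_dir/hidden_malware.exe"
: > "$tmp/htdocs/$utf_dir/malware.exe"

# Assumed workaround: run the same -regex search in the C locale,
# where every byte is treated as one character.
count=$(cd "$tmp" && LC_ALL=C find htdocs -type f -regex '.*\.exe$' -print | wc -l | tr -d ' ')
echo "matches in C locale: $count"

rm -rf "$tmp"
```

To compare with the behaviour described above, replace LC_ALL=C with a UTF-8 locale that is installed on the system.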