Hi Kam,

On 3/24/21 6:17 PM, Yuen, Kam-Kuen CIV USARMY DEVCOM SC (USA) via Bug reports for the GNU find utilities wrote:
> I am running the following command and the "ls" command gives error message
> that the file cannot be found. The problem is that the filename has spaces
> as part of the filename.
> The purpose is to find all files that exceeding file size of 1k. Filename
> might include spaces, special character like '
>
>     find . -size +1k -print | xargs ls -sd

There is no bug in any of the tools involved in this command line, find(1),
xargs(1) and ls(1). It is merely a wrong assumption about how they work
(together).

Assuming the above search will match these 2 files:

  $ touch 'This is a Test'
  $ touch ' This is   another Test'
  $ ls -log
  total 0
  -rw-r--r-- 1 0 Mar 24 21:36 ' This is   another Test'
  -rw-r--r-- 1 0 Mar 24 21:35 'This is a Test'

find(1) will print the file names matching the criteria, separated by a
newline character. E.g. (simplified, without the leading "./"):

  This is a Test <newline>
   This is   another Test <newline>

Shown as hex output:

  $ find . -type f | od -tx1z
  0000000 2e 2f 20 54 68 69 73 20 69 73 20 20 20 61 6e 6f  >./ This is   ano<
  0000020 74 68 65 72 20 54 65 73 74 0a 2e 2f 54 68 69 73  >ther Test../This<
  0000040 20 69 73 20 61 20 54 65 73 74 0a                 > is a Test.<
  0000053

xargs(1) reads the entries from standard input, and assumes by default that
the entries are separated by a <blank> character or a <newline>.
See POSIX:

  [...] arguments in the standard input are separated by unquoted <blank>
  characters, unescaped <blank> characters, or <newline> characters.

Also 'man xargs' documents this right at the top:

  [...] delimited by blanks [...] or newlines

With the simplified input from find(1) shown above, this means that xargs(1)
recognizes the following entries:

  - 'This'
  - 'is'
  - 'a'
  - 'Test'
  - 'This'
  - 'is'
  - 'another'
  - 'Test'

Note that blanks in the file names printed by find(1) lead to separate
entries, with extra blanks already ignored.

As all of these entries can easily be packed into one invocation of the
command to run, ls(1), it is started with each of them as a separate argument.
With the real find(1) output there are 9 of them rather than 8, because the
leading blank splits off "./" as an entry of its own.
strace(1) shows what will be executed:

  $ find . -type f | strace -ve execve xargs ls -logd
  [...]
  execve("/usr/bin/ls", ["ls", "-logd", "./", "This", "is", "another", "Test", "./This", "is", "a", "Test"], ...) = 0

Obviously, ls(1) will most likely not be able to stat(2) any of the intended
files (or, in the worst case, it accidentally lists other files which happen
to have one of the shorter names).
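
To make the difference visible without strace(1), here is a minimal sketch that counts how many arguments xargs(1) hands to the command in each mode. It assumes a find(1) and xargs(1) that support -print0 and -0 (GNU and the BSDs do). The scratch directory and the variable names 'words' and 'names' are only illustrative and not part of the original report.

```shell
# Work in a throwaway directory so that no real files are touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# The two example files from above (one leading blank, three inner blanks).
touch 'This is a Test' ' This is   another Test'

# Default mode: xargs splits on blanks and newlines, so every word of the
# find output becomes its own argument. printf emits one line per argument.
words=$(find . -type f -print | xargs printf '[%s]\n' | wc -l | tr -d ' ')

# NUL-separated mode: each complete file name is exactly one argument.
names=$(find . -type f -print0 | xargs -0 printf '[%s]\n' | wc -l | tr -d ' ')

echo "default splitting: $words arguments; NUL-separated: $names arguments"

# Clean up the scratch directory again.
cd / && rm -rf -- "$tmpdir"
```

With the two files above this should report 9 arguments for the default mode (matching the strace output) and 2 for the NUL-separated mode.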

> 1) The env is cygwin64 on Windows 10
>
> 2) Filename include space or special character
>
> 3) When running "ls" command directly on the folder, the screen show "
> ' " character surrounding the filename e.g. 'This is a Test Case With spaces
> in Filename.pdf'

As the output is a terminal, ls(1) defaults to quoting each file name properly
so that it could be copy+pasted safely to another command. Although there
are discussions about this feature on the GNU coreutils mailing list, I
personally consider this a good thing.

> 4) In the case the filename already has ' special character, the "ls"
> command shows the filename with double " around the filename e.g. " This is a
> Tester's File.pdf"

The same here: ls(1) quotes the file name so that it can be copy+pasted
safely. And note that this also includes the leading blank in the file name:
" This ....".

> 5) When saving simple "ls" output to a file, do not see the surrounding
> character

Indeed, when printing to a file, ls(1) must only print the original
characters of the file names without quoting.

> 6) Trying to use the -0 option with xargs but it complains the argument
> line too long

When using 'xargs -0', the producer of the input also has to adhere to the
chosen convention and separate the entries by a NUL character instead of
newlines. Without NULs in the input, 'xargs -0' treats everything find(1)
printed, newlines included, as one single huge argument, which is most likely
why it complains that the argument line is too long.
'man xargs' says:

  -0, --null
    ... The GNU find -print0 option produces input suitable for this mode.

> Can you advise How to handle filename with hidden character like ' or space
> or to report file size of current and subdirectories

There are several safer alternatives, all of them documented in the
GNU findutils manual.

  https://www.gnu.org/software/findutils/manual/find.html

E.g.

  # Tell find(1) to also use the NUL character as a separator: use -print0.
  # This is safe for really all possible file names, including those with
  # single or double quotes, tabs and blanks, and finally also newlines.
  # Yes, the only character which cannot occur is the NUL character.
  $ find . -size +1k -print0 | xargs -0 ls -sd

Note that in the above example xargs(1) will invoke ls(1) even if find(1)
didn't match any file. Better to use the -r, --no-run-if-empty option:

  $ find . -size +1k -print0 | xargs -r0 ls -sd

FWIW: One drawback is that there is a small race condition between the time
find(1) is examining the file and the time ls(1) will see it: one has to be
aware that the file system is constantly changing.

Another alternative is to let find(1) directly print the file size and
file name. This avoids the race condition.

  $ find . -size +1k -printf '%s %f\n'

Obviously, the output is not safe to process by another tool when a file name
contains a newline, but for human eyes it's probably good enough.

Furthermore, there are also alternatives with other tools, e.g. the du(1)
command from the GNU coreutils has a -t, --threshold option to filter
files by their sizes (but also outputs directories):

  $ du -at +1k

Hope this helps.

Have a nice day,
Berny