On Sat, Sep 18, 2010 at 09:16:46PM -0500, Peng Yu wrote:
> Hi,
>
> stat --printf "%y %n\n" `find . -type f -print`
Chris and Pierre already helped with this specific example.  I'd like
to address the more general case.

In the original design of the Unix shell, in many ways and places, it's
quite apparent that the designers never really intended to handle
filenames that contain whitespace.  Things like your stat `find . -print`
example look like they ought to work, but they don't -- precisely
because the shell's word-splitting operates on whitespace, while
filenames are ALLOWED to contain whitespace.  There is an obstacle
here, and there is NO WAY to overcome it.  The only solution is an
entirely different approach -- thus the alternatives such as

  find . -exec stat {} +

which Chris and Pierre have already provided.

To clarify the problem: when you write `...` or $(...) you produce a
single string which is the entire output all shoved together
("serialized" is the fancy word for it).  The shell takes this single
string and tries to break it back apart into meaningful chunks (word
splitting).  But with serialized filenames, there is no way to tell
where one filename ends and the next begins.  If you see the string
"foo bar", you can't tell whether that's one filename with a space in
the middle, or two filenames with a space between them.  Likewise,
newlines are allowed in filenames: if you see the string "foo\nbar"
where \n is a newline, you can't tell whether it's one filename or two.

The only character that is NOT allowed in a Unix filename is NUL
(ASCII 0).  So if you have a serialized stream of filenames
"foo\0bar\0", you know that there are two filenames, and the NUL (\0)
bytes tell you where they end.  That's wonderful if you're reading from
a stream or a file.  But it doesn't help you with command substitution
(`...` or $(...)), because you can't work with NUL bytes in the shell.
The command substitution goes into a C string in memory, and when you
try to read back the contents of that C string, you stop at the first
NUL, because that's what NUL means in a C string -- "end of string".

Bash and ksh actually handle this differently, but neither one will do
what your example was trying to do.  In bash, the NUL bytes are
stripped away entirely:

  arc3:/tmp/foo$ touch foo bar
  arc3:/tmp/foo$ echo "$(find . -print0)"
  ../foo./bar
  arc3:/tmp/foo$ echo "$(find . -print0)" | hd
  00000000  2e 2e 2f 66 6f 6f 2e 2f  62 61 72 0a              |../foo./bar.|
  0000000c

In ksh, the NUL bytes are retained, and thus you get the behavior I
described above (stopping at the first one):

  arc3:/tmp/foo$ ksh -c 'echo "$(find . -print0)"'
  .

Thus, $(find ...) is never going to be useful, in either shell, under
any circumstances.  It simply cannot produce correct output when
operating on arbitrary real-world filenames.  If you want to work with
find, you must throw away command substitution entirely.  This is
regrettable, because it would be extremely convenient to do something
like vi `find . -name '*.c'`, but you simply can't.

So, what does that leave you?

 * You can use -exec, or
 * You can read the output of find ... -print0 as a stream.

You've already seen one example of -exec.  When using -exec, the find
command (which is external, not part of the shell) is told to execute
yet another external command for each file that it matches (or for
clumps of matched filenames, when using the newer + terminator).
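For instance, assuming the GNU stat your original command used, the
-exec version of it would look something like this:

  # Batches of matched filenames go straight from find into stat's
  # argv; the shell never word-splits them.
  find . -type f -exec stat --printf '%y %n\n' {} +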
The disadvantage of -exec is that if you wanted to do something within
your shell (putting the filenames into an array, incrementing a counter
variable, etc.), you can't -- you're already two processes removed from
your shell.  Likewise, you can't -exec a shell function that you wrote.
You would have to use a separate script, or write out the shell code in
quotes and call -exec sh -c '....' (there's a sketch of that at the end
of this message).

If you want to work on filenames recursively within your script, you
will almost always end up using the following idiom, because all the
alternatives are ruled out one way or another:

  while IFS= read -r -d '' filename; do
    ...
  done < <(find ... -print0)

This uses two bash features (process substitution, and read -d ''), so
it's extremely non-portable.  The obvious pipeline alternative
(find ... -print0 | while read ...) is ruled out because the while read
occurs in a subshell, and thus any variables set by the subshell are
lost after the loop; there's a short demonstration of that at the end
of this message.  The read -d '' is a special trick that tells bash's
read command to stop at each NUL byte instead of each newline.  The
output of find is never put into a string in memory (as it is with
command substitution), so the problems we had when trying to work with
NULs in a command substitution don't apply here.

For example, if we wanted to do vi `find . -name '*.c'` but actually
have it WORK in the general case, we end up needing this monstrosity:

  unset array
  while IFS= read -r -d '' f; do array+=("$f"); done \
    < <(find . -name '*.c' -print0)
  vi "${array[@]}"

... which uses three bash extensions and one BSD/GNU extension.  To the
best of my knowledge, the task is completely impossible in strict POSIX.
(You can work around the -print0 by using -exec printf '%s\0' {} +, but
then there's no way to read the NUL-delimited stream, and no arrays to
put the filenames into, since you cannot set positional parameters
individually.)
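For completeness, the -exec sh -c form mentioned above looks roughly
like this.  The loop body is purely illustrative -- the grep is a
stand-in for whatever per-file shell code you actually need:

  # A child sh receives the filenames in "$@", so they are never
  # word-split.  The lone 'sh' after the quoted code becomes its $0.
  find . -name '*.c' -exec sh -c '
    for f do
      grep -l TODO "$f"
    done
  ' sh {} +

Note that this still runs in a child shell: it cannot set variables or
arrays in your own script, which is the limitation described above.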
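And here is the promised demonstration of why the pipeline form loses
your variables (the counter is just an example):

  count=0
  find . -type f -print0 |
  while IFS= read -r -d '' f; do
    count=$((count+1))
  done
  echo "$count"    # prints 0: count was incremented in a subshell

The < <(find ...) process substitution form keeps the loop in the
current shell, which is precisely why the idiom above uses it.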