Re: How to deal with space in command line?
On Sat, Sep 18, 2010 at 09:16:46PM -0500, Peng Yu wrote:
> Hi,
>
> stat --printf "%y %n\n" `find . -type f -print`

Chris and Pierre already helped with this specific example.  I'd like to
address the more general case.

In the original design of the Unix shell, in many ways and places, it's
quite apparent that the designers never really intended to handle
filenames that contain whitespace.  Things like your stat `find . -print`
example look like they ought to work, but they don't -- precisely because
the shell's word splitting operates on whitespace, while filenames are
ALLOWED to contain whitespace.  There is no way to overcome this
obstacle; the only solution is to use a different approach altogether.
Thus the proposed alternatives such as

    find . -exec stat {} +

which Chris and Pierre have already provided.

To clarify the problem: when you write `...` or $(...) you produce a
single string which is the entire output all shoved together
("serialized" is the fancy word for it).  The shell takes this single
string and tries to break it apart into meaningful chunks (word
splitting).  But with serialized filenames, there is no way to tell where
one filename ends and the next begins.  If you see the string "foo bar",
you can't tell whether that's one filename with a space in the middle, or
two filenames with a space between them.  Likewise, newlines are allowed
in filenames: if you see the string "foo\nbar" where \n is a newline, you
can't tell whether it's one filename or two.

The only character that is NOT allowed in a Unix filename is NUL
(ASCII 0).  So if you have a serialized stream of filenames "foo\0bar\0",
you know there are two filenames, and the NUL (\0) bytes tell you where
they end.  That's wonderful if you're reading from a stream or a file.
But it doesn't help you with command substitution (`...` or $(...)),
because you can't work with NUL bytes in the shell.
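[Editorial note: the word-splitting failure described above, and the -exec
alternative, can be made concrete with a small runnable sketch.  This is
not from the original mail; it assumes GNU stat and uses a throwaway
temporary directory.]

```shell
# Run in a scratch directory containing one file whose name has a space.
cd "$(mktemp -d)"
touch "foo bar"

# Command substitution + word splitting: the single name "./foo bar" is
# split into the two words "./foo" and "bar", so stat is asked about two
# files that do not exist:
stat --printf '%n\n' `find . -type f` 2>/dev/null \
    || echo "word splitting mangled the name"

# find -exec passes each matched name to stat as one intact argument:
find . -type f -exec stat --printf '%n\n' {} +
```

The first command fails (and prints the fallback message); the second
prints `./foo bar` correctly.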
The command substitution goes into a C string in memory.  When you try to
read back the contents of that C string, you stop at the first NUL,
because that's what NUL means in a C string -- "end of string".

Bash and ksh actually handle this differently, but neither one will work
for what your example was trying to do.  In bash, the NUL bytes are
stripped away entirely:

    arc3:/tmp/foo$ touch foo bar
    arc3:/tmp/foo$ echo "$(find . -print0)"
    ../foo./bar
    arc3:/tmp/foo$ echo "$(find . -print0)" | hd
    00000000  2e 2e 2f 66 6f 6f 2e 2f  62 61 72 0a  |../foo./bar.|
    0000000c

In ksh, the NUL bytes are retained, and thus you get the behavior I
described above (stopping at the first one):

    arc3:/tmp/foo$ ksh -c 'echo "$(find . -print0)"'
    .

Thus, $(find ...) is never going to be useful, in either shell, under any
circumstance.  It simply cannot produce useful output when operating on
real filenames outside of controlled environments.  If you want to work
with find, you must throw away command substitution entirely.  This is
regrettable, because it would be extremely convenient if you could do
something like vi `find . -name '*.c'`, but you simply can't.

So, what does that leave you?

 * You can use -exec, or
 * You can read the output of find ... -print0 as a stream.

You've already seen one example of -exec.  When using -exec, the find
command (which is external, not part of the shell) is told to execute yet
another external command for each file that it matches (or for clumps of
matched filenames, when using the newer + terminator).

The disadvantage of -exec is that if you wanted to do something within
your shell (putting the filenames into an array, incrementing a counter
variable, etc.), you can't.  You're already two processes removed from
your shell.  Likewise, you can't -exec a shell function that you wrote.
You would have to use a separate script, or write out the shell code in
quotes and call -exec sh -c '...'.
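[Editorial note: the "-exec sh -c '...'" escape hatch mentioned above can
be sketched as follows.  This example is illustrative, not from the
original mail; the `_` placeholder fills in $0 so that the matched names
land in the positional parameters.]

```shell
cd "$(mktemp -d)"
touch "a file"

# Each matched name is delivered to the inline script as a separate
# positional parameter, so spaces survive intact:
find . -type f -exec sh -c '
    for f; do
        printf "got: %s\n" "$f"
    done
' _ {} +
```

With the single file above this prints `got: ./a file`.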
If you want to work on filenames recursively within your script, you will
almost always end up using the following idiom, because all the
alternatives are ruled out one way or another:

    while IFS= read -r -d '' filename; do
      ...
    done < <(find ... -print0)

This example uses two bash features (process substitution, and
read -d '') so it's extremely non-portable.  The obvious pipeline
alternative (find ... -print0 | while read) is ruled out because the
while read occurs in a subshell, and thus any variables set by the
subshell are lost after the loop.

The read -d '' is a special trick that tells bash's read command to stop
at each NUL byte instead of each newline.  The output of find is never
put into a string in memory (as with command substitution), so the
problems we had when trying to work with NULs in a command substitution
don't apply here.

For example, if we wanted to do vi `find . -name '*.c'` but actually have
it WORK in the general case, we end up needing this monstrosity:

    unset array
    while IFS= read -r -d '' f; do
      array+=("$f")
    done < <(find . -name '*.c' -print0)
    vi "${array[@]}"
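[Editorial aside, well beyond this 2010 thread: later bash versions (4.4
and newer) added `mapfile -d ''`, which slurps the same NUL-delimited
stream into an array in one step.  A sketch, assuming bash >= 4.4:]

```shell
cd "$(mktemp -d)"
touch "plain.c" "with space.c"

# -d '' makes mapfile split on NUL bytes; -t drops each trailing
# delimiter, leaving the bare filenames in the array.
mapfile -t -d '' array < <(find . -name '*.c' -print0)
echo "${#array[@]}"    # 2
```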
Re: How to deal with space in command line?
On 09/20/2010 07:14 AM, Greg Wooledge wrote:
> ... which uses three bash extensions and one BSD/GNU extension.  To the
> best of my knowledge, the task is completely impossible in strict POSIX.

Impossible in strict POSIX 2008.  But the Austin Group (the people that
develop the POSIX standard) is actively working on proposals to enhance
the next revision of POSIX to make it easier to deal with awkward file
names; the proposals on the floor include (among others): mandating
support for $'...', requiring that compliant file systems reject \n in
newly-created file names, and adding an environment variable to make it
easier to detect when you are dealing with existing file systems with \n
already in an existing file name.  Help in reviewing and contributing to
these proposals will be most welcome.

-- 
Eric Blake  ebl...@redhat.com  +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Re: How to deal with space in command line?
Le 20/09/2010 14:14, Greg Wooledge wrote:
> In the original design of the Unix shell, in many ways and places,
> it's quite apparent that the designers never really intended to handle
> filenames that contain whitespace.

... while at the same time allowing almost any character to be part of a
filename.  The irony.

"UNIX is user-friendly.  It's just very selective about who its friends
are."
"gitk &" closes parent bash upon exit
Configuration Information:
Machine: i686
OS: cygwin
Compiler: gcc-4
Compilation CFLAGS: -DPROGRAM='bash.exe' -DCONF_HOSTTYPE='i686' -DCONF_OSTYPE='cygwin' -DCONF_MACHTYPE='i686-pc-cygwin' -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -DRECYCLES_PIDS -I. -I/usr/src/bash-3.2.51-24/src/bash-3.2 -I/usr/src/bash-3.2.51-24/src/bash-3.2/include -I/usr/src/bash-3.2.51-24/src/bash-3.2/lib -O2 -pipe
uname output: CYGWIN_NT-5.1 wron-ibobyr 1.7.7(0.230/5/3) 2010-08-31 09:58 i686 Cygwin
Machine Type: i686-pc-cygwin
Bash Version: 3.2
Patch Level: 51
Release Status: release

Description:
I see this problem only with gitk (which is a TCL script).  It appeared
about a couple of months ago.  I update all the packages in my Cygwin
installation almost weekly, so it is probably possible to figure out
which bash and/or gitk update introduced the problem, but I have not done
that yet.

When I do "gitk &", the parent bash process terminates as well when gitk
exits.  When I do "(gitk &)" it works fine.  There do not seem to be any
crash dumps, but sometimes bash outputs "Logout" before it exits, just as
if I had pressed ^D at the prompt.  I have tried putting the "gitk &"
call into a script and adding traps for all possible signals, but none
seemed to fire.  You do not have to be in a directory that is a Git
repository.

It may be a Cygwin-specific problem: approximately at the time the
problem appeared, cygwin.dll was also updated.  I hope that someone with
more knowledge in the relevant areas may suggest a simpler test case,
and/or a direction in which to search for one, since the whole of gitk is
a big TCL script.  By the way, "info patchlevel" on my Cygwin TCL says
"8.4.1".

Repeat-By:
1. gitk &
2. If you are not in a Git repository, just close the dialog by selecting
   "OK"; otherwise close the gitk window.
3. The parent bash exits.

Fix:
Running gitk in a subshell works fine.
Re: RFE: request for quotes as grouping operators to work in brackets as elsewhere.
Pierre Gaston wrote:
> Just quote the spaces and not the special chars:

Pierre, your suggestion doesn't help clean up strings used inside of
double brackets.  I wanted to avoid the need for multiple backslashes in
an expression, as they make the expression less readable and more error
prone.

> Note that the same problem and solution exist when you use filename
> generation:
>
> for f in /some path/ withspaces/*; do # doesn't work, the path
> contains spaces

I'm aware of that, but since [[ and ]] are new, and =~ is new, there is
no legal interpretation for multiple arguments on either side of the =~
operator.

Since =~ permits comparing variables _without_ putting quotes around them
(as would normally be the case if you used the single square brackets and
a plain '='), why not extend that idea to not needing quotes between the
=~ and either side of the double square brackets, so literal strings
benefit from not needing quotes as well?  Of course, if quotes *are*
included on the rhs, then the pattern matching (glob or regex) would be
disabled, as happens now.

Is there a downside to this syntax or this idea?
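[Editorial note: the workaround most commonly recommended for multi-word
patterns, not part of the original mail, is to store the pattern in a
variable; [[ =~ ]] expands an unquoted variable without word-splitting
it.  A sketch with hypothetical pattern and subject strings:]

```shell
# Both strings here are made up purely for illustration.
re='multi word pattern'
var='contains a multi word pattern inside'

# The unquoted $re is expanded as a single ERE, spaces and all:
if [[ $var =~ $re ]]; then
    echo "match"
fi
```

This sidesteps the quoting question entirely: no backslashes in the
expression, and the regex can contain spaces, &&, or anything else.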
Re: How to deal with space in command line?
On 9/20/10 10:30 AM, Marc Herbert wrote:
> Le 20/09/2010 14:14, Greg Wooledge wrote:
>> In the original design of the Unix shell, in many ways and places,
>> it's quite apparent that the designers never really intended to handle
>> filenames that contain whitespace.
>
> ... while at the same time allowing almost any character to be part of
> a filename.  The irony.

It's the difference between the possible and the desirable, or, more to
the point, between the possible and the common and convenient.  Unix,
especially in the early days, was all about the 90% solution.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU  c...@case.edu  http://cnswww.cns.cwru.edu/~chet/
Re: How to deal with space in command line?
Peng Yu wrote:
> Hi,
>
> stat --printf "%y %n\n" `find . -type f -print`
>
> I could use the following trick to stat each file separately.  But I
> prefer to stat all the files at once.  I'm wondering if there is any
> easy way to convert the strings returned by find, if there are special
> characters such as space, by adding '\' in front of them?
>
> http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html

Does your situation require performing all the stats in one invocation?
Is there a reason you couldn't use null-terminated filenames?  They were
designed specifically for this purpose (to quote all other characters,
since nulls are illegal in filenames):

  find . -type f -print0 | xargs -0 stat --printf "%y %n\n"
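[Editorial note: the pipeline above can be exercised on an awkward name
like so.  A sketch assuming GNU find, xargs, and stat, with %n (name
only) instead of the full "%y %n" format, in a throwaway directory:]

```shell
cd "$(mktemp -d)"
touch "a name with spaces"

# NUL-terminated names pass through the pipe unambiguously; xargs -0
# reassembles them into intact arguments for stat:
find . -type f -print0 | xargs -0 stat --printf '%n\n'
```

This prints `./a name with spaces` as a single, unmangled name.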
Re: "gitk &" closes parent bash upon exit
On 09/20/2010 12:44 PM, Illia Bobyr wrote:
> It may be a Cygwin specific problem.  Approximately at the time the
> problem appeared cygwin.dll was also updated.

This is a known cygwin problem, caused by the fact that cygwin tcl is not
cygwin-aware, which makes cygwin have a tough time knowing how to manage
controlling ttys across a parent and grandchild process with a non-cygwin
process in the middle:

http://cygwin.com/ml/cygwin/2010-09/msg00641.html

Bash may yet have a bug where it over-reacts to a failed tty ioctl by
exiting instead of reporting the problem; if that turns out to be the
case, I will follow up here with more details.

-- 
Eric Blake  ebl...@redhat.com  +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Re: RFE: request for quotes as grouping operators to work in brackets as elsewhere.
On Mon, Sep 20, 2010 at 10:28 PM, Linda Walsh wrote:
> Pierre Gaston wrote:
>> Just quote the spaces and not the special chars:
>
> Pierre, your suggestion doesn't help clean up strings used inside of
> double brackets.  I wanted to avoid the need for multiple backslashes
> in an expression as it makes the expression less readable and more
> error prone.

Multiple backslashes?  I gave only one example with one backslash, and
several without one, and even a solution where the regexp is inside
quotes, as you initially requested.

>> Note that the same problem and solution exist when you use filename
>> generation:
>>
>> for f in /some path/ withspaces/*; do # doesn't work, the path
>> contains spaces
>
> I'm aware of that, but since [[ and ]] are new, and =~ is new, there
> is no legal interpretation for multiple arguments on either side of
> the =~ operator.
>
> Since =~ permits comparing variables _without_ putting quotes around
> them (as would normally be the case if you used the single square
> brackets and plain '='), why not extend that idea to not needing
> quotes between the =~ and either side of the double square brackets,
> so literal strings benefit from not needing quotes as well?
> Of course, if quotes *are* included on the rhs, then the pattern
> matching (glob or regex) would be disabled as happens now.
>
> Is there a downside to this syntax or this idea?

Besides introducing yet another parsing exception -- while the actual
problem and its solution have probably existed for as long as the Bourne
shell (and maybe longer) -- what about:

    [[ foo =~ bar && baz ]]

Should "bar && baz" be considered as one regexp?  If not, how would you
write a regexp matching `foo && baz'?  Or `foo && baz.*'?  If yes, how
would you write a logical && together with a regexp?

What if you want to match ` bar && baz ' with trailing or leading
spaces?  Should you also be able to use spaces without quotes in that
case, and have [[ foo =~ bar ]] and [[ foo =~  bar  ]] (with extra
spaces) have different meanings?
Spaces are used to separate arguments everywhere in the shell.  Yes,
quotes are sometimes ugly and often cause trouble until you take the time
to learn to use them, but that's the price to pay to avoid putting quotes
around every argument every time you use the command line interactively.
I don't see how your suggestion would help in the end, since you would
still need to quote some chars like && or ||, and the handling of spaces
would not be consistent with the rest of the shell.
pwd does not update when path component is renamed
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc -I/usr/src/packages/BUILD/bash-4.1 -L/usr/src/packages/BUILD/bash-4.1/../readline-6.1
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-suse-linux-gnu' -DCONF_VENDOR='suse' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./include -I./lib -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -D_GNU_SOURCE -DRECYCLES_PIDS -Wall -g -std=gnu89 -Wuninitialized -Wextra -Wno-unprototyped-calls -Wno-switch-enum -Wno-unused-variable -Wno-unused-parameter -ftree-loop-linear -pipe -fprofile-use
uname output: Linux ne-1 2.6.34.7-0.2-desktop #1 SMP PREEMPT 2010-09-14 14:21:06 +0200 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-suse-linux-gnu
Bash Version: 4.1
Patch Level: 7
Release Status: release

Description:
The output of pwd and the value of $PWD return a cached value, regardless
of the actual current path.

Repeat-By:
mkdir '-p' 'a' && cd 'a' && mv '../a' '../b' && enable '-n' 'pwd' &&
builtin 'pwd' && pwd

Fix:
cd '-P' '.'
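[Editorial note: the Repeat-By above can be reduced to a self-contained
sketch, run in a hypothetical temporary directory.  The first pwd shows
the cached logical value; cd -P '.' re-resolves via getcwd():]

```shell
cd "$(mktemp -d)"
mkdir a && cd a
mv ../a ../b        # rename the directory we are sitting in

pwd                  # still reports .../a  (cached logical $PWD)
cd -P . && pwd       # re-resolved: now reports .../b
```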
Re: pwd does not update when path component is renamed
Krzysztof Zelechowski wrote:
> Description:
> The text of pwd and the value of $PWD return a cached value,
> regardless of the actual current path.
>
> Repeat-By:
> mkdir '-p' 'a' && cd 'a' && mv '../a' '../b' && enable '-n' 'pwd' &&
> builtin 'pwd' && pwd
>
> Fix:
> cd '-P' '.'

I think it's the same mechanism that catches symlinked directory names,
i.e. the shell has its own "view" of the filesystem.  For symlinked
directories this is not a bug.

For this case, I don't think there's a reliable and portable way to catch
it.  The open directory is valid (since it's open) for the shell process,
but the $PWD given to other programs will make them fail.  And I don't
think a getcwd() after every command, or even every now and then, would
be efficient.

The "no solution provided" Bonsai
Re: pwd does not update when path component is renamed
Krzysztof Żelechowski wrote:
> The text of pwd and the value of $PWD return a cached value,
> regardless of the actual current path.
>
> mkdir '-p' 'a' && cd 'a' && mv '../a' '../b' && enable '-n' 'pwd' &&
> builtin 'pwd' && pwd
>
> Fix:
> cd '-P' '.'

That is just the way things are.  The logical path used to get to a
place isn't canonical; there may be multiple logical paths that point to
the same location, and you can only cache the logical value.  If the
physical path changes out from under the cached value, the two will be
out of sync, and there isn't any way to avoid it.  And your fix of
switching to the physical path isn't appropriate when a user has
requested logical paths.

This is a user-configurable setting.  If you want canonical paths that
are always correct, you should use physical paths.  The ~/.bashrc file
would be an appropriate place for that setting:

  set -o physical

Bob
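[Editorial note: the logical-vs-physical distinction Bob describes can be
sketched with a symlink; cd tracks the logical path by default, and
pwd -L / pwd -P show the two views.  A hypothetical scratch setup:]

```shell
cd "$(mktemp -d)"
mkdir real
ln -s real link
cd link

pwd -L   # the logical path you typed, ending in .../link
pwd -P   # the physical path on disk,  ending in .../real
```

With `set -o physical` in effect, cd itself resolves symlinks, so both
views would end in .../real.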
Re: RFE: request for quotes as grouping operators to work in brackets as elsewhere.
Pierre Gaston wrote:
> what about:
>
> [[ foo =~ bar && baz ]]
>
> Should bar && baz be considered as one regexp?  if not, how would you
> write a regexp matching `foo && baz'?  or `foo && baz.*'?

Use parentheses to disambiguate ambiguous cases?

> if yes, how would you write a logical && together with a regexp?
> What if you want to match ` bar && baz ' with trailing or leading
> spaces?

You'd be no worse off than you are now -- you'd have to use backslashes
or some other quoting mechanism.

In my initial query on this issue, I had [[ 'var' =~ multi word pattern
]].  I was only considering the case where multiple words would currently
generate a syntax error; I hadn't thought about the multi-operator
flavor.  Sounds like a simple rule might be to include words in the
matching string as long as they would not be ambiguous (i.e. would
otherwise be a syntax error).  There's always plan B, but I was sorta
resisting that...

> Spaces are used to separate arguments everywhere in the shell and yes
> quotes are sometimes ugly and often cause trouble until you take the
> time to learn to use them, but it's the price to pay to avoid putting
> quotes around every argument every time you use the command line
> interactively

The problem here is that there's no simple grouping operator that can go
around regexes (and, for that matter, globexes) that still allows pattern
matching (and expansion) within the grouping operator.

> I don't see how your suggestion would help in the end since you would
> still need to quote some chars like && or || and the handling of space
> would not be consistent with the rest of the shell.

Unless you stopped the grouping where adding the next term has a legal
interpretation.  The point of allowing multiple terms to be grouped as
one expression was for the case where they would otherwise be interpreted
as an error -- if there's a legal interpretation, as in your example
above, then the rule would have to be that any currently legal
interpretation remains that way, and the further terms wouldn't be
grouped.
Only in the case where adding further 'words' to the matching expression
would currently be illegal (generate some error) would grouping occur --
that way there would be no backward-compatibility issues in currently
working code.

Plan B -- use some other quoting character to group the expression, other
than single or double quote.  There is one other type of quote that I
know of that is sufficiently visually different from current symbols as
to not be easily confused with any current operator -- the « double
angular » quotation marks (U+00AB, U+00BB).  I resist that idea only
because my keyboard doesn't easily allow me to type them.  I suppose a
[somewhat lame] substitute would be to allow a multi-byte sequence like
\<< \>> to be equivalent to the actual Unicode double-angular characters,
for use where the Unicode values couldn't be used or were just too
inconvenient.  Since they are used in multiple Latin-alphabet-based
languages (French, Spanish, Swedish, et al.), they shouldn't be too rare
in western-language fonts.  The only other candidate, the 〝Double Prime〞
quotation marks (U+301D, U+301E), are in the CJK Punctuation range and
would be less likely to be found in western-language fonts.  Or you could
use a pair of a character like "/" (slash) around the expression, as
several other regex engines default to.

If having a special case for multiple words on either side of =~, **that
would otherwise be illegal**, seems odd -- I'd point out the already
inconsistent use of double quotes for turning off expansion and for
grouping, as in "$@" and "$*", respectively.

Certainly the best option (under existing compatibility constraints)
might be to allow both: multi-word grouping of otherwise-illegal terms,
AND addition of the double-angular quotes, which would group an
expression together but still allow pattern substitution, or matching,
between them.

-l