Re: How to deal with space in command line?
On Sat, Sep 18, 2010 at 09:16:46PM -0500, Peng Yu wrote:
> Hi,
>
> stat --printf "%y %n\n" `find . -type f -print`

Chris and Pierre already helped with this specific example.  I'd like to
address the more general case.

In the original design of the Unix shell, in many ways and places, it's
quite apparent that the designers never really intended to handle
filenames that contain whitespace.  Things like your stat `find . -print`
example look like they ought to work, but they don't -- precisely because
the shell's word splitting operates on whitespace, while filenames are
ALLOWED to contain whitespace.  There is no way to overcome this
obstacle; the only solution is to use a different approach altogether.
Thus the proposed alternatives such as

    find . -exec stat {} +

which Chris and Pierre have already provided.

To clarify the problem: when you write `...` or $(...) you produce a
single string which is the entire output all shoved together
("serialized" is the fancy word for it).  The shell takes this single
string and tries to break it apart into meaningful chunks (word
splitting).  But with serialized filenames, there is no way to tell where
one filename ends and the next begins.  If you see the string "foo bar",
you can't tell whether that's one filename with a space in the middle, or
two filenames with a space between them.  Likewise, newlines are allowed
in filenames: if you see the string "foo\nbar" where \n is a newline, you
can't tell whether it's one filename or two.

The only character that is NOT allowed in a Unix filename is NUL
(ASCII 0).  So if you have a serialized stream of filenames "foo\0bar\0",
you know there are two filenames, and the NUL (\0) bytes tell you where
they end.  That's wonderful if you're reading from a stream or a file.
But it doesn't help you with command substitution (`...` or $(...)),
because you can't work with NUL bytes in the shell.
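[Editorial note: the word-splitting failure described above, and the -exec
alternative, can be made concrete with a small runnable sketch.  This is
not from the original mail; it assumes GNU stat and uses a throwaway
temporary directory.]

```shell
# Run in a scratch directory containing one file whose name has a space.
cd "$(mktemp -d)"
touch "foo bar"

# Command substitution + word splitting: the single name "./foo bar" is
# split into the two words "./foo" and "bar", so stat is asked about two
# files that do not exist:
stat --printf '%n\n' `find . -type f` 2>/dev/null \
    || echo "word splitting mangled the name"

# find -exec passes each matched name to stat as one intact argument:
find . -type f -exec stat --printf '%n\n' {} +
```

The first command fails (and prints the fallback message); the second
prints `./foo bar` correctly.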
The command substitution goes into a C string in memory.  When you try to
read back the contents of that C string, you stop at the first NUL,
because that's what NUL means in a C string -- "end of string".

Bash and ksh actually handle this differently, but neither one will work
for what your example was trying to do.  In bash, the NUL bytes are
stripped away entirely:

    arc3:/tmp/foo$ touch foo bar
    arc3:/tmp/foo$ echo "$(find . -print0)"
    ../foo./bar
    arc3:/tmp/foo$ echo "$(find . -print0)" | hd
    00000000  2e 2e 2f 66 6f 6f 2e 2f  62 61 72 0a  |../foo./bar.|
    0000000c

In ksh, the NUL bytes are retained, and thus you get the behavior I
described above (stopping at the first one):

    arc3:/tmp/foo$ ksh -c 'echo "$(find . -print0)"'
    .

Thus, $(find ...) is never going to be useful, in either shell, under any
circumstance.  It simply cannot produce useful output when operating on
real filenames outside of controlled environments.  If you want to work
with find, you must throw away command substitution entirely.  This is
regrettable, because it would be extremely convenient if you could do
something like vi `find . -name '*.c'`, but you simply can't.

So, what does that leave you?

 * You can use -exec, or
 * You can read the output of find ... -print0 as a stream.

You've already seen one example of -exec.  When using -exec, the find
command (which is external, not part of the shell) is told to execute yet
another external command for each file that it matches (or for clumps of
matched filenames, when using the newer + terminator).

The disadvantage of -exec is that if you wanted to do something within
your shell (putting the filenames into an array, incrementing a counter
variable, etc.), you can't.  You're already two processes removed from
your shell.  Likewise, you can't -exec a shell function that you wrote.
You would have to use a separate script, or write out the shell code in
quotes and call -exec sh -c '...'.
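[Editorial note: the "-exec sh -c '...'" escape hatch mentioned above can
be sketched as follows.  This example is illustrative, not from the
original mail; the `_` placeholder fills in $0 so that the matched names
land in the positional parameters.]

```shell
cd "$(mktemp -d)"
touch "a file"

# Each matched name is delivered to the inline script as a separate
# positional parameter, so spaces survive intact:
find . -type f -exec sh -c '
    for f; do
        printf "got: %s\n" "$f"
    done
' _ {} +
```

With the single file above this prints `got: ./a file`.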
If you want to work on filenames recursively within your script, you will
almost always end up using the following idiom, because all the
alternatives are ruled out one way or another:

    while IFS= read -r -d '' filename; do
      ...
    done < <(find ... -print0)

This example uses two bash features (process substitution, and
read -d '') so it's extremely non-portable.  The obvious pipeline
alternative (find ... -print0 | while read) is ruled out because the
while read occurs in a subshell, and thus any variables set by the
subshell are lost after the loop.

The read -d '' is a special trick that tells bash's read command to stop
at each NUL byte instead of each newline.  The output of find is never
put into a string in memory (as with command substitution), so the
problems we had when trying to work with NULs in a command substitution
don't apply here.

For example, if we wanted to do vi `find . -name '*.c'` but actually have
it WORK in the general case, we end up needing this monstrosity:

    unset array
    while IFS= read -r -d '' f; do
      array+=("$f")
    done < <(find . -name '*.c' -print0)
    vi "${array[@]}"
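[Editorial aside, well beyond this 2010 thread: later bash versions (4.4
and newer) added `mapfile -d ''`, which slurps the same NUL-delimited
stream into an array in one step.  A sketch, assuming bash >= 4.4:]

```shell
cd "$(mktemp -d)"
touch "plain.c" "with space.c"

# -d '' makes mapfile split on NUL bytes; -t drops each trailing
# delimiter, leaving the bare filenames in the array.
mapfile -t -d '' array < <(find . -name '*.c' -print0)
echo "${#array[@]}"    # 2
```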
Re: How to deal with space in command line?
On 09/20/2010 07:14 AM, Greg Wooledge wrote:
> ... which uses three bash extensions and one BSD/GNU extension.  To the
> best of my knowledge, the task is completely impossible in strict POSIX.

Impossible in strict POSIX 2008.  But the Austin Group (the people that
develop the POSIX standard) is actively working on proposals to enhance
the next revision of POSIX to make it easier to deal with awkward file
names; the proposals on the floor include (among others): mandating
support for $'...', requiring that compliant file systems reject \n in
newly-created file names, and adding an environment variable to make it
easier to detect when you are dealing with existing file systems with \n
already in an existing file name.  Help in reviewing and contributing to
these proposals will be most welcome.

-- 
Eric Blake  ebl...@redhat.com  +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Re: How to deal with space in command line?
Le 20/09/2010 14:14, Greg Wooledge wrote:
> In the original design of the Unix shell, in many ways and places,
> it's quite apparent that the designers never really intended to handle
> filenames that contain whitespace.

... while at the same time allowing almost any character to be part of a
filename.  The irony.

"UNIX is user-friendly.  It's just very selective about who its friends
are."
"gitk &" closes parent bash upon exit
Configuration Information:
Machine: i686
OS: cygwin
Compiler: gcc-4
Compilation CFLAGS: -DPROGRAM='bash.exe' -DCONF_HOSTTYPE='i686' -DCONF_OSTYPE='cygwin' -DCONF_MACHTYPE='i686-pc-cygwin' -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -DRECYCLES_PIDS -I. -I/usr/src/bash-3.2.51-24/src/bash-3.2 -I/usr/src/bash-3.2.51-24/src/bash-3.2/include -I/usr/src/bash-3.2.51-24/src/bash-3.2/lib -O2 -pipe
uname output: CYGWIN_NT-5.1 wron-ibobyr 1.7.7(0.230/5/3) 2010-08-31 09:58 i686 Cygwin
Machine Type: i686-pc-cygwin
Bash Version: 3.2
Patch Level: 51
Release Status: release

Description:
I see this problem only with gitk (which is a TCL script).  It appeared
about a couple of months ago.  I update all the packages in my Cygwin
installation almost weekly, so it is probably possible to figure out
which bash and/or gitk update introduced the problem, but I have not done
that yet.

When I do "gitk &", the parent bash process terminates as well when gitk
exits.  When I do "(gitk &)" it works fine.  There do not seem to be any
crash dumps, but sometimes bash outputs "Logout" before it exits, just as
if I had pressed ^D at the prompt.  I have tried putting the "gitk &"
call into a script and adding traps for all possible signals, but none
seemed to fire.  You do not have to be in a directory that is a Git
repository.

It may be a Cygwin-specific problem: approximately at the time the
problem appeared, cygwin.dll was also updated.  I hope that someone with
more knowledge in the relevant areas may suggest a simpler test case,
and/or a direction in which to search for one, since the whole of gitk is
a big TCL script.  By the way, "info patchlevel" on my Cygwin TCL says
"8.4.1".

Repeat-By:
1. gitk &
2. If you are not in a Git repository, just close the dialog by selecting
   "OK"; otherwise close the gitk window.
3. The parent bash exits.

Fix:
Running gitk in a subshell works fine.
Re: RFE: request for quotes as grouping operators to work in brackets as elsewhere.
Pierre Gaston wrote:
> Just quote the spaces and not the special chars:

Pierre, your suggestion doesn't help clean up strings used inside of
double brackets.  I wanted to avoid the need for multiple backslashes in
an expression, as they make the expression less readable and more error
prone.

> Note that the same problem and solution exist when you use filename
> generation:
>
> for f in /some path/ withspaces/*; do # doesn't work, the path
> contains spaces

I'm aware of that, but since [[ and ]] are new, and =~ is new, there is
no legal interpretation for multiple arguments on either side of the =~
operator.

Since =~ permits comparing variables _without_ putting quotes around them
(as would normally be the case if you used the single square brackets and
a plain '='), why not extend that idea to not needing quotes between the
=~ and either side of the double square brackets, so literal strings
benefit from not needing quotes as well?  Of course, if quotes *are*
included on the rhs, then the pattern matching (glob or regex) would be
disabled, as happens now.

Is there a downside to this syntax or this idea?
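[Editorial note: the workaround most commonly recommended for multi-word
patterns, not part of the original mail, is to store the pattern in a
variable; [[ =~ ]] expands an unquoted variable without word-splitting
it.  A sketch with hypothetical pattern and subject strings:]

```shell
# Both strings here are made up purely for illustration.
re='multi word pattern'
var='contains a multi word pattern inside'

# The unquoted $re is expanded as a single ERE, spaces and all:
if [[ $var =~ $re ]]; then
    echo "match"
fi
```

This sidesteps the quoting question entirely: no backslashes in the
expression, and the regex can contain spaces, &&, or anything else.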
Re: How to deal with space in command line?
On 9/20/10 10:30 AM, Marc Herbert wrote:
> Le 20/09/2010 14:14, Greg Wooledge wrote:
>> In the original design of the Unix shell, in many ways and places,
>> it's quite apparent that the designers never really intended to handle
>> filenames that contain whitespace.
>
> ... while at the same time allowing almost any character to be part of
> a filename.  The irony.

It's the difference between the possible and the desirable, or, more to
the point, between the possible and the common and convenient.  Unix,
especially in the early days, was all about the 90% solution.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU  c...@case.edu  http://cnswww.cns.cwru.edu/~chet/
Re: How to deal with space in command line?
Peng Yu wrote:
> Hi,
>
> stat --printf "%y %n\n" `find . -type f -print`
>
> I could use the following trick to stat each file separately.  But I
> prefer to stat all the files at once.  I'm wondering if there is any
> easy way to convert the strings returned by find, if there are special
> characters such as space, by adding '\' in front of them?
>
> http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html

Does your situation require performing all the stats in one invocation?
Is there a reason you couldn't use null-terminated filenames?  They were
designed specifically for this purpose (to quote all other characters,
since nulls are illegal in filenames):

  find . -type f -print0 | xargs -0 stat --printf "%y %n\n"
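[Editorial note: the pipeline above can be exercised on an awkward name
like so.  A sketch assuming GNU find, xargs, and stat, with %n (name
only) instead of the full "%y %n" format, in a throwaway directory:]

```shell
cd "$(mktemp -d)"
touch "a name with spaces"

# NUL-terminated names pass through the pipe unambiguously; xargs -0
# reassembles them into intact arguments for stat:
find . -type f -print0 | xargs -0 stat --printf '%n\n'
```

This prints `./a name with spaces` as a single, unmangled name.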
Re: "gitk &" closes parent bash upon exit
On 09/20/2010 12:44 PM, Illia Bobyr wrote:
> It may be a Cygwin specific problem.  Approximately at the time the
> problem appeared cygwin.dll was also updated.

This is a known cygwin problem, caused by the fact that cygwin tcl is not
cygwin-aware, which makes cygwin have a tough time knowing how to manage
controlling ttys across a parent and grandchild process with a non-cygwin
process in the middle:

http://cygwin.com/ml/cygwin/2010-09/msg00641.html

Bash may yet have a bug where it over-reacts to a failed tty ioctl by
exiting instead of reporting the problem; if that turns out to be the
case, I will follow up here with more details.

-- 
Eric Blake  ebl...@redhat.com  +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Re: RFE: request for quotes as grouping operators to work in brackets as elsewhere.
On Mon, Sep 20, 2010 at 10:28 PM, Linda Walsh wrote:
> Pierre Gaston wrote:
>> Just quote the spaces and not the special chars:
>
> Pierre, your suggestion doesn't help clean up strings used inside of
> double brackets.  I wanted to avoid the need for multiple backslashes
> in an expression as it makes the expression less readable and more
> error prone.

Multiple backslashes?  I gave only one example with one backslash, and
several without one, and even a solution where the regexp is inside
quotes, as you initially requested.

>> Note that the same problem and solution exist when you use filename
>> generation:
>>
>> for f in /some path/ withspaces/*; do # doesn't work, the path
>> contains spaces
>
> I'm aware of that, but since [[ and ]] are new, and =~ is new, there
> is no legal interpretation for multiple arguments on either side of
> the =~ operator.
>
> Since =~ permits comparing variables _without_ putting quotes around
> them (as would normally be the case if you used the single square
> brackets and plain '='), why not extend that idea to not needing
> quotes between the =~ and either side of the double square brackets,
> so literal strings benefit from not needing quotes as well?
> Of course, if quotes *are* included on the rhs, then the pattern
> matching (glob or regex) would be disabled as happens now.
>
> Is there a downside to this syntax or this idea?

Besides introducing yet another parsing exception -- while the actual
problem and its solution have probably existed for as long as the Bourne
shell (and maybe longer) -- what about:

    [[ foo =~ bar && baz ]]

Should "bar && baz" be considered as one regexp?  If not, how would you
write a regexp matching `foo && baz'?  Or `foo && baz.*'?  If yes, how
would you write a logical && together with a regexp?

What if you want to match ` bar && baz ' with trailing or leading
spaces?  Should you also be able to use spaces without quotes in that
case, and have [[ foo =~ bar ]] and [[ foo =~  bar  ]] (with extra
spaces) have different meanings?
Spaces are used to separate arguments everywhere in the shell.  Yes,
quotes are sometimes ugly and often cause trouble until you take the time
to learn to use them, but that's the price to pay to avoid putting quotes
around every argument every time you use the command line interactively.
I don't see how your suggestion would help in the end, since you would
still need to quote some chars like && or ||, and the handling of spaces
would not be consistent with the rest of the shell.
pwd does not update when path component is renamed
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc -I/usr/src/packages/BUILD/bash-4.1 -L/usr/src/packages/BUILD/bash-4.1/../readline-6.1
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-suse-linux-gnu' -DCONF_VENDOR='suse' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./include -I./lib -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -D_GNU_SOURCE -DRECYCLES_PIDS -Wall -g -std=gnu89 -Wuninitialized -Wextra -Wno-unprototyped-calls -Wno-switch-enum -Wno-unused-variable -Wno-unused-parameter -ftree-loop-linear -pipe -fprofile-use
uname output: Linux ne-1 2.6.34.7-0.2-desktop #1 SMP PREEMPT 2010-09-14 14:21:06 +0200 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-suse-linux-gnu
Bash Version: 4.1
Patch Level: 7
Release Status: release

Description:
The output of pwd and the value of $PWD return a cached value, regardless
of the actual current path.

Repeat-By:
mkdir '-p' 'a' && cd 'a' && mv '../a' '../b' && enable '-n' 'pwd' &&
builtin 'pwd' && pwd

Fix:
cd '-P' '.'
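[Editorial note: the Repeat-By above can be reduced to a self-contained
sketch, run in a hypothetical temporary directory.  The first pwd shows
the cached logical value; cd -P '.' re-resolves via getcwd():]

```shell
cd "$(mktemp -d)"
mkdir a && cd a
mv ../a ../b        # rename the directory we are sitting in

pwd                  # still reports .../a  (cached logical $PWD)
cd -P . && pwd       # re-resolved: now reports .../b
```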
Re: pwd does not update when path component is renamed
Krzysztof Zelechowski wrote:
> Description:
> The text of pwd and the value of $PWD return a cached value,
> regardless of the actual current path.
>
> Repeat-By:
> mkdir '-p' 'a' && cd 'a' && mv '../a' '../b' && enable '-n' 'pwd' &&
> builtin 'pwd' && pwd
>
> Fix:
> cd '-P' '.'

I think it's the same mechanism that catches symlinked directory names,
i.e. the shell has its own "view" of the filesystem.  For symlinked
directories this is not a bug.

For this case, I don't think there's a reliable and portable way to catch
it.  The open directory is valid (since it's open) for the shell process,
but the $PWD given to other programs will make them fail.  And I don't
think a getcwd() after every command, or even every now and then, would
be efficient.

The "no solution provided" Bonsai
Re: pwd does not update when path component is renamed
Krzysztof Żelechowski wrote:
> The text of pwd and the value of $PWD return a cached value,
> regardless of the actual current path.
>
> mkdir '-p' 'a' && cd 'a' && mv '../a' '../b' && enable '-n' 'pwd' &&
> builtin 'pwd' && pwd
>
> Fix:
> cd '-P' '.'

That is just the way things are.  The logical path used to get to a
place isn't canonical; there may be multiple logical paths that point to
the same location, and you can only cache the logical value.  If the
physical path changes out from under the cached value, the two will be
out of sync, and there isn't any way to avoid it.  And your fix of
switching to the physical path isn't appropriate when a user has
requested logical paths.

This is a user-configurable setting.  If you want canonical paths that
are always correct, you should use physical paths.  The ~/.bashrc file
would be an appropriate place for that setting:

  set -o physical

Bob
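[Editorial note: the logical-vs-physical distinction Bob describes can be
sketched with a symlink; cd tracks the logical path by default, and
pwd -L / pwd -P show the two views.  A hypothetical scratch setup:]

```shell
cd "$(mktemp -d)"
mkdir real
ln -s real link
cd link

pwd -L   # the logical path you typed, ending in .../link
pwd -P   # the physical path on disk,  ending in .../real
```

With `set -o physical` in effect, cd itself resolves symlinks, so both
views would end in .../real.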
Re: RFE: request for quotes as grouping operators to work in brackets as elsewhere.
Pierre Gaston wrote:
> what about:
>
> [[ foo =~ bar && baz ]]
>
> Should bar && baz be considered as one regexp?  if not, how would you
> write a regexp matching `foo && baz'?  or `foo && baz.*'?

Use parentheses to disambiguate ambiguous cases?

> if yes, how would you write a logical && together with a regexp?
> What if you want to match ` bar && baz ' with trailing or leading
> spaces?

You'd be no worse off than you are now -- you'd have to use backslashes
or some other quoting mechanism.

In my initial query on this issue, I had [[ 'var' =~ multi word pattern
]].  I was only considering the case where multiple words would currently
generate a syntax error; I hadn't thought about the multi-operator
flavor.  Sounds like a simple rule might be to include words in the
matching string as long as they would not be ambiguous (i.e. would
otherwise be a syntax error).  There's always plan B, but I was sorta
resisting that...

> Spaces are used to separate arguments everywhere in the shell and yes
> quotes are sometimes ugly and often cause trouble until you take the
> time to learn to use them, but it's the price to pay to avoid putting
> quotes around every argument every time you use the command line
> interactively

The problem here is that there's no simple grouping operator that can go
around regexes (and, for that matter, globexes) that still allows pattern
matching (and expansion) within the grouping operator.

> I don't see how your suggestion would help in the end since you would
> still need to quote some chars like && or || and the handling of space
> would not be consistent with the rest of the shell.

Unless you stopped the grouping where adding the next term has a legal
interpretation.  The point of allowing multiple terms to be grouped as
one expression was for the case where they would otherwise be interpreted
as an error -- if there's a legal interpretation, as in your example
above, then the rule would have to be that any currently legal
interpretation remains that way, and the further terms wouldn't be
grouped.
Only in the case where adding further 'words' to the matching expression
would currently be illegal (generate some error) would grouping occur --
that way there would be no backward-compatibility issues in currently
working code.

Plan B -- use some other quoting character to group the expression, other
than single or double quote.  There is one other type of quote that I
know of that is sufficiently visually different from current symbols as
to not be easily confused with any current operator -- the « double
angular » quotation marks (U+00AB, U+00BB).  I resist that idea only
because my keyboard doesn't easily allow me to type them.  I suppose a
[somewhat lame] substitute would be to allow a multi-byte sequence like
\<< \>> to be equivalent to the actual Unicode double-angular characters,
for use where the Unicode values couldn't be used or were just too
inconvenient.  Since they are used in multiple Latin-alphabet-based
languages (French, Spanish, Swedish, et al.), they shouldn't be too rare
in western-language fonts.  The only other candidate, the 〝Double Prime〞
quotation marks (U+301D, U+301E), are in the CJK Punctuation range and
would be less likely to be found in western-language fonts.  Or you could
use a pair of a character like "/" (slash) around the expression, as
several other regex engines default to.

If having a special case for multiple words on either side of =~, **that
would otherwise be illegal**, seems odd -- I'd point out the already
inconsistent use of double quotes for turning off expansion and for
grouping, as in "$@" and "$*", respectively.

Certainly the best option (under existing compatibility constraints)
might be to allow both: multi-word grouping of otherwise-illegal terms,
AND addition of the double-angular quotes, which would group an
expression together but still allow pattern substitution, or matching,
between them.

-l