compgen is slow for large numbers of options

2012-03-14 Thread Richard Neill

Dear All,

I don't know for certain if this is a bug per se, but I think
"compgen -W" is much slower than it "should" be in the case of a large 
(1+) number of options.


For example (on a fast i7 2700 CPU), I measure:

compgen -W "`seq 1 50000`" 1794          # 3.83 s
compgen -W "`seq 1 50000 | grep 1794`"   # 0.019 s

In these examples, I'm using `seq` as a trivial way to generate some 
data, and picking 1794 as a totally arbitrary match.


In the first example, compgen is doing the filtering, whereas in the 
2nd, I obtain the same result very much faster with grep.


If I increase the upper number by a factor of 10, to 500000, these times
become 436 s (yes, really, 7 minutes!) and 0.20 s respectively.  This
suggests that the algorithm used by compgen is O(n^2) whereas the
algorithm used by grep is O(1).



For a real world example, see:
  https://bugs.mageia.org/show_bug.cgi?id=373#c8
in which we are using completion on package-management.
In this case, the number is 43031.
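The fix the report points toward can be sketched as a completion function that pre-filters the word list before compgen ever sees it. The function and variable names below are hypothetical (not taken from the Mageia/urpmi completion code), and `seq 1 50000` stands in for the real package list:

```shell
# Hypothetical completion function: narrow a huge word list with grep
# before handing it to compgen, so compgen only processes a few
# candidates instead of tens of thousands.
_complete_pkg() {
    local cur=${COMP_WORDS[COMP_CWORD]}
    local all_pkgs
    all_pkgs=$(seq 1 50000)   # stand-in for the real package list
    COMPREPLY=( $(compgen -W "$(grep -- "^$cur" <<<"$all_pkgs")" -- "$cur") )
}

# Simulate completing "urpmi 1794<TAB>":
COMP_WORDS=(urpmi 1794); COMP_CWORD=1
_complete_pkg
```

With an empty prefix this still degrades to the slow case, so real completion code usually also requires a minimum prefix length before offering candidates.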


I hope this is helpful.

Richard



Re: compgen is slow for large numbers of options

2012-03-14 Thread Richard Neill



If I increase the upper number by a factor of 10, to 500000, these times
become 436 s (yes, really, 7 minutes!) and 0.20 s respectively. This
suggests that the algorithm used by compgen is O(n^2) whereas the
algorithm used by grep is O(1).


I meant: grep is O(n).



Re: compgen is slow for large numbers of options

2012-03-15 Thread Richard Neill

Dear Bob,

Thanks for your explanation. I do understand what is going on and why. 
But my point was that compgen has an implicit internal "grep" that is 
much less efficient than actual grep. Why is the performance of 
compgen's sorting/filtering algorithm so much worse than grep's ?

Both of them start with a large list, and filter it to a small one.

At any rate, might I suggest this is worthy of a line or two of 
documentation in the compgen manual? 40,000 possible completions is not 
necessarily unreasonable. [In this case, urpmi, as used by 
Mandriva/Mageia has had it wrong for some time.]


Best wishes,

Richard



Message: 4
Date: Wed, 14 Mar 2012 13:40:36 -0600
From: Bob Proulx
To: bug-bash@gnu.org
Subject: Re: compgen is slow for large numbers of options
Message-ID:<20120314194036.ga12...@hysteria.proulx.com>
Content-Type: text/plain; charset=us-ascii

Richard Neill wrote:

I don't know for certain if this is a bug per se, but I think
"compgen -W" is much slower than it "should" be in the case of a
large (10,000+) number of options.


I don't think this is a bug but just simply a misunderstanding of how
much memory must be allocated in order to generate `seq 1 500000`.


For example (on a fast i7 2700 CPU), I measure:

compgen -W "`seq 1 50000`" 1794          # 3.83 s
compgen -W "`seq 1 50000 | grep 1794`"   # 0.019 s

In these examples, I'm using `seq` as a trivial way to generate some
data, and picking 1794 as a totally arbitrary match.


Right.  But in the first case you are generating 288,894 bytes of data
and in the second case 89 bytes of data.  That is a large difference.

You could probably speed it up a small amount more by using grep -x -F
and avoid the common substring matches.  And perhaps more depending
upon your environment with LC_ALL=C to avoid charset issues.
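Bob's suggestion, made concrete (the flags are standard grep; the timings discussed in the thread are the author's, not reproduced here):

```shell
# -F: treat the pattern as a fixed string, not a regex.
# -x: the whole line must match, so 1794 won't also hit 17940-17949.
# LC_ALL=C: byte-wise matching, avoiding locale collation overhead.
list=$(seq 1 50000)
match=$(LC_ALL=C grep -x -F '1794' <<<"$list")
```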


In the first example, compgen is doing the filtering, whereas in the
2nd, I obtain the same result very much faster with grep.


Yes, since grep is reducing the argument size to 89 bytes.  That makes
perfect sense to me.  Anything that processes 89 bytes is going to be
much faster than anything that processes 288,894 bytes.


If I increase the upper number by a factor of 10, to 500000, these
times become 436 s (yes, really, 7 minutes!) and 0.20 s


That is an increase in argument size to 3,388,895 bytes and there will
be associated memory overhead to all of that increasing the size on
top of that value too.  Three plus megabytes of memory just for the
creation of the argument list and then it must be processed.  That
isn't going to be fast.  You would do much better if you filtered that
list down to something reasonable.


respectively.  This suggests that the algorithm used by compgen is
O(n^2) whereas the algorithm used by grep is O(1).


On the counter it suggests that the algorithm you are using, one of
fully allocating all memory, is inefficient.  Whereas using grep as a
filter to reduce that memory to 89 bytes is of course more efficient.

I wrote a response on a similar issue previously.  Instead of posting
it again let me post a pointer to the previous message.

   http://lists.gnu.org/archive/html/bug-bash/2011-11/msg00189.html

Bob





bug-report/request: allow "set +n" to re-enable commands.

2013-03-22 Thread Richard Neill

Dear All,

Might I suggest/request that "set +n" should undo the effect of
"set -n"  ?

For example:

#!/bin/bash
echo one
set -n
echo two
set +n
echo three

would print:
one
three


Here's why I think it would be useful:


1. Bash doesn't have a block-comment mechanism, like /* ... */
and this would provide one.

2. The documentation for "help set" says that flags can be undone with 
"+",   thus the inverse of -n is +n.
(though in contradiction , it also says that subsequent commands (which 
would include the "set" are ignored)



3. It would allow for a neat hack with polyglots. For example:

#!/bin/bash -n



(there is actually a serious purpose to this, namely to create a php-cli
script that shows all errors, despite the settings in the system-wide
php.ini file)


Example 3 works if you remove the "-n" and "set +n" parts, though it 
then emits an annoying complaint about "?php: No such file or directory"
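As an aside on point 1: one widely used stand-in for a block comment, sketched below, is the no-op command `:` fed a quoted heredoc; quoting the delimiter stops any expansion inside the skipped region:

```shell
out=$(
echo one
# ':' ignores its input, so everything up to END_COMMENT is skipped;
# the quoted delimiter prevents $var expansion inside the block.
: <<'END_COMMENT'
echo two
END_COMMENT
echo three
)
```

Unlike a true set -n/+n toggle, the skipped text must still lex cleanly (in particular, it must not contain the bare delimiter line).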



Thank you for your consideration,

Best wishes,

Richard



Bash - various feature requests

2006-12-29 Thread Richard Neill

Dear All,

I hope I'm posting this in the right place, since it's really a
feature request, rather than a bug report. However, here are a few
things which I think would be nice to have in bash, and easy to implement:


1)substr support for a negative length argument.
For example,
  stringZ=abcdef
  echo ${stringZ:2:-1}  #prints cde

i.e. ${string:x:y}
  returns the string, from start position x for y characters.
  but, if x is negative, start from the right hand side
  and if y is negative, print up to the end, -y.

This would work the same way as PHP, and be extremely useful for, say, 
removing an extension from a filename.

http://uk.php.net/manual/en/function.substr.php
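For the extension-removal case, and to emulate the requested negative length on bashes that lack it, arithmetic on ${#string} already works; a sketch:

```shell
f=archive.tar.gz
base=${f%.*}       # strip the shortest trailing .* pattern -> archive.tar
stem=${f%%.*}      # strip the longest trailing .* pattern  -> archive

stringZ=abcdef
# Emulate the proposed ${stringZ:2:-1} by computing the length by hand:
mid=${stringZ:2:$(( ${#stringZ} - 2 - 1 ))}   # -> cde
```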

---

2)"help strings" and "help operators"

Strings and operators are all documented in the man page. Likewise, 
there is an excellent guide here:

  http://www.die.net/doc/linux/abs-guide/string-manipulation.html

However, a quick reminder from within the shell (in the same way that 
'for' and ':' are documented) would be really useful.



---

3)Some way to canonicalise a path. For example, converting:

  ./foo//bar/../baz.txt

to /home/foo/baz.txt


This isn't easy to do from a shell script. Maybe something like this 
could be a built in?

http://publicobject.com/2006/06/canonical-path-of-file-in-bash.html

There is also a realpath program, but it seems not to be part of bash
by default:

http://src.opensolaris.org/source/xref/sfw/usr/src/cmd/bash/bash-3.0/examples/loadables/realpath.c
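One pure-shell approximation for directories, sketched here: cd to the path in a subshell and print $PWD, which bash keeps in a lexically normalised form (symlinks are not resolved in logical mode):

```shell
# Canonicalise a directory path without external tools; returns
# non-zero and prints nothing if the directory doesn't exist.
canonicalize_dir() {
    ( cd -- "$1" 2>/dev/null && pwd )
}

mkdir -p /tmp/canon_demo/foo/bar
canon=$(canonicalize_dir /tmp/canon_demo/foo//bar/..)
```

GNU `readlink -f` (and, later, `realpath(1)` from coreutils) also handle files and resolve symlinks, but neither is a bash builtin.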

---


4)A way to find a relative directory path. i.e. the solution to:

   If I were to start in directory  /home/me/x/y/z
   and I wanted to get to /home/me/x/a/b/c

   then, what argument would I need to provide to 'cd' ?

   #Answer is ../../a/b/c
   How could a shell script calculate this?
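A shell script can compute this by stripping the longest common directory prefix and emitting one `..` per remaining component of the source path. A minimal sketch (assumes both paths are absolute and already canonical, with no symlink awareness; GNU coreutils later added `realpath --relative-to` for exactly this job):

```shell
# Print the cd argument that leads from directory $1 to directory $2.
relpath() {
    local from=$1 to=$2 up='' common
    common=$from
    # Climb from $from until $common is a directory prefix of $to.
    while [ "${to#"$common"/}" = "$to" ] && [ "$common" != / ]; do
        common=${common%/*}
        [ -z "$common" ] && common=/
        up=../$up
    done
    if [ "$common" = / ]; then
        printf '%s%s\n' "$up" "${to#/}"
    else
        printf '%s%s\n' "$up" "${to#"$common"/}"
    fi
}

rel=$(relpath /home/me/x/y/z /home/me/x/a/b/c)   # -> ../../a/b/c
```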

---

5)An enhancement to read/readline, such that one can specify the initial 
value with which the buffer is filled.


Currently, we can do:

  read -ep 'Enter your name: ' NAME

and I might type "Richad Neill".  #Note the deliberate typo.

If the script recognises this as invalid, the best it can do is:

  echo "Name not recognised"
  read -ep 'Re-enter your name: ' NAME

and the user must retype it in full.
I'd like to propose adding an "-i" option for initial value:

  echo "Name not recognised"
  read -ep "Please correct your name: " -i "$NAME"

The shell prints:
  Please correct your name: Richad Neill
where the latter part is now editable.

Thus the really nice editing features of readline can be used for 
updating values already stored in variables. This is extremely useful 
when the value is quite long.
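For the record, later bash releases (4.0 onward) adopted this very interface: `read -e -i text`. Since readline needs a terminal, the sketch below only probes for the option non-interactively:

```shell
# Does this bash's read builtin document the -i option? (bash >= 4.0:
# "-i text  use TEXT as the initial text for Readline")
if help read 2>/dev/null | grep -q 'initial text'; then
    has_read_i=yes
else
    has_read_i=no
fi

# Interactive use (requires a tty):
#   read -e -p 'Please correct your name: ' -i "$NAME" NAME
```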


---

I hope these thoughts are of some use. I'm not on this list, so please 
CC me with any followup. If this report could have been more useful, I'd 
welcome any feedback.


Best wishes,

Richard




















___
Bug-bash mailing list
Bug-bash@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-bash


Re: Bash - various feature requests

2006-12-29 Thread Richard Neill

Dear Grzegorz,

Thanks for your helpful reply.

Grzegorz Adam Hankiewicz wrote:

On 2006-12-27, Richard Neill <[EMAIL PROTECTED]> wrote:

1)substr support for a negative length argument.
For example,
  stringZ=abcdef
  echo ${stringZ:2:-1}  #prints cde

i.e. ${string:x:y}
  returns the string, from start position x for y characters.
  but, if x is negative, start from the right hand side
  and if y is negative, print up to the end, -y.

This would work the same way as PHP, and be extremely useful for, say,
removing an extension from a filename.


If extension removal is all you need, you can already do it.

$ for f in *; do echo $f; done
Makefile.am
Makefile.in
ucl
$ for f in *; do echo ${f%.*}; done
Makefile
Makefile
ucl



I did know about this, but it seems slightly like a sledgehammer on a
nut, using a pattern match instead of a substring. Also, this way doesn't
allow you to trim both the start and the end. Lastly, I just think it
would be a really useful (and easy-to-implement, I think) feature, which
is consistent with the use of PHP and Perl. At the moment, the first
parameter may be any integer (positive or negative), but the second
parameter can only be positive.


Best wishes,

Richard




Re: Bash - various feature requests

2006-12-29 Thread Richard Neill


5)An enhancement to read/readline, such that one can specify the initial 
value with which the buffer is filled.


Currently, we can do:

  read -ep 'Enter your name: ' NAME

and I might type "Richad Neill".  #Note the deliberate typo.

If the script recognises this as invalid, the best it can do is:

  echo "Name not recognised"
  read -ep 'Re-enter your name: ' NAME

and the user must retype it in full.
I'd like to propose adding an "-i" option for initial value:

  echo "Name not recognised"
  read -ep "Please correct your name: " -i "$NAME"

The shell prints:
  Please correct your name: Richad Neill
where the latter part is now editable.

Thus the really nice editing features of readline can be used for 
updating values already stored in variables. This is extremely useful 
when the value is quite long.


Is this the same as
read -ep "Please correct your name: $NAME" NAME
?



No - that isn't what I meant. What I want to do is:
   1)Print the text:
"Please correct your name: "
without a trailing newline.

   2)Create an editable buffer (using readline), which is initialised
 with the current value of the variable $NAME and printed to screen.

   3)The user edits the buffer as desired, and when he presses ENTER, 
its contents is assigned to the variable NAME.



In the example I gave, with my proposed -i option, the script would 
present the user with this:


Please correct your name: Richad Neill
                                      *

the cursor position starts at the *, but can be moved around anywhere
within the buffer. So the typo could be corrected by keying:


Alt-B, Ctrl-B, Ctrl-B, r, ENTER


whereas currently, the user must type in the full string from scratch.



Is that clear? Sorry it's hard to explain without a diagram.

Richard







Bash arithmetic doesn't give error message on wrap.

2007-04-27 Thread Richard Neill

Bash Version: 3.2
Patch Level: 13
Release Status: release

Description:
 $   echo $((4000000000*4000000000))
 -2446744073709551616

Repeat-By:
Do some arithmetic in bash $(()).
If the numbers are out of range, the output will be wrong in
all sorts of interesting ways. No error message is given.

Fix:
Arbitrary-precision maths would be nice. But at least, could we
have an error message if an overflow occurs?

The man page says:
"Evaluation is done in fixed-width integers with no
check for overflow..."
but I'd suggest this represents a bug, not a feature.
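Until then, a script that cares can check for wrap itself; a sketch using the divide-back test (it covers the multiplication case reported here, though not every edge case such as INT_MIN):

```shell
# Multiply two integers and fail loudly if 64-bit arithmetic wrapped:
# after p = a*b, an overflow-free product must satisfy p / a == b.
checked_mul() {
    local a=$1 b=$2 p
    p=$(( a * b ))
    if (( a != 0 && p / a != b )); then
        echo "overflow: $a * $b" >&2
        return 1
    fi
    echo "$p"
}

ok=$(checked_mul 40 40)    # -> 1600
if checked_mul 4000000000 4000000000 >/dev/null 2>&1; then
    wrapped=no
else
    wrapped=yes            # the wrap is detected instead of silent
fi
```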


Regards,

Richard







Re: Bash arithmetic doesn't give error message on wrap.

2007-04-28 Thread Richard Neill

Chet Ramey wrote:


Description:
 $   echo $((4000000000*4000000000))
 -2446744073709551616

Repeat-By:
Do some arithmetic in bash $(()).
If the numbers are out of range, the output will be wrong in
all sorts of interesting ways. No error message is given.

Fix:
Arbitrary-precision maths would be nice. But at least, could we
have an error message if an overflow occurs?

The man page says:
"Evaluation is done in fixed-width integers with no
check for overflow..."
but I'd suggest this represents a bug, not a feature.


I'm comfortable with the current behavior.  POSIX requires that expressions
be evaluated according to the C standard, and that standard leaves the
treatment of integer overflow as undefined.



If POSIX says the behaviour is undefined, then surely bash can do 
whatever it wants. So, printing an error message would be allowed.


The error message would be:

 a)Most helpful to the user (least surprise)

 b)Consistent with other cases, where bash does give warnings. For example:

$ X=$((3+078))
bash: 3+078: value too great for base (error token is "078")
$ echo $?
1


Regards,

Richard













Re: Bash arithmetic doesn't give error message on wrap.

2007-04-29 Thread Richard Neill



Bob Proulx wrote:

Richard Neill wrote:

 b)Consistent with other cases, where bash does give warnings. For example:

$ X=$((3+078))
bash: 3+078: value too great for base (error token is "078")
$ echo $?
1


That is not really a comparable case.  The problem there is that the
leading zero specifies an octal constant and the 8 cannot be converted
to octal.  The "3+" part is just a distraction.

  echo $((08))
  bash: 08: value too great for base (error token is "08")

Bob



Are you sure this isn't comparable? After all, in both cases, the user 
has submitted something to which bash cannot give a sensible answer. In 
the integer-overflow case, bash simply returns the wrong answer, with no 
warning. But in the octal case, bash (quite correctly, and helpfully) 
prints a warning.


If bash were to be consistent, then it should display no error message 
in the case of $((3+078)); it should either "Do what I mean"  [evaluate 
3 + 078 as 0103, treating 78 as 7 *8 + 8], or "Do something daft" [eg 
evaluate as 073  (treating 8 as an integer-overflow in the units place, 
which is not checked for)]


There's obviously an advantage to the user in being warned when an 
integer overflow occurs; is there any possible downside (apart from a 
couple of extra lines of code)?


Richard




Re: Bash arithmetic doesn't give error message on wrap.

2007-04-30 Thread Richard Neill



Bob Proulx wrote:

Andreas Schwab wrote:

Richard Neill <[EMAIL PROTECTED]> writes:

Are you sure this isn't comparable? After all, in both cases, the user has
submitted something to which bash cannot give a sensible answer. In the
integer-overflow case, bash simply returns the wrong answer, with no
warning.

The answer is not really wrong, it's the same you get from the equivalent
expression when evaluated in C.


Let me phrase this in a different way. 


 8<

[OK - agreed. Thanks for your explanation]



Avoiding overflow is actually a very difficult problem.  People have
been working with and around overflow issues for years and years.
There is no clear "right" answer.  On some cpus the result is done one
way and on others the result is done a different way.  It is in these
cases where typically POSIX would give up and declare it undefined
behavior.  This is why "4000000000*4000000000" appears as a completely
different problem than "08".



I thought testing for overflow was quite simple?
Isn't it just a case of looking at the carry-bit, and seeing whether it 
gets set? If so, then the warning message would be a two-line patch to 
the code.


That said, I don't know enough about CPU internals to know what the 
carry-bits do with multiplication.  (Addition/Subtraction overflows just 
change the carry-flag; Integer Division never suffers from overflows).




About the only way for bash to avoid it would be to include a full
arbitrary precision math library to evaluate these expressions itself.


I wasn't suggesting that! We have bc anyway. I was only suggesting that, 
when the CPU detects an overflow, bash could pass on the warning.



But that would slow the shell down by a large amount and it would make
the shell much bigger than it is today.  Both of those things would
cause people problems.  The entire reason cpus have math processors is
because these operations can be quite slow when done in software.

To give some additional weight to this, note that perl also uses the
underlying cpu for numerical computations.  So this is the same issue
as would be true in Perl.  Or in C/C++ too.

  perl -e 'printf("%d\n",4000000000*4000000000);'
  -2446744073709551616


Yes... but that's actually the %d doing it. Perl would automatically 
convert to a float. Eg


$ perl -e 'print(4000000000*4000000000);'
1.6e+19



Thanks very much for your explanation.

Best wishes,

Richard










Bash error message for unterminated heredoc is unhelpful.

2008-06-28 Thread Richard Neill
Dear All,

In some cases, bash gives exceptionally unhelpful error messages, of the
sort "Unexpected end of file". This is next-to-useless as a debugging
aid, since there is no way to find out where the error really lies.

I'm using bash version: bash-3.2-7mdv2008.1

Here are 2 shell scripts with examples.
   bug-example.sh   demonstrates the problem.
   bug-example2.sh  is where bash gets it right.

Thanks very much,

Richard



[EMAIL PROTECTED] ~]$ cat bug-example.sh
-
#!/bin/bash

#This is an example of bash being unhelpful with syntax errors.

function usage () {
cat <<-EOT
This help text is a heredoc
EOT
#NOTE that the line above contains a trailing TAB, and
# is therefore NOT valid as the end of the heredoc.
}

usage

echo "A long script goes here..."

echo "Many lines later..."

exit 0


#This script generates the following error:
# ./bug-example.sh: line 37: syntax error: unexpected end of file
#That is really not helpful, since there's no way to track down where
#the error started. Line 37 is not interesting. What we need is a
#warning about line 6.

#At a minimum, we should get this error message:
# ./bug-example.sh: line 37: syntax error: unexpected end of file
#  (started at line 6)

#Better, would be this error message:
# ./bug-example.sh: line 6: syntax error: unterminated heredoc.

#An additional buglet is that in fact, trailing whitespace after
#a heredoc terminator should probably be ignored anyway?
-




[EMAIL PROTECTED] ~]$ cat bug-example2.sh
-
#!/bin/bash

#This is an example of bash being *helpful* with syntax errors.

X=$((1 + 2) #NOTE missing trailing ).

echo "X is $X"  #Should print 'X is 3'


echo "A long script goes here..."
echo "Many lines later..."
exit 0

#This script gets it right with the error message.
# ./bug-example2.sh: line 5: unexpected EOF while looking for
#  matching `)'
#  ./bug-example2.sh: line 20: syntax error: unexpected end of file

#So, we can quickly find the bug.
-







Re: Bash error message for unterminated heredoc is unhelpful.

2008-06-28 Thread Richard Neill
Chet Ramey wrote:
> Richard Neill wrote:
>> Dear All,
>>
>> In some cases, bash gives exceptionally unhelpful error messages, of the
>> sort "Unexpected end of file". This is next-to-useless as a debugging
>> aid, since there is no way to find out where the error really lies.
> 
> For better or worse, bash allows end-of-file to delimit a here document.
> That is historical sh behavior.
> 
> The end-of-file syntax error message comes when the shell tries to read
> the token following the here document.
> 
> Chet
> 


Thanks for your reply.

Fair point. But nevertheless, couldn't bash still tell us why the EOF is
unexpected, and where the _start_ of the offending syntax lies?

That was the point of my second example: the error is still an
unexpected EOF, but in that case, bash also informs the user that the
missing ')' is due to a '(' which was on line 5.

Also, bash should tell us *why* the end-of-file is unexpected...

Incidentally, bash is similarly unhelpful in the case of an 'if'
statement  which is missing its 'fi', or a case missing its esac.

If the script is a long one, there's no easy way to locate the offending
token, short of laboriously commenting out sections and doing the
"divide-and-conquer" method.


Regards,

Richard




Bash/readline enhancement: wish to pre-set initial value of input text

2008-07-07 Thread Richard Neill
Dear All,

When using read, it would be really neat to be able to pre-fill the form
with a default (or previous) value.

For example, a script which wants you to enter your name, and thinks
that my name is Richard, but that I might want to correct it.
Alternatively, this would be useful within a loop, to allow correction
of previously-entered text, without fully typing it again.

So, I propose an extra option, -i, to read, which will set the initial
value of the text following the prompt.

For example,


#!/bin/bash
read -e -p 'Enter your name: ' -i 'Richard' NAME
echo "Hello, $NAME"


This would print:
   Enter your name: Richard
I would then be able to edit the part after the prompt, and change it to:
   Enter your name: R. Neill
This would then print:
   Hello, R. Neill



It is equivalent to the following in PHP/HTML:

Enter your name: <input type="text" value="Richard">


An alternative syntax might be to make use of stdin for the read
command, eg:
  echo 'Richard' | read -e -p 'Enter your name: ' NAME
though I think I prefer the -i.


I hope you like this idea. Thanks very much for your help.

Richard





Bash substrings: wish for support for negative length (read till n from end)

2008-07-07 Thread Richard Neill
Dear All,

Substrings in bash contain 2 parameters, the start and the length.
Start may be negative, but if length is negative, it throws an error.
My request is that bash should understand negative length. This would be
useful in some occasions, and would be similar to the way PHP does it:
http://uk.php.net/manual/en/function.substr.php

For clarity, here are all the cases; the relevant ones are the last two:


$ stringZ=abcdef


$ echo ${stringZ:2} #Positive start, no length
cdef#Reads from start.


$ echo ${stringZ: -2}   #Negative start, no length
ef  #Reads 2 back from end.


$ echo ${stringZ:-2}#No space before the -
abcdef  #Is this what we expect?
#(or an unrelated bug?)


$ echo ${stringZ:2:1}   #Starts at 2, reads 1 char.
c


$ echo ${stringZ:2: -1} #Wish: start at 2, read till
ERROR   #1 before the end. i.e.
# cde

$ echo ${stringZ: -3: -1}   #Wish: start 3 back, read till
ERROR   #1 before the end. i.e.
# de



i.e. ${string:x:y}
   * returns the string, from start position x for y characters.
   *  but, if x is negative, start from the right hand side
   *  if y is negative, print up to (the end - y)
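This wish was eventually granted: bash 4.2 added negative length operands with exactly these semantics. A sketch (requires bash >= 4.2; earlier versions raise "substring expression < 0"):

```shell
stringZ=abcdef
mid=${stringZ:2:-1}      # start at 2, stop 1 before the end -> cde
tail=${stringZ: -3:-1}   # start 3 from the end, stop 1 before it -> de
```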



Thanks very much,

Richard




Re: Bash substrings: wish for support for negative length (read till n from end)

2008-07-07 Thread Richard Neill
Jan Schampera wrote:
> Richard Neill wrote:
> 
>> $ echo ${stringZ:2: -1}  #Wish: start at 2, read till
>> ERROR#1 before the end. i.e.
>>  # cde
>>
>> $ echo ${stringZ: -3: -1}#Wish: start 3 back, read till
>> ERROR#1 before the end. i.e.
> 
> Use (-1), i.e.
> 
> $ echo ${stringZ:2:(-1)}
> 
> See also
> http://bash-hackers.org/wiki/doku.php/syntax/pe#substring_expansion (at
> bottom of the section).
> 

Dear Jan,

Thanks for your comment. I now understand why
echo ${stringZ:-1}   is invalid, and it has to be
echo ${stringZ:(-1)}   or   echo ${stringZ: -1}

BUT, the example that you gave doesn't actually work; it returns with
an error:
 -bash: (-1): substring expression < 0

My point is that, though the first parameter (offset) may be negative,
the second parameter (length) can only be positive. This is what would
be useful to change.

Thanks,

Richard






Re: Bash/readline enhancement: wish to pre-set initial value of input text

2008-07-07 Thread Richard Neill
Jan Schampera wrote:
> Richard Neill wrote:
>> Dear All,
>>
>> When using read, it would be really neat to be able to pre-fill the form
>> with a default (or previous) value.
>>
>> For example, a script which wants you to enter your name, and thinks
>> that my name is Richard, but that I might want to correct it.
>> Alternatively, this would be useful within a loop, to allow correction
>> of previously-entered text, without fully typing it again.
> 
> A bit of the functionality (in some way) is already there. You can
> preload the commandline history and use read -e:
> 
> --snipsnap--
> If -e is supplied and the shell is interactive, readline is used to
> obtain the line.
> --snipsnap--
> 
> A bit of hard work, though.
> 

I do have a sort of workaround, namely to put the default value on the
clipboard, by using:


function copy_to_clipboard () {
    #If we're running KDE, put the text into klipper.
    #Else, use xclip.  Fail silently if we can't do it.
    { dcop klipper klipper setClipboardContents "$1" || echo "$1" | xclip ; } > /dev/null 2>&1
}

copy_to_clipboard "$INITIAL_VALUE"
read -e -p "Enter name" NAME
#User must middle-click

but this is rather ugly, as well as only semi-functional. What I want to
do is just to pre-fill readline's buffer, before it gets presented to
the user.

I also tried to hack something nasty out of xmacroplay, but that doesn't
work.


Richard






Bash: proposal for >>> operator

2008-07-22 Thread Richard Neill

Dear All,

Might I propose bash should add another operator,  >>>  for "redirection 
into a variable". This would be by analogy with the <<< operator.




For example, we can currently use <<< to save an "echo", by doing this:

   TEXT="Hello World"
   grep -o 'Hello' <<<"$TEXT"

instead of

   TEXT="Hello World"
   echo "$TEXT" | grep -o 'hello'




I am therefore proposing that the following syntax would be useful:

   echo "Hello World" >>> TEXT

creates a variable named TEXT, whose contents is the string "Hello World".





Why is this useful?

1. Read has a nasty habit of creating subshells. For example,

 echo Hello | read TEXT

doesn't do what we expect. TEXT is empty!


2. The $() or ``  constructs are great, excepting that they also create 
subshells. This messes up things like PIPESTATUS.


For example:

 echo hello | cat | cat | cat
 #hello
 echo ${PIPESTATUS[@]}
 #0 0 0 0

 TEXT=$(echo hello | cat | cat | cat )
 echo ${PIPESTATUS[@]}
 #0

Here we've captured the output we wanted, but lost the pipestatus.

3.  The $() construct doesn't let you capture both stderr and stdout 
into different variables.



I know I could do it all with tempfiles, but that somewhat misses the point.
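Two mechanisms that already exist cover much of this ground; a sketch (process substitution is bash-specific, and bash 4.2 later added `shopt -s lastpipe` to run the final pipeline stage in the current shell):

```shell
# Plain command substitution for the simple capture case:
TEXT=$(echo "Hello World")

# Feeding a loop from process substitution keeps 'read' in the current
# shell, so the variables it sets survive the loop:
count=0 last=
while read -r line; do
    count=$(( count + 1 ))
    last=$line
done < <(printf '%s\n' one two three)
```

Neither addresses the PIPESTATUS complaint above, but they handle the common "capture output into a variable" cases without a new operator.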




Incidentally, if this is useful, it would be nice to support the
rather prettier counterpart to the <<< operator, and permit this usage:

 "$TEXT" >>> grep -o 'hello'


What do you think?


Regards,

Richard






bash: request for a way to return variables to the parent of a subshell

2008-07-22 Thread Richard Neill
At the moment, variables set within a subshell can never be accessed by 
the parent script. This is true, even for an implicit subshell such as 
caused by read.


For example, consider the following (slightly contrived example)


touch example-file
ls -l | while read LINE ; do
if [[ "$LINE" =~ example-file ]]; then
MATCH=true; [a]
echo "Match-1"
fi ;
done
if [ "$MATCH" == true ] ;then [b]
echo "Match-2"
fi
---


This prints "Match-1", but does not print "Match-2".

The only way to get data out of the read-subshell is by something like 
"exit 2", and looking at $?





It's already possible to export a variable into the environment, and for 
subshells to inherit variables from the main script. Do we need a new 
keyword to achieve the reverse? Is there any way to make sure that 
variables defined at [a] can be made to still exist at [b] ?



Thanks,

Richard






Re: bash: request for a way to return variables to the parent of a subshell

2008-07-22 Thread Richard Neill

Dear Eric,

Thank you for your helpful answer. I'd understood that bash *doesn't* 
pass info back from the child to the parent, but I didn't realise that 
it was fundamentally *impossible* to do in Unix. I guess that tempfiles 
would do it - though that's rather ugly.


Is there any way to use "read" to iterate over its standard input 
without creating a subshell?  If it's of interest, the actual part of 
the script I use is below - the aim is to parse the output of "ffmpeg 
-formats" to see whether certain codecs are supported by that build.


Regards,

Richard





--


#Does this version of FFMPEG support the relevant file format?  Exit
#if not. Arguments:  $1='file' or 'codec';  $2='E','encode' or
# 'D','decode', $3=format/codec_name
#Example:check_ffmpeg_format file D ogg

#The output of `ffmpeg -formats`  has section headings such as
#"File formats", and each section is delimited by a blank line.

#The first part of the line contains letters DEA(etc) depending
#on whether the codec/file is supported
#for reading (decoding) and/or writing (encoding).

function check_ffmpeg_format_support(){
local filecodec=$1
local decodeencode=$2
local filecodec_name=$3

if [ $filecodec == 'file' ];then
local start_trigger='File formats:'
local end_trigger=''
local terminator='\ +'
local filecodec_txt='file format'
else
local start_trigger='Codecs:'
local end_trigger=''
local terminator='$'
local filecodec_txt='with codec'
fi

if [ $decodeencode == 'decode' -o $decodeencode == 'D' ];then   
local decodeencode='D[A-Z ]*'
local decodeencode_txt='decoding'
else
local decodeencode='[A-Z ]?E[A-Z ]*'
local decodeencode_txt='encoding'
fi

  local matchme='^\ *'$decodeencode'\ +'$filecodec_name'\ *'$terminator

local relevant=false


#Warning: this pipe has the effect of a subshell. Variables are
#set inside the pipeline, and cannot be accessed outside it.
#Search between trigger points.

ffmpeg -formats 2>/dev/null | while read line ; do
if [[ $line =~ $start_trigger ]]; then
relevant=true
fi
if [[ $line == $end_trigger ]]; then
relevant=false
fi
if [ "$relevant" == true ];then
if [[ $line =~ $matchme ]];then #Regex match.
exit 2  
fi
#Exit the '| while read...'  part, and return $? so we
#know the result.
fi
done

if [ $? != 2 ]; then
echo -e "ERROR: the installed version of 'ffmpeg' was built
without support enabled for $decodeencode_txt $filecodec_txt
'$filecodec_name'.\n"
 exit 1
fi
}

--




Eric Blake wrote:

According to Richard Neill on 7/22/2008 8:04 PM:
| This prints "Match-1", but does not print "Match-2".
|
| The only way to get data out of the read-subshell is by something like
| "exit 2", and looking at $?

You can also use files.  The position within a seekable file is preserved
from child to parent, but once you are using files, you might as well go
with the file contents.

| It's already possible to export a variable into the environment, and for
| subshells to inherit variables from the main script. Do we need a new
| keyword to achieve the reverse? Is there any way to make sure that
| variables defined at [a] can be made to still exist at [b] ?

A new keyword is insufficient.  This is something that is fundamentally
not provided by Unix systems - short of using the file system, there
really isn't a way to pass arbitrary information from a child process back
to the parent.

But maybe the 'source' builtin is what you are looking for, to execute a
script in the same context rather than forking off a child.
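The two workarounds Eric mentions can be sketched in a few lines; the variable names here are illustrative only:

```shell
# Two conventional ways to get data "back" from a child process.

# 1. Command substitution: the child writes to stdout, and the parent
#    captures it. The child's variables die with it, but its output survives.
result=$( ( codec_supported=yes; echo "$codec_supported" ) )

# 2. A temporary file: the child writes, the parent reads afterwards.
tmp=$(mktemp)
( echo "value-from-child" > "$tmp" )
from_file=$(cat "$tmp")
rm -f "$tmp"

echo "result=$result from_file=$from_file"
```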






Re: bash: request for a way to return variables to the parent of a subshell

2008-07-22 Thread Richard Neill
Thank you. That's a really neat solution - and it would never have 
occurred to me. I always think from left to right!


Richard



Paul Jarc wrote:

Richard Neill <[EMAIL PROTECTED]> wrote:

the aim is to parse the output of "ffmpeg -formats" to see whether
certain codecs are supported by that build.


I'd use something like:
while read line; do
  ...
done < <(ffmpeg -formats 2>/dev/null)

That puts ffmpeg into a subshell instead of read.
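A minimal, self-contained illustration of the difference, using printf in place of ffmpeg:

```shell
# Variables set inside a pipeline live in a subshell and vanish afterwards;
# process substitution keeps the while loop in the current shell.

found_pipe=no
printf 'a\nb\n' | while read -r line; do
    [ "$line" = b ] && found_pipe=yes
done
# found_pipe is still "no" here: the loop ran in a subshell.

found_subst=no
while read -r line; do
    [ "$line" = b ] && found_subst=yes
done < <(printf 'a\nb\n')
# found_subst is "yes": the loop ran in the current shell.

echo "pipe=$found_pipe subst=$found_subst"
```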


paul






Bash RFE: Goto (especially for jumping while debugging)

2008-09-22 Thread Richard Neill

Dear All,

Here's a rather controversial request, namely that bash should support 
'goto'.


The reason I'd like to see it is to make debugging long scripts easier. 
I'm working at the moment on a 2000+ line script, and if I want to test 
stuff at the end, I'd really like to have something like the following:



---
#!/bin/bash

#initialisation stuff goes here.

goto LABEL

#lots of stuff here that I want to skip.
#Bash doesn't have a multi-line comment feature.
#Even if it did, one can't do a multi-line comment containing
#another multi-line comment. A good editor makes this less
#painful, but isn't an ideal solution.

LABEL

#stuff I want to test

exit

#stuff at the end of the script, which I don't want to run while testing.
--


We already have exit, which is really short for "jump directly to the 
end". What would be great is a way to "jump into the middle".


What do you think?


Richard



P.S. I am sure lots of people will complain (correctly) that "Goto is 
considered harmful". I'd agree that, in most cases, it is. But in some 
cases, such as the above, it's quite useful. Maybe the feature could be 
called "debug-goto" in order to emphasise its debugging nature, as 
opposed to use in a regular program.



P.P.S. a hack that would demonstrate what I mean could be implemented by 
abusing heredocs.  Instead of

goto LABEL   ...  LABEL
write
cat >/dev/null <<'LABEL'   ...  LABEL
This has exactly the same effect.
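The heredoc trick, written out as a runnable fragment:

```shell
# Everything between the here-document redirect and the delimiter line
# is consumed by cat as text, and never executed -- a forward "jump".
ran_skipped=no
cat >/dev/null <<'LABEL'
ran_skipped=yes
echo "this line is never executed"
LABEL
echo "ran_skipped=$ran_skipped"
```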





Re: Bash RFE: Goto (especially for jumping while debugging)

2008-09-22 Thread Richard Neill



Bob Proulx wrote:

Richard Neill wrote:

Dear All,


In the future please start a new message for a new thread of
discussion.  When you reply to old messages from three months ago
those of us who actually keep months worth of email see the message
threaded with the previous discussion about variables and subshells.
If anyone has killed that thread then the continuation is also killed.


D'oh! Sorry about that. I had always thought that editing the subject 
line was what changed the thread, rather than some internal state 
variable in the mail-client. I stand corrected.  BTW, doesn't bash have 
a bugzilla?




Here's a rather controversial request, namely that bash should support 
'goto'.


Although goto does have uses it is definitely a keyword of the
damned.


cat >/dev/null <<'LABEL' ... LABEL
This has exactly the same effect.


I actually was going to reply and suggest exactly the above.  But then
realized you had already suggested it.



That does work. Actually, maybe the way is to define a function called 
"goto", which handles the common case of a forward jump.



Normally I would do a commented out section.  Easy and non-controversial.
Or I would do if false; then...fi


No good if one wants to jump over/out-of a control structure.


For just a comment section this doesn't seem to be a compelling need.
For that you would need to show how writing state-machines or
something is so much easier or some such...


I think existing functionality would work. Given that 'goto' would 
intentionally over-ride all other control structures, then I can suggest 
2 ways this might be implemented:


i. Treat goto as "exit", but don't clean up variables and functions. 
Then, go back to the start, skipping all instructions till LABEL.
Doing it this way means that goto quits out of all nested control 
structures.


ii. Source $0, but skip all instructions till LABEL.
Doing it this way means that goto does not quit out of any nested 
control structures.
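A rough sketch of approach (ii). Everything here is hypothetical: the "goto" function and the "#label:NAME" marker convention are illustrative, not bash features. The sketch writes a small test script to a temp file so it can demonstrate itself:

```shell
# Hypothetical "goto": use sed to extract the script from a "#label:NAME"
# marker onwards, eval that remainder, and exit. Skips everything between
# the call site and the label.
script=$(mktemp)
cat > "$script" <<'EOF'
goto() {
    eval "$(sed -n "/^#label:$1\$/,\$p" "$0")"
    exit $?
}
goto end
echo "skipped"
#label:end
echo "reached"
EOF
out=$(bash "$script")
rm -f "$script"
echo "$out"
```

Because eval re-executes the tail of the script at top level, this variant jumps out of any enclosing control structure, as in approach (i).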



Best wishes,

Richard




idea: statically-linked "busy-bash"

2009-04-08 Thread Richard Neill

Dear All,

Here's an idea that occurred to me. I'm not sure whether it's a great 
idea, or a really really stupid one, so please feel free to shoot it 
down. Anyway, there are an awful lot of shell scripts where a huge 
number of the coreutils get repeatedly called in separate processes. 
This call-overhead makes the scripts run noticeably slower.


What I'm suggesting is to experimentally build a version of bash which 
has mv/cp/ls/stat/grep/... all built in. This would be a rather 
bigger binary, (similar to busybox), but might allow much much faster 
execution of long scripts.


A very quick experiment shows that this might be worthwhile:

date;
for ((i=0;i<1000000;i++)); do echo -n ""; done;
date;
for ((i=0;i<10000;i++)); do /bin/echo -n ""; done;
date

Prints:
Thu Apr  9 07:05:19 BST 2009
Thu Apr  9 07:05:30 BST 2009
Thu Apr  9 07:05:47 BST 2009


In other words, 1E6 invocations of the builtin takes about 11
seconds, while 1E4 invocations of the standalone binary
takes 17 seconds. The builtin echo is therefore about
150 times faster.

What do you think?

Richard









Re: idea: statically-linked "busy-bash"

2009-04-09 Thread Richard Neill

Andreas Schwab wrote:


What I'm suggesting is to experimentally build a version of bash which has
mv/cp/ls/stat/grep/... all built in.


This is possible without rebuilding bash, see the documentation of the
`enable' builtin.  There are already a few examples in the bash
distribution under examples/loadables.
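A small sketch of loading one of those builtins at runtime. The path below is an assumption; distributions install the compiled example loadables (if at all) in different places:

```shell
# 'enable -f file name' loads a builtin from a shared object;
# 'enable -d name' removes it again. The path is hypothetical.
loadable=/usr/lib/bash/head

if enable -f "$loadable" head 2>/dev/null; then
    msg="head loaded as a builtin from $loadable"
    enable -d head
else
    msg="no loadable found at $loadable (path is system-dependent)"
fi
echo "$msg"
```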



Thanks - that's interesting to know.

However, most of the examples seem to be cut-down functions in some way 
- I can see that becoming incompatible/non-portable.


What I was contemplating is to build in most of the actual gnu utils 
with the aim of making eg the builtin "grep" functionally identical to 
/bin/grep.


Then I could make a system-wide change, replacing the regular bash shell 
with the fat-bash, and still have everything work...



Richard




Re: $\n doesn't get expanded between double-quotes

2009-07-03 Thread Richard Neill

Thanks for your reply.



Description:
Bash allows escape characters to be embedded by using the $'\n'  
syntax. However, unlike all other $variables,
this doesn't work with embedded newlines. I think it should.

Repeat-By:
X="a$'\n'b c"
echo "$X"

expect to see:
a
b c

actually see:
a$'\n'b c


Fix:
	$'\n'  should be expanded within double-quotes, like other variables are. 
	Otherwise, please correct the man-page to make it clearer.


   $'\n' is not a variable. As the man page says:

Words of the form $'string' are treated specially.

   Note "Words". Inside double quotes, $'\n' is not a word.


I agree that this is technically correct. However, it violates the 
principle of least-surprise, which is that, in bash, the $ symbol always 
expands the value of the thing after it (with the exceptions of '$' and \$).

(On re-reading the man-page, I agree that the documentation is 
consistent with your explanation; though it still appears more likely to 
imply mine)






If this is a feature, not a bug, then is there a better way to include 
newlines in a variable-assignment?
The syntax  X="a"$'\n'"b c"  will do it, but that is really really 
ugly.


X=$'a\nb c'



This is still a missing feature: how to embed newlines in double-quoted 
bash string assignment:


For example, if I want to write:

EMAIL_BODY="Dear $NAME,$'\n\n'Here are the log-files for 
$(date)$'\n\n'Regards,$'\n\n'$SENDER"


then this doesn't work. There are ways around it, such as:
  - building up the string in pieces or
  - EMAIL_BODY=$(echo -e "$EMAIL_BODY")

but it's really ugly to do.
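A less ugly alternative is printf -v, which interprets backslash escapes in its format string and writes the result straight into a variable. The sample values here are illustrative:

```shell
# printf -v assembles a multi-line string in one step, without
# 'echo -e' round-trips or piecewise concatenation.
NAME="Alice"     # sample values for illustration
SENDER="Bob"
printf -v EMAIL_BODY 'Dear %s,\n\nHere are the log-files for %s\n\nRegards,\n\n%s' \
    "$NAME" "$(date)" "$SENDER"
echo "$EMAIL_BODY"
```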


As I imagine that nobody uses the current $'\n' inside double-quotes, 
may I request this as a functionality change?



Best wishes,

Richard