why are \d and \D not implemented but don't throw errors in regex?

2013-12-07 Thread Craig Steffen
Hi,

I'm working on some bash scripts for work where I'm using a regular
expression to grab a number from the output of another command.

I've gotten fairly adept at using regular expressions, in perl mostly,
but I just couldn't get it to work in bash.

One reason was that the regex search is supposed to be a variable
rather than a literal inside the [[ ]] expression.

However, the second reason was that \d and \D are apparently not
implemented, even though \s and \S are?  And furthermore, the match
just silently fails without indicating anything is amiss.  After
searching, [[:digit:]] does work instead of \d.
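
A minimal sketch of the working form described above, assuming bash 3.2
or later.  The sample text and the variable names (output, re, count)
are made up for illustration and are not from the original script.

```bash
#!/usr/bin/env bash
# Hypothetical sample text standing in for another command's output.
output='nodes allocated: 128 of 256'

# Keep the pattern in a variable and expand it unquoted on the right of =~.
# Quoting the right-hand side makes bash treat it as a literal string.
re='allocated: ([[:digit:]]+)'

if [[ $output =~ $re ]]; then
    count=${BASH_REMATCH[1]}   # text matched by the first ( ) group
else
    count=''
fi

echo "count=$count"
```

BASH_REMATCH[0] holds the whole match and BASH_REMATCH[1] the first
parenthesized group, so the number can be picked out without calling
sed or awk.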

Is there any particular reason why \s is implemented as a regex
specification in bash but \d isn't?  And if there's a good reason for
not implementing it, can there be a syntax error or at least a warning
if the script is trying to do something that works in other regular
expressions but produces exactly the wrong behavior in that
context in bash?

In case it matters, bash on my work machine is:
csteffen@jyc1 10:06 ~/prompt_fu PD_-___-PGI-_ $ bash --version
GNU bash, version 3.2.51(1)-release (x86_64-suse-linux-gnu)
Copyright (C) 2007 Free Software Foundation, Inc.
csteffen@jyc1 10:30 ~/prompt_fu PD_-___-PGI-_ $

That's a Cray XE6 system with Interlagos CPUs.  The OS is SLES-based.

My laptop, where I created a test script to root out the problem, is:
craig@vorlon3:~/work/shells$ bash --version
GNU bash, version 4.2.45(1)-release (i686-pc-linux-gnu)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 

It's a Sony Vaio running Ubuntu 13.04

Both versions of bash behaved the same with regard to \d vs. [[:digit:]].

Thanks.  Sincerely,

Craig Steffen



Re: why are \d and \D not implemented but don't throw errors in regex?

2013-12-07 Thread Peter Cordes
On Sat, Dec 07, 2013 at 11:06:22AM -0600, Craig Steffen wrote:
> Hi,
> 
> I'm working on some bash scripts for work where I'm using a regular
> expression to grab a number from the output of another command.
> 
> I've gotten fairly adept at using regular expressions, in perl mostly,
> but I just couldn't get it to work in bash.
> 
> One reason was that the regex search is supposed to be a variable
> rather than a literal inside the [[ ]] expression.
> 
> However, the second reason was that \d and \D are apparently not
> implemented, even though \s and \S are?  And furthermore, the match
> just silently fails without indicating anything is amiss.  After
> searching, [[:digit:]] does work instead of \d.

 That's the behaviour of the regex library used by most things other
than perl (which has its own regex engine).  For example, when you
search a man page with less(1), \s matches whitespace but \d matches
the letter d.  [[:digit:]] matches digits.
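
One way to see what the local regex library does is to probe it from
bash.  This is only a sketch: the helper name matches() is made up
here, and the result of the two \d probes depends on the libc that
bash was built against, so no output is shown for them.

```bash
#!/usr/bin/env bash
# matches STRING PATTERN: succeed if STRING matches the regex PATTERN.
# The pattern is copied into a variable so it reaches =~ unquoted.
matches() {
    local str=$1 re=$2
    [[ $str =~ $re ]]
}

# report STRING PATTERN: print whether the match succeeded on this system.
report() {
    if matches "$1" "$2"; then
        echo "'$1' matches $2"
    else
        echo "'$1' does not match $2"
    fi
}

report 5 '\d'            # system-dependent: is \d a digit class here?
report d '\d'            # system-dependent: is \d just the letter d?
report 5 '[[:digit:]]'   # POSIX class, should match everywhere
```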

 I agree your complaint seems valid, but it's the behaviour of the
regex engine built into GNU libc (in this case).  Bash on other
platforms would use the regex engine in their system libc.  (Unless
I'm mistaken in my assumption that bash doesn't have its own regex
engine.)

 It's really unfortunate that there are so many
not-universally-supported extensions to the regex language.  And as
you discovered, especially unfortunate that implementations that don't
support them just treat them as \-quoted literals, rather than
unsupported syntax.  There are probably things that depend on using
\something even when "something" isn't a special character.  However,
POSIX says 

   The interpretation of an ordinary character preceded by a
   backslash ( '\' ) is undefined.
http://pubs.opengroup.org/onlinepubs/007904875/basedefs/xbd_chap09.html

 So anything that broke with a regex library that didn't just treat
\something as literal something would be the fault of whatever was
depending on that behaviour.  So it would probably actually be good if
the default behaviour of glibc was to report a regex compilation error
in that case, or maybe even better, print a warning like "\d: unknown
special character, treating as literal".
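
 Until a library does that, a script can do a rough check itself.  The
sketch below is an assumption about how such a check might look, not
anything bash or glibc provides: check_regex is a made-up name, it only
looks for \d and \D, and it will also flag an escaped backslash that
happens to be followed by d (as in \\d).

```bash
#!/usr/bin/env bash
# check_regex PATTERN: warn and return 1 if PATTERN contains \d or \D.
check_regex() {
    local re=$1
    # A regex that matches a literal backslash followed by d or D.
    local perl_escape='\\[dD]'
    if [[ $re =~ $perl_escape ]]; then
        echo "warning: ${BASH_REMATCH[0]} is not a POSIX escape;" \
             "it may be treated as a literal letter" >&2
        return 1
    fi
    return 0
}

# Hypothetical patterns used only to exercise the check.
check_regex 'id=[[:digit:]]+' && echo "first pattern looks portable"
check_regex 'id=\d+'          || echo "second pattern needs fixing"
```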

 Of course, POSIX doesn't specify either \s or \d, just the
[:space:] and [:digit:] and other character classes that can be used
within [].
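
 As a sketch of what that looks like in practice, the Perl-style
shorthands can be spelled with POSIX classes inside bracket
expressions.  The variable names and the sample line are made up for
illustration.

```bash
#!/usr/bin/env bash
# POSIX spellings of the common Perl shorthands.  The class name goes
# inside an outer pair of brackets; ^ right after the outer [ negates.
digit='[[:digit:]]'       # Perl \d
nondigit='[^[:digit:]]'   # Perl \D
space='[[:space:]]'       # Perl \s
nonspace='[^[:space:]]'   # Perl \S

# Hypothetical input: a name, some whitespace, then a number.
line='eth0   1500'
re="^(${nonspace}+)${space}+(${digit}+)\$"

if [[ $line =~ $re ]]; then
    iface=${BASH_REMATCH[1]}
    mtu=${BASH_REMATCH[2]}
    echo "iface=$iface mtu=$mtu"
fi

# A string made up entirely of non-digits.
only_nondigits="^${nondigit}+\$"
[[ 'abc' =~ $only_nondigits ]] && echo "abc has no digits"
```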

-- 
#define X(x,y) x##y
Peter Cordes ;  e-mail: X(peter@cor , des.ca)

"The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces!" -- Plautus, 200 BC



Re: why are \d and \D not implemented but don't throw errors in regex?

2013-12-07 Thread Chet Ramey
On 12/7/13, 6:33 PM, Peter Cordes wrote:

>  I agree your complaint seems valid, but it's the behaviour of the
> regex engine built into GNU libc (in this case).  Bash on other
> platforms would use the regex engine in their system libc.  (Unless
> I'm mistaken in my assumption that bash doesn't have its own regex
> engine.)

This is correct.  Bash uses whatever Posix regexp engine is in libc.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/