On Sat, Dec 07, 2013 at 11:06:22AM -0600, Craig Steffen wrote:
> Hi,
>
> I'm working on some bash scripts for work where I'm using a regular
> expression to grab a number from the output of another command.
>
> I've gotten fairly adept at using regular expressions, in perl mostly,
> but I just couldn't get it to work in bash.
>
> One reason was that the regex search is supposed to be a variable
> rather than an literal inside the [[ ]] expression.
>
> However, the second reason was that \d and \D are apparently not
> implemented, even though \s and \S are?  And furthermore, the match
> just silently fails without indicating anything is amiss.  After
> searching, [[:digit:]] does work instead of \d.

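For anyone finding this thread later, here is a minimal sketch of the form that does work, assuming bash 3.2 or newer. The sample string and the names output, re, and count are made up for illustration; they stand in for the real command output and are not from the original script.

```bash
#!/usr/bin/env bash
# Hypothetical input: stands in for the output of "another command".
output="jobs queued: 42 (3 held)"

# Keep the pattern in a variable and expand it unquoted on the right of =~.
# Quoting the expansion would make bash match it as a literal string.
# Use the POSIX class [[:digit:]] rather than the perl-style \d.
re='queued: ([[:digit:]]+)'

if [[ $output =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match, [1] the first parenthesised group.
    count=${BASH_REMATCH[1]}
    echo "count=$count"
else
    echo "no match" >&2
fi
```

Putting the pattern in a variable also sidesteps the quoting differences between bash versions for literal patterns inside [[ ]].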
That's the behaviour of the regex library used by most things other
than perl (which has its own regex engine).  For example, when you
search a man page with less(1), \s matches whitespace, \d matches the
letter d, and [[:digit:]] matches digits.

I agree your complaint seems valid, but it's the behaviour of the regex
engine built into GNU libc (in this case).  Bash on other platforms
would use the regex engine in their system libc.  (Unless I'm mistaken
in my assumption that bash doesn't have its own regex engine.)

It's really unfortunate that there are so many not-universally-supported
extensions to the regex language.  And as you discovered, it's
especially unfortunate that implementations that don't support them
just treat them as \-quoted literals, rather than as unsupported syntax.

There are probably things that depend on using \something even when
"something" isn't a special character.  However, POSIX says

    The interpretation of an ordinary character preceded by a
    backslash ( '\' ) is undefined.

http://pubs.opengroup.org/onlinepubs/007904875/basedefs/xbd_chap09.html

So anything that broke with a regex library that didn't just treat
\something as literal something would be the fault of whatever was
depending on that behaviour.

So it would probably actually be good if the default behaviour of glibc
was to report a regex compilation error in that case, or maybe even
better, to print a warning like
"\d: unknown special character, treating as literal".

Of course, POSIX doesn't specify either \s or \d, just [:space:],
[:digit:], and the other character classes that can be used within [].

-- 
#define X(x,y) x##y
Peter Cordes ;  e-mail: X(peter@cor , des.ca)

"The gods confound the man who first found out how to distinguish the
 hours!  Confound him, too, who in this place set up a sundial, to cut
 and hack my day so wretchedly into small pieces!" -- Plautus, 200 BC