Package: grep
Version: 2.5.1.ds2-5
Severity: normal

I noticed that enabling --ignore-case causes certain patterns to stop
matching, although they should still match:

$ echo 'foo bar' | grep    '^foo\W'
foo bar
$ echo 'foo bar' | grep -i '^foo\W'
$

Digging further reveals that the locale has an influence, since
$ echo 'foo bar' | LANG=C grep -i '^foo\W'
foo bar
$

matches again. After a check using all my generated locales:

MATCH:
- de_DE
- [EMAIL PROTECTED]
- en_US

FAIL:
- de_DE.UTF-8
- [EMAIL PROTECTED]
- en_US.UTF-8

there's a strong impression that UTF-8 locales somehow disturb \W when
using -i.
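
A loop along the following lines would reproduce such a check. It is only
a sketch, not necessarily the exact commands used for the lists above: the
helper name check_locale is made up for illustration, and LC_ALL is set
rather than LANG on the assumption that it should override any other
locale variables in the environment.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): report whether the
# -i pattern from this report matches under the given locale.
check_locale() {
    # LC_ALL is an assumption here; it takes precedence over LANG and LC_*.
    if echo 'foo bar' | LC_ALL="$1" grep -qi '^foo\W'; then
        echo "MATCH: $1"
    else
        echo "FAIL: $1"
    fi
}

# Try every locale that is generated on this system.
for loc in $(locale -a); do
    check_locale "$loc"
done
```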

Even more confusing, using the bracket expression instead of the synonym
matches again:
$ echo 'foo bar' | LANG=de_DE.UTF-8 grep -i '^foo[^[:alnum:]]'
foo bar
$

For the record, this sounds somewhat similar to #209194 and #218873, but
those bugs are fixed in this version (2.5.1.ds2-5); I've checked.

By the way, there's a typo in the manpage:

  and
  .B \eW
  is a synonym for
- .BR [^[:alnum]] .
+ .BR [^[:alnum:]] .
  .PP

-- System Information:
Debian Release: testing/unstable
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.17.13
Locale: [EMAIL PROTECTED], [EMAIL PROTECTED] (charmap=UTF-8)

Versions of packages grep depends on:
ii  libc6                        2.3.6.ds1-4 GNU C Library: Shared libraries

grep recommends no packages.

-- no debconf information
