Re: Grep search to find exceptions

Ronald J Kimball Tue, 28 Nov 2006 07:48:22 -0800

On Wed, Nov 29, 2006 at 02:21:41AM +1100, Randy Silvers wrote:
> 
> >I'm confused about a couple things.
> >
> >First, the grep will match "N15", which is invalid, and will not
> >match "N15 4RC", which is valid, but it will also not match
> >"N15 4RCCCC", which is invalid.


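Oops, I forgot to account for a valid post code followed by extra
characters.  That's easy to fix by adding a $ inside the (?!), so the
lookahead only rejects a line that is a valid post code and nothing else.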

^(?![A-Z]{1,2}[\d]+ \d[A-Z][A-Z]$).+$

Now it correctly matches "N15 4RCCCC".  Thanks for catching that!
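For anyone who wants to check this outside BBEdit, here is a minimal sketch using Python's re module, which supports the same (?!) syntax. It assumes the earlier, broken pattern was exactly this one minus the $ inside the (?!). The names OLD, NEW, SAMPLES and matches() are made up for the example.

```python
import re

# Assumed earlier version: identical except for the missing $ inside (?!...)
OLD = re.compile(r"^(?![A-Z]{1,2}[\d]+ \d[A-Z][A-Z]).+$")
# Fixed version from this message
NEW = re.compile(r"^(?![A-Z]{1,2}[\d]+ \d[A-Z][A-Z]$).+$")

SAMPLES = ["N15", "N15 4RC", "N15 4RCCCC"]

def matches(pattern, lines):
    """Return the lines the pattern matches, i.e. the ones it flags as invalid."""
    return [line for line in lines if pattern.match(line)]

print(matches(OLD, SAMPLES))  # ['N15']
print(matches(NEW, SAMPLES))  # ['N15', 'N15 4RCCCC']
```

Without the $, the sub-pattern inside (?!) matches as soon as the line starts with a valid post code, so the negative lookahead fails and "N15 4RCCCC" is skipped.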


> >Second, what does the ".+" do?  Removing that causes only a blank  
> >line to be matched.

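.+ is the part that actually matches the text.  ^, the (?!), and $ are
all zero-width assertions that consume nothing, so without .+ the pattern
can only succeed where ^ and $ are at the same position, which is on an
empty line.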
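A quick way to see this is to drop .+ and compare, here using Python's re module purely as a stand-in for BBEdit's grep engine. The names FULL, NO_DOT_PLUS and LINES are illustrative.

```python
import re

FULL = re.compile(r"^(?![A-Z]{1,2}[\d]+ \d[A-Z][A-Z]$).+$")
# Same pattern with .+ removed: ^, (?!...) and $ are all zero-width
NO_DOT_PLUS = re.compile(r"^(?![A-Z]{1,2}[\d]+ \d[A-Z][A-Z]$)$")

LINES = ["", "N15", "N15 4RC", "N15 4RCCCC"]

full_hits = [line for line in LINES if FULL.match(line)]
bare_hits = [line for line in LINES if NO_DOT_PLUS.match(line)]

print(full_hits)  # ['N15', 'N15 4RCCCC']
print(bare_hits)  # ['']
# The full pattern's match covers the whole line; .+ is what consumed it.
print(FULL.match("N15").group(0))  # N15
```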


> >The grep pattern is seemingly stating: "if the first characters do  
> >not match the following pattern, and, there are some characters,"  
> >then the result of the grep pattern is True so highlight that line  
> >or conduct the replace.

Yes, that is basically right.  The key is that "if the first characters do
not match the following pattern" looks at those characters, but doesn't
actually consume them.
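Here is a small illustration of "looks at but doesn't consume", again assuming Python's re module as a stand-in for BBEdit's grep (the strings are just examples):

```python
import re

# A positive lookahead checks for "N15" without consuming it, so the literal
# N15 that follows it still starts matching at position 0.
both = re.match(r"(?=N15)N15", "N15 4RC")
print(both.group(0))  # N15

# A negative lookahead is zero-width too: when it succeeds, the match
# is empty and the engine has not moved from position 0.
peek = re.match(r"(?![A-Z]{1,2}[\d]+ \d[A-Z][A-Z]$)", "N15")
print(peek.span())  # (0, 0)

# By contrast, two consuming copies of N15 would need "N15N15" in the text.
print(re.match(r"N15N15", "N15 4RC"))  # None
```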


The regex engine starts at the beginning of the regex and the beginning of
the line.

First, the engine matches ^ against the start of the line.  That's a
zero-width assertion, so the engine is still positioned at the beginning of
the line.

Next, it matches (?![A-Z]{1,2}[\d]+ \d[A-Z][A-Z]$).  That's a negative
lookahead, so the engine checks that the sub-regex in the (?!) _doesn't_
match at this point in the string.  Like ^, this is also a zero-width
assertion, so the engine is _still_ positioned at the beginning of the
line.

Next, it matches .+, moving the position to the end of the line.

Finally, it matches $ against the end of the line, another zero-width
assertion.
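The walkthrough above can be approximated in Python (assuming Python's re module behaves like BBEdit's engine for this pattern) by matching successively longer prefixes of the regex and recording where each one ends. Each prefix is matched separately, which gives the same positions here because nothing later in the pattern forces backtracking. PIECES and positions() are hypothetical names.

```python
import re

# The four pieces of the pattern, in the order the engine applies them.
PIECES = [
    "^",
    r"(?![A-Z]{1,2}[\d]+ \d[A-Z][A-Z]$)",
    ".+",
    "$",
]

def positions(line):
    """Engine position after each successive piece, or None once the match fails."""
    result = []
    pattern = ""
    for piece in PIECES:
        pattern += piece
        m = re.match(pattern, line)
        result.append(m.end() if m else None)
    return result

print(positions("N15"))         # [0, 0, 3, 3]
print(positions("N15 4RCCCC"))  # [0, 0, 10, 10]
print(positions("N15 4RC"))     # [0, None, None, None]
```

The two zeros show that ^ and the (?!) leave the position at the start of the line; only .+ moves it.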


> >The grep pattern (with the negation) matches neither:
> >
> >A valid post code followed by a carriage return: it does match the
> >pattern and is not followed by any non-space or non-tab characters.
> >A valid post code followed by other characters: it does match the
> >pattern and is followed by non-space characters.
> >
> >Just trying to understand the grep patterns.

This regex is supposed to match all invalid post codes.  It was an
oversight that it didn't match ones that are valid post codes plus extra
characters.  :)
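To run the same search over a whole file outside BBEdit, a minimal Python sketch might look like this. It assumes \n line endings and uses re.MULTILINE so that ^ and $ match at each line's start and end, much as they do in BBEdit's grep. find_invalid_postcodes() and find_invalid_without_lookahead() are hypothetical names; the second shows that, in a general-purpose language, the same result can be had by negating a full match of the valid pattern instead of using a lookahead.

```python
import re

# ^ and $ apply per line under MULTILINE; "." does not match "\n",
# so .+ never runs past the end of a line.
INVALID = re.compile(r"^(?![A-Z]{1,2}[\d]+ \d[A-Z][A-Z]$).+$", re.MULTILINE)
VALID = re.compile(r"[A-Z]{1,2}\d+ \d[A-Z][A-Z]")

def find_invalid_postcodes(text):
    """Non-empty lines that are not exactly a valid post code (lookahead version)."""
    return INVALID.findall(text)

def find_invalid_without_lookahead(text):
    """Same result by negating a full match; blank lines are skipped to mirror .+."""
    return [line for line in text.splitlines() if line and not VALID.fullmatch(line)]

sample = "N15 4RC\nN15\nN15 4RCCCC\nSW1 2AB\n\nE1 6AN\n"
print(find_invalid_postcodes(sample))          # ['N15', 'N15 4RCCCC']
print(find_invalid_without_lookahead(sample))  # ['N15', 'N15 4RCCCC']
```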


HTH,
Ronald

-- 
------------------------------------------------------------------
Have a feature request? Not sure the software's working correctly?
If so, please send mail to <[EMAIL PROTECTED]>, not to the list.
List FAQ: <http://www.barebones.com/support/lists/bbedit_talk.shtml>
List archives: <http://www.listsearch.com/BBEditTalk.lasso>
To unsubscribe, send mail to:  <[EMAIL PROTECTED]>
