On Thu, Mar 20, 2003 at 03:36:28AM +0100, Axel Schlicht wrote:
> But, as grep and sed only operate on line levels (altho sed can work on
> multiple lines with some tweaking) the $ should play no role here
> So
> grep '/Name/[^/][^/]*'
> should mean
> find a line with /Name/ anywhere
> the find at least one (first [^/]) character <> '/'
> then find no more or any number of characters <> '/'
> but once you find a '/' the mission is over and you have to drop that
> line

No, you misunderstand. Your regular expression matches "/Name/" followed
by one or more non-/ characters. It says nothing about what is allowed
to follow those non-/ characters. If you want the non-/ characters to
extend until the end of the line - that is, you want no / characters
until the end of the line - you *must* anchor the regular expression
using a final $.

> No for the false matches
> blaba/Name/aaa/1 :
> blaba/Name/ : possible hit : state : valid
> blaba/Name/a : no '/', : state still valid : let's go on
> blaba/Name/aa : no '/', : state still valid : let's go on
> blaba/Name/aaa : no '/', : state still valid : let's go on
> blaba/Name/aaa/ : '/' read : state invalid : let's get out of here :

You've definitely misunderstood how unanchored regular expressions work.
In general, tools that handle regular expressions do *not* require them
to match all the input, so your "state invalid" actually means "we ran
off the end of the regular expression before we ran out of input, but
that's OK".

In other words, when your regex is applied to "blaba/Name/aaa/", it
successfully matches the "blaba/Name/aaa" portion of the input, and
since you have placed no constraint on it to match the entire line it
feels no obligation to worry about the trailing /.

In sed, you'll find that the special character & on the replacement side
of an s/// command refers to "that portion of the pattern space which
matched" (from the sed(1) man page), clearly implying that the entire
pattern space does not necessarily have to match.

> So why does grep / sed report them, also they violate the limits of the
> regex, as is:
> not '/' after /Name/, period.

Because those tools, like almost all others, are content to match
regexes against substrings of the input. Your regular expression does
limit what it matches, but not in the way you think it does.

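To make the substring-matching point concrete, here is a minimal sketch
in Python. It is only an illustration, not grep or sed itself: it
assumes Python's re.search as a stand-in for grep's "match anywhere in
the line" behaviour, and a substitution callback as a rough analogue of
sed's &. The variable names are made up for the example.

```python
import re

# The pattern from the thread, without and with a final $ anchor.
UNANCHORED = re.compile(r"/Name/[^/][^/]*")
ANCHORED = re.compile(r"/Name/[^/][^/]*$")

line = "blaba/Name/aaa/1"

# Like grep, re.search succeeds if any substring of the line matches.
m = UNANCHORED.search(line)
matched_part = m.group(0) if m else None  # only the portion that matched

# With the trailing $, the run of non-/ characters must reach end of line.
anchored_hit = ANCHORED.search(line) is not None
anchored_ok = ANCHORED.search("blaba/Name/aaa") is not None

# Rough analogue of sed's &: bracket just the matched portion.
marked = UNANCHORED.sub(lambda mo: "[" + mo.group(0) + "]", line)

print(matched_part)   # /Name/aaa
print(anchored_hit)   # False
print(anchored_ok)    # True
print(marked)         # blaba[/Name/aaa]/1
```

The unanchored pattern is satisfied by "/Name/aaa" and simply stops
there; the trailing "/1" is never examined. Only the anchored version
rejects the line.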
> THere should only be one possible explanation
> the preceeding .* may go haywire and read up to the end of the line
> before thinking on matching anything else, but although greedy, sed and
> the like should only be greedy up to a point, that is .*Anything will be
> interpreted as read as much as you like, but once you meet an Anything
> you'll stop.

That is also incorrect, and in fact completely misses the point of
greedy quantifiers. /.*Anything/ applied to "fooAnythingbarAnything"
matches the entire string, not just "fooAnything". If you're using
Perl-style regexes you can use /.*?Anything/ to modify this behaviour.

Cheers,

-- 
Colin Watson [EMAIL PROTECTED]
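
As a footnote to the greedy-quantifier point, the difference between
/.*Anything/ and /.*?Anything/ can be checked with a short sketch. It
assumes Python's re module, whose syntax follows the Perl style
mentioned above; classic grep and sed regexes have no .*? form.

```python
import re

text = "fooAnythingbarAnything"

# Greedy: .* takes as much as it can while still letting "Anything" match.
greedy = re.search(r".*Anything", text).group(0)

# Non-greedy: .*? takes as little as it can before the first "Anything".
lazy = re.search(r".*?Anything", text).group(0)

print(greedy)  # fooAnythingbarAnything
print(lazy)    # fooAnything
```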