On 2022-12-20 at 02:51, Thomas Schmitt wrote:
> Hi,
>
> i wrote:
>>> To obtain the offset of the first occurence of "CD001", do
>>>
>>>   offst=$( expr \
>>>            $( grep -a -o -b -m 1 CD001 cdimage.iso \
>>>               | sed -e 's/:/ /' \
>>>               | awk '{ print $1 }' ) - 32769 )
>
> The Wanderer wrote:
>> Cutting down the command line led me to discover that even with '-m 1',
>> four different numbers are printed by the grep-pipeline subshell.
>> (Without '-m 1', seven are printed.)
>
> This contradicts the promises of man grep about option -m.

It does seem to, at least at a glance - but I think I've figured out
what's going on, and it's actually consistent with the option set you
gave.

If I pass the same ISO through 'strings' before piping to grep, I find
that there are four occurrences of 'CD001' in the first 25 lines that
strings printed, and the next doesn't happen until line 20290.

My guess would be that grep is treating a "line" as ending with a
newline, and that there isn't a newline character in the ISO in question
until after all four of those occurrences.

With the '-o' option, grep prints only the parts of the line that were
matched - but the plural here is very relevant. If that guess is
correct, then the "line" in question has *four* occurrences, so grep
prints them all, each on a separate line of output.

The key to seeing how this interaction is consistent with the
documentation is that the man page for '-m' doesn't promise that grep
will stop processing after the first match, but rather after the first
matched *line*. And since a "line" in a binary input file can be very
long (a fact I know from lengthy and painful experience), it's entirely
possible for the first matched line to contain multiple matches, each of
which will then get printed.

-- 
   The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man.         -- George Bernard Shaw
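A small sketch of the reasoning above, under some assumptions: GNU grep (where '-b' together with '-o' reports the byte offset of each match itself), and a made-up sample file rather than a real ISO image. The sample's first "line" holds three CD001 matches separated by NUL bytes, with no newline until after the third one. If the explanation is right, grep prints three offsets despite '-m 1'. Piping through 'head -n 1' is one possible way to keep only the first occurrence; 'cut' here is my stand-in for the sed/awk pair in the original command, not something taken from it.

```shell
# Hypothetical sample file standing in for an ISO image.
tmp=$(mktemp)

# First "line": three CD001 matches separated by NUL bytes, no newline
# until after the third match. Second line: one more match.
printf 'CD001\000\000CD001\000\000CD001\nCD001\n' > "$tmp"

# -m 1 stops after the first matching *line*, but -o prints every match
# on that line, each prefixed with its own byte offset (-b).
all_on_first_line=$(grep -a -o -b -m 1 CD001 "$tmp")
echo "$all_on_first_line"

# Keeping only the first output line leaves just the first occurrence;
# cut then strips the ":CD001" part, leaving the bare offset.
first_only=$(grep -a -o -b -m 1 CD001 "$tmp" | head -n 1 | cut -d: -f1)
echo "$first_only"

rm -f "$tmp"
```

With a single number coming out of the subshell again, expr gets the one operand it expects.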