On Tue, Nov 07, 2006 at 01:00:34AM +1100, John O'Hagan wrote:
> On Monday 06 November 2006 18:38, David Jardine wrote:
> > On Mon, Nov 06, 2006 at 11:27:58AM +1100, John O'Hagan wrote:
> > [...]
> > > E.g., if IN contains:
> > >
> > > junk info 18 Pro
> >
> > But what if that line were:
> >
> > junk info 18 Pro-
> >
> > which seems more likely?
> >
> > [...]
>
> You're right; but the OP, Michael, gave the above scenario as his problem. If
> your situation were the case, though, I guess we could use tr -d '-' to get
> rid of all the hyphens first as well.

The problem there is: what if the desired result word includes a hyphen?
Then you'll have modified your result. I think you should go ahead and

    tr -d '\n' | tr ' ' '\n' |

and then grep for a regex of Processor that allows for hyphens. You could
limit it to the usual hyphen locations: Pro-cess-or, or is it Pro-ces-sor?

Here's another problem: the target word is at the end of a line, with
Processor at the beginning of the next line. There is only a newline
between them, so once the newlines are deleted the result becomes

    test word target-wordProcessor other junk

and your grep will return 'word' instead of 'target-word'. You'd have to
use an old find-replace trick:

    tr '\n' ' ' | tr -s ' ' '\n' | grep -B1 'Pro-*cess-*or' | grep -v 'Pro-*cess-*or\|--'

This replaces newlines with spaces and then replaces every run of one or
more spaces with a single newline, which lets the edge case above come
through properly. Then I think the grep is right to match zero or more
hyphens in Processor.

A
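
For anyone who wants to try the find-replace trick above, here is a minimal sketch of it wrapped in a shell function. The function name word_before_processor and the sample input are made up for illustration. The final filter is written with two -e patterns, which is my assumption about what the last grep is meant to do (drop the Processor line itself and the "--" separators that grep -B1 prints between groups). The -B option is not in POSIX grep, so this assumes GNU or BSD grep.

```shell
#!/bin/sh
# Hypothetical helper: read text on stdin and print the word that comes
# immediately before "Processor" (optionally hyphenated as Pro-cess-or).
word_before_processor() {
    tr '\n' ' ' |                  # turn every newline into a space
    tr -s ' ' '\n' |               # turn each run of spaces into one newline
    grep -B1 'Pro-*cess-*or' |     # keep each match plus the line before it
    grep -v -e 'Pro-*cess-*or' -e '^--$'   # drop the match and separators
}

# Made-up sample: the target word ends one line, Processor starts the next.
printf 'test word target-word\nProcessor other junk\n' | word_before_processor
# prints: target-word
```

Because the hyphen in target-word is never touched, it survives intact. Note that this sketch only allows hyphens inside a Processor that sits on one line; if the word is split across a line break (Pro- at the end of one line, cessor on the next), the two halves land on separate lines and the pattern will not match.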