Thanks, Jeff. I hope to try this out later today. I thought I had the
solution earlier this morning, but I ran into a problem. I hope this will
solve it! Thanks again.
"Jeff 'japhy' Pinyan" <[EMAIL PROTECTED]>
01/23/2004 10:34 AM
Please respond to: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
cc: [EMAIL PROTECTED]
Subject: Re: Need help with a regex
On Jan 23, [EMAIL PROTECTED] said:
>This newbie needs help with a regex. Here's what the data from a text
>file looks like. There's no delimiter and the fields aren't evenly spaced
>apart.
>
>apples San Antonio Fruit
>oranges Sacramento Fruit
>pineapples Honolulu Fruit
>lemons Corona del Rey Fruit
>
>Basically, I want to put the city names into an array. The first field,
>the fruit name, is always one word with no spaces.
>
>Anyone know how to do that ?
Well, there are many ways. You could split the string on whitespace,
remove the first and last elements, and join the others with spaces:
  for (@data) {
    my @fields = split;
    shift @fields;
    pop @fields;
    push @cities, "@fields";  # "@array" = join(" ", @array)
  }
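
For anyone who wants to run the split version as-is, here is a minimal
sketch. The sample lines are the ones from the question; the sub name
cities_by_split and the hard-coded @data are just for illustration, since
the original post doesn't say how the file gets read.

```perl
use strict;
use warnings;

# Sample lines copied from the original question.
my @data = (
    "apples San Antonio Fruit",
    "oranges Sacramento Fruit",
    "pineapples Honolulu Fruit",
    "lemons Corona del Rey Fruit",
);

# Hypothetical helper name; wraps the loop shown above.
sub cities_by_split {
    my @cities;
    for (@_) {
        my @fields = split;       # split $_ on runs of whitespace
        shift @fields;            # drop the first field (the fruit name)
        pop @fields;              # drop the last field (the category)
        push @cities, "@fields";  # rejoin what is left with single spaces
    }
    return @cities;
}

print "$_\n" for cities_by_split(@data);
```

Note that rejoining with "@fields" puts exactly one space between words,
so any run of extra whitespace inside a city name gets collapsed.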
Or, you could use a regex that gets SPECIFICALLY what you want:
  for (@data) {
    push @cities, $1 if /^\S+\s+(\S+(?:\s+\S+)*)\s+\S+$/;
  }
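
A quick way to try the regex version is sketched below. Only the first
input line comes from the question; the other two are made up to show
what happens with uneven spacing and with a line that has no city at all.
The sub name cities_by_regex is also just for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper name; wraps the regex loop shown above.
sub cities_by_regex {
    my @cities;
    for (@_) {
        # $1 is only used when the whole pattern matched.
        push @cities, $1 if /^\S+\s+(\S+(?:\s+\S+)*)\s+\S+$/;
    }
    return @cities;
}

my @data = (
    "apples San Antonio Fruit",
    "lemons   Corona del  Rey    Fruit",  # made-up line with uneven spacing
    "oranges Fruit",                      # made-up line with no city: skipped
);

print "[$_]\n" for cities_by_regex(@data);
```

Unlike the split-and-join approach, this keeps whatever whitespace sits
between the words of the city name, and it silently skips lines that
have fewer than three fields.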
That regex might need a bit of explanation:
  m{
    ^                 # the beginning of the string
    \S+               # one or more non-whitespace chars (the fruit name)
    \s+               # one or more whitespace chars
    (                 # capture into group 1 (the city):
      \S+             #   first word of the city name
      (?: \s+ \S+ )*  #   *ALL* remaining words
    )
    \s+               # one or more whitespace chars
    \S+               # one or more non-whitespace chars (the last field)
    $                 # the end of the string
  }x;
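
One way to see what the trailing \s+ \S+ $ contributes is to run the
pattern with and without it. This is only a sketch; the variable names
are made up, and the test string is the longest line from the question.

```perl
use strict;
use warnings;

# Longest sample line from the original question.
my $line = "lemons Corona del Rey Fruit";

# Same pattern minus the trailing part: nothing forces the group
# to give anything back, so it swallows the last field too.
my ($greedy) = $line =~ /^\S+\s+(\S+(?:\s+\S+)*)/;

# Full pattern: the group has to release the last word so that
# the final \s+\S+$ can match.
my ($city) = $line =~ /^\S+\s+(\S+(?:\s+\S+)*)\s+\S+$/;

print "without tail: [$greedy]\n";  # [Corona del Rey Fruit]
print "with tail:    [$city]\n";    # [Corona del Rey]
```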
Here is what the regex does on a string like "peach Georgia fruit": the
first \S+\s+ matches "peach ". The capture group is greedy, so at first it
grabs "Georgia fruit". However, the REST of the regex still has to match,
and with nothing left in the string it can't, so the (?:\s+\S+)*
backtracks -- it gives up the last chunk it matched, leaving only
"Georgia" in the group. Then the final \s+\S+ can match " fruit", and $1
ends up as "Georgia".
--
Jeff "japhy" Pinyan [EMAIL PROTECTED] http://www.pobox.com/~japhy/
RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
<stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.
[ I'm looking for programming work. If you like my work, let me know. ]