Brano Gerzo wrote:
> Dr.Ruud [DR], on Thursday, July 13, 2006 at 21:05 (+0200) made these
> points:
>
>> I don't understand what you try to match with "[\w\s\+:]+". It
>> matches any series of characters that belong to the character class
>> containing [[:word:]], [[:space:]], a plus and a colon. So "a b :c"
>> would match.
>
> yes, my example was ambiguous, sorry for that. Here are more examples:
>
> word
> word word
> word word word
> 1 word
> 1 word word word
> 1 word en,pt,sk
> 1 word en 1cd
>
> so:
> - first digits are optional
> - then it is followed by word(s), which are mandatory
> - then it should be 1 language (en), or set of any number of
> languages (en,sk,pt)
> - digit(cd) is optional
>
> That's all
>
> Thank you for your nice code!
Slight revision, which fails on the last line:
#!/usr/bin/perl
use warnings ;
use strict ;

# Building blocks: each sub returns a regex fragment as a string.
sub sp       { '[[:blank:]]+' }
sub capture  { "(@_)" }
sub optional { "(?:@_)?" }
sub optimany { "(?:@_)*" }
sub REnumber { '\d+' }
sub REword   { '\w+' }
# Two-letter codes accepted as languages; the whitespace is ignored under /x.
sub RElang   { '
  (?:
    a[ly]|b[gs]|cs|d[ae]|e[nst]|
    f[ir]|gr|h[eruy]|it|ja|kk|lv|nl|
    p[blt]|r[ou]|s[klqrv]|t[hr]|uk|zh)
' }
# A word list must be followed by blanks or the end of the line.
sub REwordlist { REword . optimany( sp . REword ) . '(?='.sp.'|$)' }
sub RElanglist { RElang . optimany( ',' . RElang ) }

# [number] words [languages] [number 'cd']
my $re = optional(capture(REnumber).sp)
       . capture(REwordlist)
       . optional(sp.capture(RElanglist))
       . optional(sp.capture(REnumber).'cd') ;

print "re/$re/\n\n\n" ;

my $qr = qr/ $re /x ;

while ( <DATA> )
{
    no warnings ;    # unmatched groups are undef
    print "\n" ;
    print ;
    /$qr/ and print "($1) ($2) ($3) ($4)\n" ;
}
__DATA__
word
word word
word word word
1 word
1 word word word
1 word en,pt,sk
1 word en 1cd
################################
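That last line, "1 word en 1cd", fails because "en" and "1cd" both match
\w+, so the greedy word list takes "word en 1cd" as a whole and nothing
is left for the language and cd parts. It can be parsed differently if
(for example) "word" can't start with a digit, etc.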
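
One possible way to do that, as a rough sketch only and not the script
above: assume a word must start with a letter, anchor the pattern at both
ends of the line, and take as few words as possible so that a trailing
language list and "<digits>cd" get a chance to match. The names $word,
$line_re and parse_line are made up for this illustration, and the
"starts with a letter" rule is an assumption about the data.

```perl
#!/usr/bin/perl
use strict ;
use warnings ;

# Same alternation of two-letter codes as in RElang.
my $lang = qr/(?:a[ly]|b[gs]|cs|d[ae]|e[nst]|f[ir]|gr|h[eruy]|it|ja|kk|lv|nl|p[blt]|r[ou]|s[klqrv]|t[hr]|uk|zh)/ ;

# Assumption: a word starts with a letter, so "1cd" is never a word.
my $word = qr/[[:alpha:]]\w*/ ;

my $line_re = qr/
    ^ [[:blank:]]*
    (?: (\d+) [[:blank:]]+ )?                      # optional leading number
    ( $word (?: [[:blank:]]+ $word )*? )           # words, as few as possible
    (?: [[:blank:]]+ ( $lang (?: , $lang )* ) )?   # optional language list
    (?: [[:blank:]]+ (\d+) cd )?                   # optional digits before cd
    \s* \z
/x ;

# Returns (number, words, languages, cd); unmatched parts are undef.
# Returns an empty list when the line does not fit the format.
sub parse_line {
    my ($line) = @_ ;
    return $line =~ $line_re ? ( $1, $2, $3, $4 ) : () ;
}

for my $line ( 'word', 'word word word', '1 word en,pt,sk', '1 word en 1cd' ) {
    my @f = map { defined $_ ? $_ : '' } parse_line($line) ;
    print "$line => (", join( ') (', @f ), ")\n" ;
}
```

With this, "1 word en 1cd" comes out as (1) (word) (en) (1). The price is
that a final word which happens to be a language code, as in "word en",
is always read as a language and never as part of the word list.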
--
Affijn, Ruud
"Gewoon is een tijger."