RE: problem with splitting on "words"

Bob Showalter Fri, 30 Jul 2004 10:30:30 -0700

Charlotte Hee wrote:
> Hello All,
> 
> I am having trouble splitting words from titles from a list of
> research papers. I thought I could split the title into words like so:
> 
>   #!/usr/local/bin/perl
>   use locale;
> 
>   %forums = ( 1 => 'B0->K+K-Ks',
>               2 => 'B+->K+KsKs Decays',
>               3 => 'Measurement of the Total Width',
>               4 => 'Asymmetries in B0->K0s pi0 Decays'
>   );
> 
>   foreach $forum ( sort keys %forums ){
>      my $title = $forums{$forum};
>      foreach $w (split /[^\w-]+/, $title) {
>         next unless ($w =~ /^[A-Za-z]/);
>         $title =~ /\b\Q$w\E\b/;
>         print "Journal $forum indexed word = " .  ucfirst($w) . "\n";
>       }
>   }
> 
> exit;
> 
> But the results show that I'm losing some characters:
> 
> Journal 1 indexed word = B0-    # this should be B0->


No, because '>' matches the character class [^\w-], so split treats it as
a separator and drops it.

> Journal 1 indexed word = K      # what happened to the '+'?

Same as above: '+' also matches [^\w-], so it is consumed as a separator.

> Journal 1 indexed word = K-Ks
> 
> Journal 2 indexed word = B      # '+->' missing

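The '-' is there, but as a token of its own: the '+' before it and the '>'
after it are both separators. You're only printing tokens that start with a
letter, so the lone '-' gets skipped.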
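
To see where the characters go, here is a minimal sketch that prints every
token your split produces before the "starts with a letter" filter runs.
The helper name raw_tokens is made up for illustration; the pattern is the
one from your script.

```perl
use strict;
use warnings;

# Hypothetical helper: return every token the original pattern yields,
# with no filtering applied afterwards.
sub raw_tokens {
    my ($title) = @_;
    # Any run of characters that are neither \w nor '-' is a separator,
    # so '>', '+' and spaces are all thrown away by split.
    return split /[^\w-]+/, $title;
}

for my $title ('B0->K+K-Ks', 'B+->K+KsKs Decays') {
    my @tokens = raw_tokens($title);
    print "$title => ", join(' | ', map { "'$_'" } @tokens), "\n";
}
```

For the second title the tokens are 'B', '-', 'K', 'KsKs' and 'Decays'; the
'-' never reaches your print because of the "next unless" test.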

> Journal 2 indexed word = K      # '+' missing
> Journal 2 indexed word = KsKs
> Journal 2 indexed word = Decays
> 
> Journal 3 indexed word = Measurement
> Journal 3 indexed word = Of
> Journal 3 indexed word = The
> Journal 3 indexed word = Total
> Journal 3 indexed word = Width
> 
> Journal 4 indexed word = Asymmetries
> Journal 4 indexed word = In
> Journal 4 indexed word = B0-   # should be 'B0->'
> Journal 4 indexed word = K0s
> Journal 4 indexed word = Pi0
> Journal 4 indexed word = Decays
> 
> These are only example titles but the other titles have similar
> characters in them as part of a "word". I tried adding the '-' and
> '>' to my character class but that did not work. What am I doing
> wrong here? 

It's not clear what you're defining as a "word". I'm wondering why you
aren't just splitting on whitespace?

   foreach $w (split ' ', $title) {
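
Filled out, that might look like the sketch below. It assumes a "word" is
any whitespace-delimited chunk, which may or may not be what you want. The
helper name title_words is made up for illustration, and I've kept your
"starts with a letter" filter.

```perl
use strict;
use warnings;

# Hypothetical helper: split a title on runs of whitespace, so that
# punctuation such as '->' and '+' stays attached to its word.
sub title_words {
    my ($title) = @_;
    # split ' ' is the special case that also ignores leading whitespace.
    return split ' ', $title;
}

my %forums = (
    1 => 'B0->K+K-Ks',
    2 => 'B+->K+KsKs Decays',
    3 => 'Measurement of the Total Width',
    4 => 'Asymmetries in B0->K0s pi0 Decays',
);

# Sort the keys numerically so the order stays right past 9 entries.
for my $forum (sort { $a <=> $b } keys %forums) {
    for my $w (title_words($forums{$forum})) {
        next unless $w =~ /^[A-Za-z]/;
        print "Journal $forum indexed word = ", ucfirst($w), "\n";
    }
}
```

With this, title 1 comes out as the single word 'B0->K+K-Ks' and title 4
keeps 'B0->K0s' intact. If you need the decay products as separate words,
you'll have to say exactly where one "word" ends and the next begins.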

