Regular expressions, help?

2012-04-18 Thread Sania
Hi,
So I am trying to get the number of casualties from a text. The number I
need comes right after 'death toll', as you can see from the variable
called text. Here is my code.
I'm pretty sure my regex is correct; I think it's the group part
that's the problem.
I am using NLTK with Python. group() grabs the string matched inside the
parentheses and stores it in deadnum, and I append deadnum to a list.

  import re

  deaths = []
  text = ("accounts put the death toll at 637 and those missing at "
          "653 , but the total number is likely to be much bigger")
  dead = re.match(r".*death toll.*(\d[,\d\.]*)", text)
  deadnum = dead.group(1)
  deaths.append(deadnum)
  print(deaths)

Any help would be appreciated,
Thank you,
Sania
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regular expressions, help?

2012-04-19 Thread Sania
On Apr 19, 2:48 am, Jussi Piitulainen wrote:
> It's the regexp. The .* after "death toll" eats the input as far as it
> can without making the whole match fail. The group matches only the
> last digit in the text.
>
> You could allow only non-digits before the number. Or you could look
> up the variant of * that only matches as much as it must.
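
A minimal sketch of the difference described above, assuming the sample text sits on a single line (the line break in the post is taken to be e-mail wrapping). It compares the original greedy pattern with the lazy variant `.*?` and with a non-digit run `\D*`:

```python
import re

# Assumption: the sample text is one line; the break in the post is e-mail wrapping.
text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")

# Greedy: ".*" runs to the end of the line, then backs off only until a
# single digit is left over for the group.
greedy = re.match(r".*death toll.*(\d[,\d\.]*)", text).group(1)

# Lazy: ".*?" grows one character at a time and stops as soon as the
# rest of the pattern can match.
lazy = re.match(r".*death toll.*?(\d[,\d\.]*)", text).group(1)

# Non-digits only: "\D*" cannot run past the first digit it meets.
non_digit = re.match(r".*death toll\D*(\d[,\d\.]*)", text).group(1)

print(greedy, lazy, non_digit)
```

On this text the greedy group holds only "3", the last digit of 653, while both alternatives capture "637".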

Hey Thanks,
So now my regex is

dead=re.match(r".*death toll.{0,20}(\d[,\d\.]*)", text)

But I only find 7, not 637. How is it that the group is only matching
the last digit? The whole number pattern is in parentheses, not just the
last part.
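
A rough illustration of why the bounded version still returns one digit, again assuming single-line text. The `.` in `.{0,20}` matches digits too, so the greedy window consumes as much of the number as it can; `\D{0,20}` is the non-digit variant suggested earlier in the thread:

```python
import re

# Assumption: the sample text is one line; the break in the post is e-mail wrapping.
text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")

# ".{0,20}" is greedy and "." also matches digits, so it consumes " at 63"
# and leaves only the final "7" for the group.
dot_window = re.match(r".*death toll.{0,20}(\d[,\d\.]*)", text)

# "\D{0,20}" can only consume non-digits, so it has to stop before the "6".
non_digit_window = re.match(r".*death toll\D{0,20}(\d[,\d\.]*)", text)

print(dot_window.group(1), non_digit_window.group(1))
```

The parentheses do wrap the whole number pattern; the trouble is that the part before them has already taken the leading digits.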


Re: Regular expressions, help?

2012-04-19 Thread Sania
On Apr 19, 9:52 am, Jon Clements wrote:
> Or just don't fully rely on a regex. For the sake of time, and the little
> sanity I believe I have left, I would just do something like:
>
> death_toll = re.search(r'death toll.*\d+', text).group().rsplit(' ', 1)[1]
>
> hth,
>
> Jon.

Thank you all so much!

I ended up using Jussi's advice: \D{0,20}.
Azrazer, what you suggested works, but I need to make sure that it
catches numbers like 6,370 as well as 637. I tried tweaking the
regex from the one in your reply, but it didn't work
(it probably would have if I were more adept). But thanks!
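
One possible way to combine \D{0,20} with comma-grouped numbers, as a sketch only. The helper name `extract_death_toll` and the constant `TOLL_RE` are hypothetical, and the `(?:,\d{3})*` part assumes that commas appear only as thousands separators. It also returns None instead of failing when nothing matches, since calling .group() on a missed match raises AttributeError:

```python
import re

# Hypothetical pattern: up to 20 non-digits after "death toll", then digits
# with optional ",ddd" groups (assumed thousands separators).
TOLL_RE = re.compile(r"death toll\D{0,20}(\d+(?:,\d{3})*)")


def extract_death_toll(text):
    """Return the first number after 'death toll' as an int, or None."""
    found = TOLL_RE.search(text)
    if found is None:
        return None
    # Drop the separators so "6,370" becomes the integer 6370.
    return int(found.group(1).replace(",", ""))


print(extract_death_toll("put the death toll at 637 and those missing at 653"))
print(extract_death_toll("the death toll rose to 6,370."))
print(extract_death_toll("no figures were given"))
```

Requiring exactly three digits after each comma keeps a trailing comma or full stop in the sentence out of the captured number.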

Jon, I kind of see what you are doing. In the regex you say that after
death toll there can be 0 or more characters followed by 1 or more
digits (although I would need to allow a comma among the digits so it
catches 6,370). I can also see that you are splitting the string, but
I don't understand the 1 in rsplit(' ', 1)[1]. I am not really
familiar with the syntax, I guess.
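
A step-by-step sketch of what the rsplit(' ', 1)[1] expression does, assuming single-line text. The lazy `.*?` here is a swapped-in variant: on one line, the greedy `.*` from the quoted suggestion would run on to 653.

```python
import re

# Assumption: the sample text is one line; the break in the post is e-mail wrapping.
text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")

# group() with no argument returns the whole matched substring.
matched = re.search(r"death toll.*?\d+", text).group()

# rsplit(' ', 1): split on spaces starting from the right, at most 1 split,
# which gives a two-item list: everything before the last space, and the rest.
pieces = matched.rsplit(" ", 1)

# [1] picks the second item, the piece after the last space.
death_toll = pieces[1]

print(matched)
print(pieces)
print(death_toll)
```

So the first 1 is the maximum number of splits and the second 1 is an ordinary list index.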

Thanks again!