Re: [Tutor] Picking up citations

2009-02-11 Thread Kent Johnson
On Tue, Feb 10, 2009 at 5:54 PM, Kent Johnson wrote: > Another attempt attached, it recognizes the n. separator and gets the last > item. And here is the actual attachment. Kent # Parser for legal citations, PLY version # This version doesn't parse the names from ply import lex, yacc debug =
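The attachment is cut off in this listing, so the following is only a rough sketch of what handling the "n." footnote separator can look like. It uses a regular expression instead of PLY and is not Kent's code. The names `CITE_RE` and `find_cites` are hypothetical, and the reporter pattern is an assumption based on the abbreviations quoted in this thread (U.S., S.Ct., L.Ed.2d, F.3d).

```python
import re

# Assumed shape of one citation unit:
#   volume  reporter  first-page [, pin-page [n. footnote]]
# The negative lookahead keeps the volume number of a following
# parallel citation from being mistaken for a pin page.
CITE_RE = re.compile(
    r"(\d+)\s+"                       # volume, e.g. 422
    r"([A-Z][A-Za-z.]*\.(?:\dd)?)\s+" # reporter, e.g. U.S., S.Ct., L.Ed.2d
    r"(\d+)"                          # first page
    r"(?:,\s+(\d+(?:-\d+)?)\b(?!\s+[A-Z])"  # optional pin page or range
    r"(?:\s+n\.\s*(\d+))?)?"          # optional footnote after "n."
)

def find_cites(text):
    """Return (volume, reporter, page, pin, footnote) tuples; missing parts are ''."""
    return CITE_RE.findall(text)

if __name__ == "__main__":
    sample = "422 U.S. 490, 499 n. 10, 95 S.Ct. 2197, 2205 n. 10"
    for cite in find_cites(sample):
        print(cite)
```

A pattern like this covers only the numeric part of a citation. Party names and the trailing court and year would still need a grammar of the kind discussed in the thread.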

Re: [Tutor] Picking up citations

2009-02-10 Thread Dinesh B Vadhia
Dinesh and Kent - I've been lurking along as you run this problem to ground. The syntax you are working on looks very slippery, and remin

Re: [Tutor] Picking up citations

2009-02-10 Thread Kent Johnson
On Tue, Feb 10, 2009 at 12:42 PM, Dinesh B Vadhia wrote: > Kent > > The citation without the name is perfect (and this appears to be how most > citation parsers work). There are two issues in the test run: > > 1. The parallel citation 422 U.S. 490, 499 n. 10, 95 S.Ct. 2197, 2205 n. > 10, 45 L.Ed

Re: [Tutor] Picking up citations

2009-02-10 Thread Dinesh B Vadhia
On Tue, Feb 10, 2009 at 12:42 PM, Dinesh B Vadhia wrote: > Kent > > The citation without the name is perfect (and this appears to be how most > citation parsers work). There are two issues in the test run: >

Re: [Tutor] Picking up citations

2009-02-10 Thread Paul McGuire
Dinesh and Kent - I've been lurking along as you run this problem to ground. The syntax you are working on looks very slippery, and reminds me of some of the issues I had writing a generic street address parser with pyparsing (http://pyparsing.wikispaces.com/file/view/streetAddressParser.py). Ma

Re: [Tutor] Picking up citations

2009-02-10 Thread Kent Johnson
On Tue, Feb 10, 2009 at 12:42 PM, Dinesh B Vadhia wrote: > Kent > > The citation without the name is perfect (and this appears to be how most > citation parsers work). There are two issues in the test run: > > 1. The parallel citation 422 U.S. 490, 499 n. 10, 95 S.Ct. 2197, 2205 n. > 10, 45 L.Ed

Re: [Tutor] Picking up citations

2009-02-10 Thread Dinesh B Vadhia
last citation ie. 463 U.S. 29, 43, 103 S.Ct. 2856, 2867, 77 L.Ed.2d 443 (1983). I tested it on another sample text and it missed the last citation too. Thanks! Dinesh From: Kent Johnson Sent: Tuesday, February 10, 2009 4:01 AM To: Dinesh B Vadhia Cc: tutor@python.org Subject: Re: [Tutor] Pi

Re: [Tutor] Picking up citations

2009-02-10 Thread Kent Johnson
On Mon, Feb 9, 2009 at 12:51 PM, Dinesh B Vadhia wrote: > Kent / Emmanuel > > Below are the results using the PLY parser and Regex versions on the > attached 'sierra' data which I think covers the common formats. Here are > some 'fully unparsed' citations that were missed by the programs: > > Smit

Re: [Tutor] Picking up citations

2009-02-10 Thread Lie Ryan
On Mon, 09 Feb 2009 14:42:47 -0800, Marc Tompkins wrote: > Aha! My list of "magic words"! > (Sorry for the top post - anybody know how to change quoting defaults in > Android Gmail?) > --- www.fsrtechnologies.com > > On Feb 9, 2009 2:16 PM, "Dinesh B Vadhia" > wrote: > > Kent /Emmanuel > > I

Re: [Tutor] Picking up citations

2009-02-09 Thread Marc Tompkins
Aha! My list of "magic words"! (Sorry for the top post - anybody know how to change quoting defaults in Android Gmail?) --- www.fsrtechnologies.com On Feb 9, 2009 2:16 PM, "Dinesh B Vadhia" wrote: Kent /Emmanuel I found a list of words before the first word that can be removed which I think i

[Tutor] Picking up citations

2009-02-09 Thread Dinesh B Vadhia
Kent / Emmanuel I found a list of words before the first word that can be removed, which I think is the only way to successfully parse the citations. Here they are: | E.g. | Accord | See | See + Also | Cf. | Compare | Contra | But + See | But + Cf. | See Generally | Citing | In | Dinesh
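A minimal sketch of how a list like this could be applied, assuming the signal words only need to be removed from the start of an already-isolated citation string. The names `SIGNALS` and `strip_signals` are hypothetical and do not come from any code posted in the thread.

```python
import re

# Signal words from the list above. Multi-word phrases come first so
# that "See Also" is tried before plain "See".
SIGNALS = [
    "See Generally", "See Also", "But See", "But Cf.",
    "E.g.", "Accord", "See", "Cf.", "Compare", "Contra", "Citing", "In",
]

# One or more signals at the start of the string, each optionally
# followed by a comma and then whitespace. Matching is case-insensitive.
_SIGNAL_RE = re.compile(
    r"^(?:(?:%s),?\s+)+" % "|".join(re.escape(s) for s in SIGNALS),
    re.IGNORECASE,
)

def strip_signals(citation):
    """Remove leading signal words such as 'See also' or 'But cf.'."""
    return _SIGNAL_RE.sub("", citation.strip())

if __name__ == "__main__":
    print(strip_signals("See also Smith v. Jones, 1 U.S. 2 (1900)"))
    print(strip_signals("In Carter v. Jury Commission of Greene County"))
```

One caveat with this approach: a case name that itself begins with a signal word (for example one starting with "In re") would lose its first word, so a real parser would need an exception list.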

Re: [Tutor] Picking up citations

2009-02-09 Thread Dinesh B Vadhia
Kent / Emmanuel Below are the results using the PLY parser and Regex versions on the attached 'sierra' data which I think covers the common formats. Here are some 'fully unparsed' citations that were missed by the programs: Smith v. Wisconsin Dept. of Agriculture, 23 F.3d 1134, 1141 (7th Cir.1

Re: [Tutor] Picking up citations

2009-02-09 Thread Kent Johnson
On Sun, Feb 8, 2009 at 7:07 PM, Dinesh B Vadhia wrote: > Hi Kent > > From pyparsing to PLY in a few days ... this is too much to handle! I tried > the program and like you said it works except for the inclusion of the full > name. I tested it on different text and it doesn't work as expected (se

Re: [Tutor] Picking up citations

2009-02-08 Thread Kent Johnson
On Sun, Feb 8, 2009 at 5:53 PM, Emmanuel Ruellan wrote: > Dinesh B Vadhia wrote: >> Hi! I want to process text that contains citations, in this case in legal >> documents, and pull-out each individual citation. > > > Here is my stab at it, using regular expressions. Any comments welcome. It's a

Re: [Tutor] Picking up citations

2009-02-08 Thread Emmanuel Ruellan
Dinesh B Vadhia wrote: > Hi! I want to process text that contains citations, in this case in legal > documents, and pull-out each individual citation. Here is my stab at it, using regular expressions. Any comments welcome. I had to use two regexes, one to find all citations, and the other one
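Emmanuel's two regexes are not visible in this truncated snippet, so the following is only an illustrative sketch of the general idea of locating citations with a regular expression. It finds the "volume reporter page" units in the sample text from the original question. `REPORTER_RE` and `find_reporter_cites` are hypothetical names, and the reporter pattern is an assumption based on the abbreviations seen in the thread.

```python
import re

# volume, reporter abbreviation, first page
# Reporter: a capital letter, then letters and dots ending in a dot,
# optionally followed by a series marker such as "2d" or "3d".
REPORTER_RE = re.compile(r"\b(\d+)\s+([A-Z][A-Za-z.]*\.(?:\dd)?)\s+(\d+)")

def find_reporter_cites(text):
    """Return a list of (volume, reporter, page) tuples found in text."""
    return REPORTER_RE.findall(text)

if __name__ == "__main__":
    sample = ("Carter v. Jury Commission of Greene County, "
              "396 U.S. 320, 90 S.Ct. 518, 24 L.Ed.2d 549 (1970)")
    for volume, reporter, page in find_reporter_cites(sample):
        print(volume, reporter, page)
```

This finds the numeric units only. Grouping them under a case name, which is what the requested output needs, would take a second pass or a fuller grammar.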

Re: [Tutor] Picking up citations

2009-02-08 Thread Kent Johnson
I guess I'm in the mood for a parsing challenge this weekend, I wrote a PLY version of the citation parser, see attached. It generates exactly the output you asked for except for the inclusion of "In" in the name. Kent # Parser for legal citations, PLY version from ply import lex, yacc text = ""

Re: [Tutor] Picking up citations

2009-02-07 Thread Dinesh B Vadhia
, 493 U.S. 146, 159-60 (1934)" I didn't know about pyparsing, which appears to be very powerful, and have joined their list. Thank you for your help. Dinesh From: Kent Johnson Sent: Saturday, February 07, 2009 1:19 PM To: Dinesh B Vadhia Cc: tutor@python.org Subject: Re: [Tutor]

Re: [Tutor] Picking up citations

2009-02-07 Thread Marc Tompkins
On Sat, Feb 7, 2009 at 1:19 PM, Kent Johnson wrote: > > It is correct except for the inclusion of "In" in the name and the > extra space before the comma separating the page numbers in the last > citation. > As I've been reading along, I've been thinking that the word "In" qualifies as a "magic

Re: [Tutor] Picking up citations

2009-02-07 Thread Kent Johnson
It turns out you can use Or expressions to cause a kind of backtracking in Pyparsing. This is very close to what you want: Name1 = Forward() Name1 << Combine(Word(alphas) + Name1 | Word(alphas) + Suppress('v.'), joinString=' ', adjacent=False).setResultsName('name1') Name2 = Combine(OneOrMore(Word

Re: [Tutor] Picking up citations

2009-02-07 Thread Kent Johnson
On Sat, Feb 7, 2009 at 11:53 AM, Dinesh B Vadhia wrote: > Wow Kent, what a great start! > > I found this > http://mail.python.org/pipermail/python-list/2006-April/376149.html which > lays out some patterns of legal citations ie. Here is another good reference: http://philip.greenspun.com/politics

Re: [Tutor] Picking up citations

2009-02-07 Thread Dinesh B Vadhia
n petit jury, he would clearly have standing to challenge the systematic exclusion of any identifiable group from jury service." Okay, I'd better get to grips with pyparsing! Dinesh From: Kent Johnson Sent: Saturday, February 07, 2009 6:21 AM To: Dinesh B Vadhia Cc: tutor@pytho

Re: [Tutor] Picking up citations

2009-02-07 Thread Kent Johnson
On Sat, Feb 7, 2009 at 1:11 AM, Dinesh B Vadhia wrote: > Hi! I want to process text that contains citations, in this case in legal > documents, and pull-out each individual citation. Here is a sample text: > The results required are: > > Carter v. Jury Commission of Greene County, 396 U.S. 32

Re: [Tutor] Picking up citations

2009-02-07 Thread Kent Johnson
On Sat, Feb 7, 2009 at 1:11 AM, Dinesh B Vadhia wrote: > Hi! I want to process text that contains citations, in this case in legal > documents, and pull-out each individual citation. > Before attempting to solve this problem I thought I'd first ask if anyone > has seen a solution before? This g

Re: [Tutor] Picking up citations

2009-02-07 Thread spir
Le Fri, 6 Feb 2009 22:11:14 -0800, "Dinesh B Vadhia" a écrit : > Hi! I want to process text that contains citations, in this case in legal > documents, and pull-out each individual citation. Here is a sample text: > > text = "Page 500 Carter v. Jury Commission of Greene County, 396 U.S. 320,

[Tutor] Picking up citations

2009-02-06 Thread Dinesh B Vadhia
Hi! I want to process text that contains citations, in this case in legal documents, and pull-out each individual citation. Here is a sample text: text = "Page 500 Carter v. Jury Commission of Greene County, 396 U.S. 320, 90 S.Ct. 518, 24 L.Ed.2d 549 (1970); Lathe Turner v. Fouche, 396 U.S. 34