[ 
https://issues.apache.org/jira/browse/OPENNLP-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J. Fiala updated OPENNLP-1223:
------------------------------
    Description: 
Add NameFinder model based on the Tiger treebank 2.2 (Universität Stuttgart - 
www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html)

 

1.) add model based on tiger (/)

>>> generated from 6,271 sentences with tagged names (always given name + 
>>> surname).

2.) add a few test sentences (/)

3.) add small evaluation file (/)

 
h3. Input data
 * tigercorpus-2.2.conll09.tar.gz (Uni Stuttgart)
 www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html
 * yagoLabels.tsv.7z (Max Planck Institute)
 
[https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/]

h3. Basic workflow

1.) Extract sentences from the Tiger corpus that contain candidate names (two 
consecutive tokens tagged as NE)

2.) Check whether each candidate starts with a given name according to the YAGO 
labels database (the first token is assumed to be the given name)

3.) If that first token is listed in the YAGO labels as givenName, tag the 
candidate as a person name
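
The three steps above can be sketched roughly as follows. This is only an illustration, not the script used to build the attached model. It assumes the CoNLL-2009 layout with the word form in column 2 and the POS tag in column 5, and it takes the given names as an already loaded set, because the layout of yagoLabels.tsv is not described here. The names read_sentences and tag_person_names are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Set, Tuple

FORM_COL = 1  # assumption: word form is the 2nd tab-separated column (CoNLL-2009)
POS_COL = 4   # assumption: POS tag is the 5th column; "NE" marks proper nouns

Token = Tuple[str, str]  # (form, pos)


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of (form, pos); blank lines separate sentences."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        sentence.append((cols[FORM_COL], cols[POS_COL]))
    if sentence:
        yield sentence


def tag_person_names(sentence: List[Token], given_names: Set[str]) -> Optional[str]:
    """Return the sentence in OpenNLP name finder format, or None if nothing was tagged."""
    out: List[str] = []
    tagged = False
    i = 0
    while i < len(sentence):
        form, pos = sentence[i]
        # Step 1: candidate = two consecutive tokens tagged NE.
        is_pair = (
            pos == "NE"
            and i + 1 < len(sentence)
            and sentence[i + 1][1] == "NE"
        )
        # Steps 2 and 3: tag the pair only if its first token is a known given name.
        if is_pair and form in given_names:
            out += ["<START:person>", form, sentence[i + 1][0], "<END>"]
            tagged = True
            i += 2  # design choice of this sketch: tagged pairs never overlap
        else:
            out.append(form)
            i += 1
    return " ".join(out) if tagged else None
```

Sentences for which the function returns None contain no tagged name and would simply be left out of the training data in this sketch.
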
h3. Further Improvements:

1.) Some tagged names actually refer to locations and need to be refined (e.g. 
San Juan):

Fünf bis sechs Stunden , damit sie zur Besinnung kommen , meint <START:person> 
Salvador Lopez <END> Gonzalez , das Oberhaupt von <START:person> San Juan <END> 
<START:person> Juan Chamula <END> , einem pittoresken Ort hoch in den Bergen 
von .).

 2.) Add support for names with more than two words (e.g. Salvador Lopez 
Gonzalez above).
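
One possible way to approach improvement 2.) is to tag the whole run of consecutive NE tokens instead of only the first two. The sketch below is a suggestion under assumptions, not part of the attached model: the function name tag_ne_runs is hypothetical, tokens are passed as (form, POS) pairs, and the given names are an already loaded set.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (form, pos)


def tag_ne_runs(sentence: List[Token], given_names: Set[str]) -> str:
    """Tag each maximal run of two or more NE tokens that starts with a given name."""
    out: List[str] = []
    i = 0
    while i < len(sentence):
        # Find the end of the NE run starting at i (j == i if token i is not NE).
        j = i
        while j < len(sentence) and sentence[j][1] == "NE":
            j += 1
        if j - i >= 2 and sentence[i][0] in given_names:
            out.append("<START:person>")
            out.extend(form for form, _ in sentence[i:j])
            out.append("<END>")
            i = j
        else:
            # Not a name start: emit the token and retry from the next one.
            out.append(sentence[i][0])
            i += 1
    return " ".join(out)
```

With this variant, Salvador Lopez Gonzalez would be tagged as a single name. It does not address improvement 1.): a run such as San Juan Chamula would still be tagged as a person whenever its first token passes the given-name check.
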


> Add NameFinder model based on Tiger
> -----------------------------------
>
>                 Key: OPENNLP-1223
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1223
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: language model
>            Reporter: J. Fiala
>            Priority: Major
>         Attachments: tiger_2.2_namefinder.bin.7z, 
> tiger_2.2_namefinder.testdata.txt, tiger_2.2_namefinder_eval.txt
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
