[ 
https://issues.apache.org/jira/browse/OPENNLP-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Wiesner closed OPENNLP-859.
----------------------------------
    Resolution: Feedback Received

The question received feedback, but the reporter has not replied since 2017.

Closing.

> Cannot get entities from trained model using DictionaryFeatureGenerator 
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-859
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-859
>             Project: OpenNLP
>          Issue Type: Question
>          Components: Name Finder
>    Affects Versions: 1.6.0
>         Environment: ubuntu 16.04 java 8
>            Reporter: Damiano Porta
>            Priority: Major
>
> Hello,
> I have created the following training data.
> {code:title=train.txt|borderStyle=solid}
> Ciao mi chiamo <START:person> Damiano <END> ed abito a Roma  .
> il mio indirizzo è via del <START:person> Corso <END> nella provincia di Roma .
> il mio cap è lo 00144 nella capitale e e il mio nome è  <START:person> john <END> .
> Abito a Roma in via tar dei tali 10 , <START:person> Mario <END> è il mio amico .
> Oggi ho incontrato <START:person> giovanni <END> e siamo andati a giocare a calcio .
> {code}
> And then this code:
> {code:title=test.java|borderStyle=solid}
>         Charset charset = Charset.forName("UTF-8");
>         ObjectStream<String> lineStream =
>                 new PlainTextByLineStream(new FileInputStream("/home/damiano/person.train"), charset);
>         ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
>         TokenNameFinderModel model;
>         Dictionary dictionary = new Dictionary();
>         dictionary.put(new StringList(new String[]{"giovanni"}));
>         dictionary.put(new StringList(new String[]{"maria"}));
>         dictionary.put(new StringList(new String[]{"luca"}));
>       
>         BufferedOutputStream aa = null;
>           
>         AdaptiveFeatureGenerator featureGenerator = new CachedFeatureGenerator(
>                  new AdaptiveFeatureGenerator[]{
>                     new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
>                     new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
>                     new OutcomePriorFeatureGenerator(),
>                     new PreviousMapFeatureGenerator(),
>                     new BigramNameFeatureGenerator(),
>                     new SentenceFeatureGenerator(true, false),
>                     new DictionaryFeatureGenerator("person", dictionary)
>                    });
>         try {
>             model = NameFinderME.train("it", "person", sampleStream, TrainingParameters.defaultParams(),
>                     featureGenerator, Collections.<String, Object>emptyMap());
>         }
>         finally {
>           sampleStream.close();
>         }
>         // Save trained model
>         try (BufferedOutputStream modelOut = new BufferedOutputStream(
>                 new FileOutputStream("/home/damiano/it-person-custom.bin"))) {
>           model.serialize(modelOut);
>         }
>                 
>         // Read the trained model
>         try (InputStream modelIn = new FileInputStream("/home/damiano/it-person-custom.bin")) {
>             TokenNameFinderModel nerModel = new TokenNameFinderModel(modelIn);
>             NameFinderME nameFinder = new NameFinderME(nerModel, featureGenerator, NameFinderME.DEFAULT_BEAM_SIZE);
>           
>             String sentence[] = new String[]{
>                 "Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."
>             };
>
>             Span nameSpans[] = nameFinder.find(sentence);
>
>             System.out.println(Arrays.toString(Span.spansToStrings(nameSpans, sentence)));
>         }      
> {code}
> When I try
> {code}
> "Ciao", "mi", "chiamo", "Damiano", "e", "sono", "di", "Roma", "."
> {code}
> it correctly detects "Damiano" as PERSON, but if I change it to:
> {code}
> "Ciao", "mi", "chiamo", "maria", "e", "sono", "di", "Roma", "."
> {code}
> it does not detect "maria" as PERSON, even though I added "maria" to the
> dictionary, so I expected it to be found. Why not?
> Thanks!
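
A possible explanation, offered as an assumption and not as the feedback recorded on the issue: DictionaryFeatureGenerator only adds a feature for tokens found in the dictionary; it does not force a match. The model still has to learn a weight for that feature from the training data, and features seen fewer times than the training cutoff are discarded. The plain-Java sketch below does not use OpenNLP. It only counts how many of the annotated names in the training data above are also dictionary entries, and compares that count with an assumed cutoff of 5 (believed to be the default; check the TrainingParameters actually in use). The class name DictionaryHitCount and the constant ASSUMED_CUTOFF are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DictionaryHitCount {

    // Assumption: the default feature cutoff is 5. Verify against your TrainingParameters.
    static final int ASSUMED_CUTOFF = 5;

    /** Counts the tokens that match a dictionary entry, ignoring case. */
    static int countHits(List<String> tokens, Set<String> lowerCasedDictionary) {
        int hits = 0;
        for (String token : tokens) {
            if (lowerCasedDictionary.contains(token.toLowerCase(Locale.ROOT))) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Dictionary entries from the question.
        Set<String> dictionary = new HashSet<>(Arrays.asList("giovanni", "maria", "luca"));

        // Names annotated with <START:person> in the training data from the question.
        List<String> annotatedNames =
                Arrays.asList("Damiano", "Corso", "john", "Mario", "giovanni");

        int hits = countHits(annotatedNames, dictionary);
        System.out.println("Dictionary hits among annotated names: " + hits);
        System.out.println("Below assumed cutoff: " + (hits < ASSUMED_CUTOFF));
    }
}
```

If the count is below the cutoff, two things may be worth trying: adding more training sentences whose names appear in the dictionary, or lowering the cutoff. For plain dictionary lookup without a statistical model, opennlp.tools.namefind.DictionaryNameFinder may be a better fit.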



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
