On Feb 11, 2012, at 7:20 AM, Jim foo.bar wrote:

> HI everyone,
>
> I was just wondering whether anyone has used the clojure-opennlp
> wrapper for multi-word named entity recognition (NER)? I am using it
> to train a drug finder from my private corpus and even though i get
> correct behavior when using the command line tool of apache openNLP
> when trying to use the API i only get single-words entities
> recognised!!! I've opened up a thread in the official mailing list
> because initially i thought there was a genuine problem with openNLP
> but since the command line tool does exactly what i want i'm starting
> to think that it might not be openNLP's fault but either in my code or
> in the clojure wrapper...
>
> I've followed both the official tutorials and the wrapper
> documentation and thus i am doing everything as instructed...
> I know the name finder expects tokenized sentences and i am indeed
> passing tokenized sentences like this:
>
> (defn find-names-model [text]
>   (map #(drug-find (tokenize %))
>        (get-sentences text)))
>
> It is very strange because i am getting back "Folic" but not "Folic
> acid" regardless of using the exact same model i used with the command
> line tool...
>
> Any help will be greatly appreciated...
> Regards,
> Jim

I have inquired on the OpenNLP mailing list about a way to train a tokenizer not to automatically split on spaces. If I hear back with a way to do it, I will add it to clojure-opennlp.

- Lee
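
For reference, OpenNLP's name finder reports each entity as a span of token indices rather than as a string, so a multi-word name such as "Folic acid" has to be rebuilt by joining the tokens that the span covers. Below is a minimal sketch of that joining step in plain Clojure. It assumes spans arrive as [start end] pairs with an exclusive end index; `spans->names` is a hypothetical helper, not part of clojure-opennlp, and whether this step is where the wrapper loses the second word is not established in this thread.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not part of clojure-opennlp): rebuild entity strings
;; from token-index spans. Each span is assumed to be a [start end] pair
;; where `end` is exclusive.
(defn spans->names
  [tokens spans]
  (let [tokens (vec tokens)]
    (mapv (fn [[start end]]
            ;; Join every token covered by the span, not just the first one.
            (str/join " " (subvec tokens start end)))
          spans)))

;; A span covering tokens 0 and 1 yields the full two-word name.
(spans->names ["Folic" "acid" "reduces" "risk" "."] [[0 2]])
;; => ["Folic acid"]
```

Taking only the token at `start` instead of the whole range would yield "Folic" alone, which matches the symptom described above, so it may be worth checking how the spans are converted to strings.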
