RE: OPENNLP problems

Patrick Mi Sun, 09 Jun 2013 16:40:02 -0700

Hi Lance,

I updated the source from 4.x and applied the latest patch, LUCENE-2899-x.patch,
uploaded on 6 June, but still had the same problem.



Regards,
Patrick

-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Thursday, 6 June 2013 5:16 p.m.
To: solr-user@lucene.apache.org
Subject: Re: OPENNLP problems

Patrick-
I found the problem with multiple documents. The problem was that the 
API for the life cycle of a Tokenizer changed, and I only noticed part 
of the change. You can now upload multiple documents in one post, and 
the OpenNLPTokenizer will process each document.

You're right, the example on the wiki is wrong. The FilterPayloadsFilter 
default is to remove the given payloads, and needs keepPayloads="true" 
to retain them.
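
For reference, here is a sketch of the field type from the quoted message
below with keepPayloads="true" added. It assumes the factory classes and
attribute names supplied by the LUCENE-2899 patch (they are not in stock
Solr), and the model paths are simply the ones from Patrick's setup:

```xml
<fieldType name="text_opennlp_nvf" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory"
               tokenizerModel="opennlp/en-token.bin"/>
    <!-- Tags each token with its part of speech, stored as a payload -->
    <filter class="solr.OpenNLPFilterFactory"
            posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <!-- keepPayloads="true" retains tokens whose payload is in the list;
         without it the listed payloads are removed instead -->
    <filter class="solr.FilterPayloadsFilterFactory" keepPayloads="true"
            payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
    <filter class="solr.StripPayloadsFilterFactory"/>
  </analyzer>
</fieldType>
```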

The fixed patch is up as LUCENE-2899-x.patch. Again, thanks for trying it.

Lance

https://issues.apache.org/jira/browse/LUCENE-2899

On 05/28/2013 10:08 PM, Patrick Mi wrote:
> Hi there,
>
> Checked out branch_4x and applied the latest patch,
> LUCENE-2899-current.patch; however, I ran into two problems.
>
> Followed the wiki page instructions and set up a field with this type, aiming
> to keep nouns and verbs and do a facet on the field
> ==
> <fieldType name="text_opennlp_nvf" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer>
>          <tokenizer class="solr.OpenNLPTokenizerFactory"
> tokenizerModel="opennlp/en-token.bin"/>
>          <filter class="solr.OpenNLPFilterFactory"
> posTaggerModel="opennlp/en-pos-maxent.bin"/>
>          <filter class="solr.FilterPayloadsFilterFactory"
> payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
>          <filter class="solr.StripPayloadsFilterFactory"/>
>        </analyzer>
>      </fieldType>
> ==
>
> Struggled to get that going until I put the extra parameter
> keepPayloads="true" in as below.
>       <filter class="solr.FilterPayloadsFilterFactory" keepPayloads="true"
> payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
>
> Question: am I doing the right thing? Is this a mistake on the wiki?
>
> Second problem:
>
> Posted the document XML files one by one to Solr, and the result was what I
> expected.
>
> <add>
> <doc>
>    <field name="id">1</field>
>    <field name="text_opennlp_nvf">check in the hotel</field></doc>
> </add>
>
> However, if I put multiple documents into the same XML file and post it in
> one go, only the first document gets processed (only 'check' and 'hotel' were
> showing in the facet result).
>   
> <add>
> <doc>
>    <field name="id">1</field>
>    <field name="text_opennlp_nvf">check in the hotel</field>
> </doc>
> <doc>
>    <field name="id">2</field>
>    <field name="text_opennlp_nvf">removes the payloads</field>
> </doc>
> <doc>
>    <field name="id">3</field>
>    <field name="text_opennlp_nvf">retains only nouns and verbs </field>
> </doc>
> </add>
>
> Same problem when I updated the data using CSV upload.
>
> Is that a bug or something I did wrong?
>
> Thanks in advance!
>
> Regards,
> Patrick
>
>
