Hi Lance,

I updated the source from 4.x and applied the latest patch, LUCENE-2899-x.patch, uploaded on 6 June, but still had the same problem.

Regards,
Patrick

-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Thursday, 6 June 2013 5:16 p.m.
To: solr-user@lucene.apache.org
Subject: Re: OPENNLP problems

Patrick-

I found the problem with multiple documents. The problem was that the API for the life cycle of a Tokenizer changed, and I only noticed part of the change. You can now upload multiple documents in one post, and the OpenNLPTokenizer will process each document.

You're right, the example on the wiki is wrong. The FilterPayloadsFilter default is to remove the given payloads, and needs keepPayloads="true" to retain them.

The fixed patch is up as LUCENE-2899-x.patch. Again, thanks for trying it.

Lance

https://issues.apache.org/jira/browse/LUCENE-2899

On 05/28/2013 10:08 PM, Patrick Mi wrote:
> Hi there,
>
> Checked out branch_4x and applied the latest patch
> LUCENE-2899-current.patch however I ran into 2 problems
>
> Followed the wiki page instruction and set up a field with this type aiming
> to keep nouns and verbs and do a facet on the field
> ==
> <fieldType name="text_opennlp_nvf" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.OpenNLPTokenizerFactory"
>                tokenizerModel="opennlp/en-token.bin"/>
>     <filter class="solr.OpenNLPFilterFactory"
>             posTaggerModel="opennlp/en-pos-maxent.bin"/>
>     <filter class="solr.FilterPayloadsFilterFactory"
>             payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
>     <filter class="solr.StripPayloadsFilterFactory"/>
>   </analyzer>
> </fieldType>
> ==
>
> Struggled to get that going until I put the extra parameter
> keepPayloads="true" in as below.
> <filter class="solr.FilterPayloadsFilterFactory" keepPayloads="true"
>         payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
>
> Question: am I doing the right thing? Is this a mistake on wiki
>
> Second problem:
>
> Posted the document xml one by one to the solr and the result was what I
> expected.
>
> <add>
>   <doc>
>     <field name="id">1</field>
>     <field name="text_opennlp_nvf">check in the hotel</field>
>   </doc>
> </add>
>
> However if I put multiple documents into the same xml file and post it in
> one go only the first document gets processed( only 'check' and 'hotel' were
> showing in the facet result.)
>
> <add>
>   <doc>
>     <field name="id">1</field>
>     <field name="text_opennlp_nvf">check in the hotel</field>
>   </doc>
>   <doc>
>     <field name="id">2</field>
>     <field name="text_opennlp_nvf">removes the payloads</field>
>   </doc>
>   <doc>
>     <field name="id">3</field>
>     <field name="text_opennlp_nvf">retains only nouns and verbs </field>
>   </doc>
> </add>
>
> Same problem when updated the data using csv upload.
>
> Is that a bug or something I did wrong?
>
> Thanks in advance!
>
> Regards,
> Patrick
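
For anyone trying to reproduce the multi-document case, here is a minimal sketch that builds the same three-document <add> message as in the quoted mail. It only generates the XML; posting it to Solr is left out because the update URL depends on the local setup. The function name build_add_xml is made up for this illustration and is not part of Solr or the patch.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs, field="text_opennlp_nvf"):
    """Build a Solr <add> message with one <doc> per (id, text) pair.

    build_add_xml is a hypothetical helper for this thread, not a Solr API.
    """
    add = ET.Element("add")
    for doc_id, text in docs:
        doc = ET.SubElement(add, "doc")
        id_field = ET.SubElement(doc, "field", name="id")
        id_field.text = str(doc_id)
        text_field = ET.SubElement(doc, "field", name=field)
        text_field.text = text
    # encoding="unicode" returns a str instead of bytes
    return ET.tostring(add, encoding="unicode")


# The three test documents from the quoted mail
docs = [
    (1, "check in the hotel"),
    (2, "removes the payloads"),
    (3, "retains only nouns and verbs"),
]
payload = build_add_xml(docs)
print(payload)
```

Posting this payload in one request and then faceting on text_opennlp_nvf should show whether terms from documents 2 and 3 appear, or only those from the first document.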