I think it might be to do with the library itself. I downloaded semanticvectors-1.22 and compiled it from source, then created a demo corpus by running java org.apache.lucene.demo.IndexFiles against the Lucene src directory. I then ran java pitt.search.semanticvectors.BuildIndex against the index and got the following:

Seedlength = 10
Dimension = 200
Minimum frequency = 0
Number non-alphabet characters = 0
Contents fields are: [contents]
Creating semantic term vectors ...
Populating basic sparse doc vector store, number of vectors: 774
Creating store of sparse vectors ...
Created 774 sparse random vectors.
Creating term vectors ...
There are 36881 terms (and 774 docs)
0 ... 1000 ... 2000 ... 3000 ... 4000 ...
Exception in thread "main" java.lang.NullPointerException
    at org.apache.lucene.index.DirectoryReader$MultiTermDocs.freq(DirectoryReader.java:1068)
    at pitt.search.semanticvectors.LuceneUtils.getGlobalTermFreq(LuceneUtils.java:70)
    at pitt.search.semanticvectors.LuceneUtils.termFilter(LuceneUtils.java:187)
    at pitt.search.semanticvectors.TermVectorsFromLucene.<init>(TermVectorsFromLucene.java:163)
    at pitt.search.semanticvectors.BuildIndex.main(BuildIndex.java:138)

I am still digging, but the source code references Lucene calls dating back to Lucene 2.4, a lot of which are now deprecated, so it might need some refreshing.

Cheers,
Dave

On 02 November 2009 at 14:40 Andrew Clegg <andrew.cl...@gmail.com> wrote:

> Hi,
>
> I've recently added the TermVectorComponent as a separate handler, following
> the example in the supplied config file, i.e.:
>
> <searchComponent name="tvComponent"
>     class="org.apache.solr.handler.component.TermVectorComponent"/>
>
> <requestHandler name="/tvrh"
>     class="org.apache.solr.handler.component.SearchHandler">
>   <lst name="defaults">
>     <bool name="tv">true</bool>
>   </lst>
>   <arr name="last-components">
>     <str>tvComponent</str>
>   </arr>
> </requestHandler>
>
> It works, but with one quirk. When you use tv.all=true, you get the tf*idf
> scores in the output, just fine (along with tf and df).
> But if you use tv.tf_idf=true you get an NPE:
>
> http://server:8080/solr/tvrh/?q=1cuk&version=2.2&indent=on&tv.tf_idf=true
>
> HTTP Status 500 - null java.lang.NullPointerException
>     at org.apache.solr.handler.component.TermVectorComponent$TVMapper.getDocFreq(TermVectorComponent.java:253)
>     at org.apache.solr.handler.component.TermVectorComponent$TVMapper.map(TermVectorComponent.java:245)
>     at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:522)
>     at org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:401)
>     at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:378)
>     at org.apache.lucene.index.SegmentReader.getTermFreqVector(SegmentReader.java:1253)
>     at org.apache.lucene.index.DirectoryReader.getTermFreqVector(DirectoryReader.java:474)
>     at org.apache.solr.search.SolrIndexReader.getTermFreqVector(SolrIndexReader.java:244)
>     at org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:125)
>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>     (etc.)
>
> Is this a bug, or am I doing it wrong?
>
> Cheers,
>
> Andrew.
>
> --
> View this message in context:
> http://old.nabble.com/NullPointerException-with-TermVectorComponent-tp26156903p26156903.html
> Sent from the Solr - User mailing list archive at Nabble.com.