Re: term frequency vector access?

Andrzej Bialecki Thu, 11 Feb 2010 08:31:35 -0800

On 2010-02-11 17:04, Mike Perham wrote:

> In an UpdateRequestProcessor (processing an AddUpdateCommand), I have
> a SolrInputDocument with a field 'content' that has termVectors="true"
> in schema.xml.  Is it possible to get access to that field's term
> vector in the URP?

No, term vectors are created much later, during the process of adding
the document to a Lucene index (deep inside Lucene IndexWriter & co).
That's the whole point of SOLR-1536 - certain features become available
only when the tokenization actually occurs.

Another reason to use SOLR-1536 is when tokenization and analysis is
costly, e.g. when doing named entity recognition, POS tagging or
lemmatization. Theoretically you could play the TokenizerChain twice -
once in URP, so that you can discover and capture features and modify
the input document accordingly, and then again inside Lucene - but in
practice this may be too costly.
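
To make the "analyze once more in the URP" idea concrete, here is a
rough sketch in plain Java of collecting term frequencies from a field
value before the document reaches Lucene. It deliberately uses no Solr
or Lucene classes: analyze() is a hypothetical stand-in for the field's
real TokenizerChain (it only lowercases and splits on non-alphanumeric
characters), and the class and method names are made up for
illustration. In a real URP you would obtain the analyzer for 'content'
from the schema and consume its token stream instead.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class TermFreqSketch {

    // Hypothetical stand-in for the field's analysis chain: lowercase the
    // text and split on any run of characters that are not letters or digits.
    public static String[] analyze(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
    }

    // Count how often each term occurs, in order of first appearance.
    // This mimics the term/frequency pairs of a term vector, but it is
    // computed from the raw field value, not read back from the index.
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String term : analyze(text)) {
            if (term.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            freqs.merge(term, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        // prints {solr=1, indexes=2, what=1, lucene=1}
        System.out.println(termFrequencies("Solr indexes what Lucene indexes"));
    }
}
```

The catch is the one described above: the text is analyzed here and
then again inside Lucene, so with an expensive chain the cost is paid
twice.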


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
