On 2010-02-11 17:04, Mike Perham wrote:
In an UpdateRequestProcessor (processing an AddUpdateCommand), I have
a SolrInputDocument with a field 'content' that has termVectors="true"
in schema.xml.  Is it possible to get access to that field's term
vector in the URP?

No, term vectors are created much later, during the process of adding the document to a Lucene index (deep inside Lucene IndexWriter & co). That's the whole point of SOLR-1536 - certain features become available only when the tokenization actually occurs.

Another reason to use SOLR-1536 is when tokenization and analysis are costly, e.g. when doing named entity recognition, POS tagging or lemmatization. Theoretically you could run the TokenizerChain twice - once in the URP, so that you can discover and capture features and modify the input document accordingly, and then again inside Lucene - but in practice this may be too costly.
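
To make the "analyze once in the URP" idea concrete, here is a minimal sketch of what capturing term frequencies from the field's text could look like. It is not Solr or Lucene code: the class TermVectorSketch and the method termFrequencies are hypothetical names, and the lowercase-and-split tokenizer is only an assumed stand-in for whatever analysis chain the 'content' field really uses. In a real URP you would run the field's actual analyzer instead.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class TermVectorSketch {

    // Hypothetical stand-in for the field's analysis chain (an assumption,
    // not the Solr API): lowercase the text, then split on anything that is
    // not a letter or a digit.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            freqs.merge(token, 1, Integer::sum); // count occurrences per term
        }
        return freqs;
    }

    public static void main(String[] args) {
        // The field value would come from the SolrInputDocument in a real URP.
        Map<String, Integer> tv = termFrequencies("Solr indexes text; Solr analyzes text.");
        System.out.println(tv);
    }
}
```

The resulting map is only an approximation of a term vector (terms and frequencies, no positions or offsets), and Lucene would still tokenize the field again at index time, which is exactly the double cost mentioned above.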

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
