On 2010-02-11 17:04, Mike Perham wrote:
> In an UpdateRequestProcessor (processing an AddUpdateCommand), I have a SolrInputDocument with a field 'content' that has termVectors="true" in schema.xml. Is it possible to get access to that field's term vector in the URP?
No, term vectors are created much later, during the process of adding the document to a Lucene index (deep inside Lucene IndexWriter & co). That's the whole point of SOLR-1536 - certain features become available only when the tokenization actually occurs.
Another reason to use SOLR-1536 is when tokenization and analysis are costly, e.g. when doing named entity recognition, POS tagging or lemmatization. Theoretically you could run the TokenizerChain twice - once in the URP, so that you can discover and capture features and modify the input document accordingly, and then again inside Lucene - but in practice this may be too expensive.
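To make the "analyze twice" idea concrete, here is a rough sketch in plain Java. It does not use the Solr or Lucene APIs: the class, the method names, the `content_unique_terms` field and the toy analysis chain (lowercase, split on non-alphanumerics) are all made up for illustration, and the document is just a Map. The point is only that the pre-indexing step and the indexing step each pay for a full analysis pass.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of running analysis twice; none of these names are Solr or Lucene APIs.
class TwoPassAnalysisSketch {
    // Counts how many times the (hypothetical) analysis chain has run.
    static int analysisPasses = 0;

    // Hypothetical analysis chain: lowercase, then split on non-alphanumeric characters.
    static List<String> analyze(String text) {
        analysisPasses++;
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Term -> frequency map, a crude stand-in for a term vector.
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        for (String t : tokens) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    // Pass 1: what a pre-indexing processor could do; it analyzes the raw
    // field value itself and stores a derived feature on the document.
    static void preIndexStep(Map<String, Object> doc) {
        Map<String, Integer> tf = termFrequencies(analyze((String) doc.get("content")));
        doc.put("content_unique_terms", tf.size());
    }

    // Pass 2: stand-in for the analysis that happens again at index time.
    static Map<String, Integer> indexStep(Map<String, Object> doc) {
        return termFrequencies(analyze((String) doc.get("content")));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("content", "Solr indexes text; Solr analyzes text twice");
        preIndexStep(doc);
        Map<String, Integer> indexed = indexStep(doc);
        System.out.println("derived field: " + doc.get("content_unique_terms"));
        System.out.println("indexed terms: " + indexed);
        System.out.println("analysis passes: " + analysisPasses);
    }
}
```

With a cheap chain like this one the second pass hardly matters; with NER or POS tagging in the chain, every document pays that cost twice.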
--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com