On Sun, 29 Jun 2008 19:40:44 -0300 "Hugo Barauna" <[EMAIL PROTECTED]> wrote:
> I am having problems with a stored field. The problem is that the field is not
> being stored as I need it to be. It has a tokenizer
> class="solr.HTMLStripWhitespaceTokenizerFactory", but when it is stored,
> that tokenizer is not applied. That tokenizer is only applied for the
> inverted index of that field.

Hi Hugo,

The tokenizers + filters are applied at 2 points in the process:

- index time: they work on the original text to generate the index.
- query time: they work on the query sent, modifying it with the same kind of steps used at index time, so that the tokens match.

As you can see, storing doesn't come into the equation: 'stored' means simply that, store what you sent, without changes.

> How can I apply tokenizers, analyzers and other filters to the stored field?

You'll have to modify it yourself before sending it to SOLR, i.e.:

1) send your raw HTML to a field that is indexed and processed with the filters you define;
2) clean up the HTML yourself and send the cleaned text to a separate stored text field

(all in the one document, of course).

If you are using Java at your document preparation stage, you could just borrow the code from the HTMLStripWhitespaceTokenizer to perform step 2 ;)

I wonder whether it makes sense to allow tokenizers / filters to apply to stored text too, via a config option..

B
_________________________
{Beto|Norberto|Numard} Meijome

"The only difference between the saint and the sinner is that every saint has a past and every sinner has a future."
  Oscar Wilde

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
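For illustration, the two-field approach (raw HTML to an indexed field, cleaned text to a stored field, both in the same document) could look roughly like the Java sketch below. It is a minimal sketch under several assumptions: the field names `id`, `body_html` and `body_text` are hypothetical and would have to match your own schema; `stripHtml` is a crude regex-based cleanup, NOT the HTMLStripReader logic that the Solr tokenizer uses (borrowing that code, as suggested above, would be more robust); and the output is a Solr XML `<add><doc>` update message that you would POST to your update handler yourself.

```java
import java.util.regex.Pattern;

final class HtmlFieldPrep {
    // Removes <script> and <style> elements together with their content.
    private static final Pattern SCRIPT_STYLE =
        Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Removes HTML comments.
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    // Removes any remaining tag.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");

    /** Crude HTML-to-text cleanup (assumption: regexes, not Solr's HTMLStripReader). */
    static String stripHtml(String html) {
        String text = SCRIPT_STYLE.matcher(html).replaceAll(" ");
        text = COMMENT.matcher(text).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Decode a few common entities; &amp; last so "&amp;lt;" becomes "&lt;", not "<".
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse runs of whitespace left behind by removed tags.
        return text.replaceAll("\\s+", " ").trim();
    }

    /** Escapes text for use inside an XML element body. */
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Builds one Solr XML <add> message carrying both versions of the content.
     * Field names are hypothetical: body_html would be indexed (with the HTML-strip
     * analyzer) and body_text stored, as configured in your schema.
     */
    static String buildAddXml(String id, String html) {
        return "<add><doc>"
            + "<field name=\"id\">" + xmlEscape(id) + "</field>"
            + "<field name=\"body_html\">" + xmlEscape(html) + "</field>"
            + "<field name=\"body_text\">" + xmlEscape(stripHtml(html)) + "</field>"
            + "</doc></add>";
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Hello</h1><p>Solr &amp; HTML</p></body></html>";
        // body_text ends up as "Hello Solr &amp; HTML" (XML-escaped "Hello Solr & HTML").
        System.out.println(buildAddXml("doc1", html));
    }
}
```

Keeping the cleanup on the client side means the stored value is exactly what you sent (as explained above), so whatever `stripHtml` produces is what search results will return.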