On Sun, 29 Jun 2008 19:40:44 -0300
"Hugo Barauna" <[EMAIL PROTECTED]> wrote:

> I am having problems with a stored field. The problem is that field is not
> being stored as I need it to be. It has a tokenizer
> class="solr.HTMLStripWhitespaceTokenizerFactory", but when it is stored,
> that tokenizer is not applied. That tokenizer is only applied for the
> inverted index of that field.

Hi Hugo
The tokenizers and filters are applied at two points in the process:
  - index time: they work on the original text to generate the index.
  - query time: they work on the query sent, modifying it with steps similar to
those used at index time to ensure matching tokens.

As you can see, storing doesn't come into the equation - 'stored' means simply
that: store what you sent, without changes.
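
To make the indexed / stored split concrete, here is a toy model in Java. It is
not Solr or Lucene code: the class ToyField and its stand-in analysis chain
(drop tags, split on whitespace, lowercase) are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model of one field; NOT a Solr or Lucene class.
final class ToyField {
    final List<String> indexedTokens; // what searches are matched against
    final String storedValue;         // what a search result hands back

    ToyField(String input) {
        this.indexedTokens = analyze(input); // analysis runs at index time
        this.storedValue = input;            // stored copy is left untouched
    }

    // Stand-in analysis chain: drop tags, split on whitespace, lowercase.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        String noTags = text.replaceAll("<[^>]*>", " ").trim();
        for (String token : noTags.split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Query time: the query goes through the same chain before matching.
    boolean matches(String query) {
        return indexedTokens.containsAll(analyze(query));
    }
}
```

In this model a field built from "<b>Hello</b> World" matches the query
"HELLO", yet its storedValue still contains the <b> tags, which is the
behaviour you are seeing.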

> How can apply tokenizers, analyzers and other filters to the stored field?

You'll have to modify it yourself before sending it to Solr, i.e. 1) send your
raw HTML to a field that gets indexed with the filters you define, and 2) clean
up your HTML yourself and send the result to a separate stored text field (all
in the one document, of course).

If you are using Java at your document preparation stage, you could just borrow
the code behind HTMLStripWhitespaceTokenizerFactory to perform step 2 ;)
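
If borrowing that code is more than you need, a rough sketch of the preparation
step could look like the following. To be clear about the assumptions: this is
a naive regex-based stripper, not the Solr implementation, and the class name
StoredFieldPrep and the field names body_indexed and body_stored are invented
for the example. The map stands in for whatever document object your client
sends to Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for the document preparation stage; not part of Solr.
final class StoredFieldPrep {
    // Naive patterns: fine for a sketch, not a substitute for an HTML parser.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Step 2: clean up the HTML yourself before it is stored.
    static String stripHtml(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = COMMENT.matcher(text).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Decode a handful of common entities; "&amp;" goes last so that
        // "&amp;lt;" becomes "&lt;" rather than "<".
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    // One document, two fields: raw HTML for the analyzed (indexed) field,
    // cleaned text for the stored field. Field names are assumptions.
    static Map<String, String> prepareDocument(String id, String rawHtml) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", id);
        fields.put("body_indexed", rawHtml);
        fields.put("body_stored", stripHtml(rawHtml));
        return fields;
    }
}
```

You would call StoredFieldPrep.prepareDocument(id, html) for each page and hand
the resulting fields to your Solr client of choice.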

I wonder whether it would make sense to let tokenizers / filters apply to
stored text too, via a config option.

B
_________________________
{Beto|Norberto|Numard} Meijome

"The only difference between the saint and the sinner is that every saint has a
past and every sinner has a future." Oscar Wilde

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
