On 11-Jun-07, at 3:54 AM, Thierry Collogne wrote:

Ok. Is it possible to get back the content without the html tags?


Well, it isn't stored anywhere in Solr. It's best to think of Lucene/Solr as two systems: the indexer applies a tokenization transformation to the data and builds an inverted index, while the storage system keeps the data exactly as you gave it, _before_ analysis/tokenization. If you want some analysis to apply to the stored form of the doc as well, it's probably easier to apply it before passing the data to Solr.

-Mike
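
The suggestion above (strip the markup yourself before handing the data to Solr) could be sketched roughly as follows. This is only an illustration using Python's standard html.parser module; the helper names strip_html and TextExtractor and the field names "id" and "content" are assumptions made up for the example, not anything from Solr or from this thread.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (hypothetical helper, not part of Solr)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.parts.append(data)


def strip_html(markup):
    """Return the text of `markup` with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # Join text runs with a space so words in adjacent elements do not fuse,
    # then collapse every run of whitespace into a single space.
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    raw = "<p>Hello   <b>world</b></p>\n<p>again</p>"
    # The cleaned value is what would be sent to Solr, so the stored field
    # no longer contains markup. Field names here are assumed for illustration.
    doc = {"id": "1", "content": strip_html(raw)}
    print(doc)
```

Note that this simple version also keeps the text inside script and style elements, and it puts a space wherever a tag splits a word, so real pages may need more careful handling.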

On 08/06/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

On 6/8/07, Thierry Collogne <[EMAIL PROTECTED]> wrote:
> I am trying to use the solr.HTMLStripWhitespaceTokenizerFactory analyzer
> with no luck.
[...]
> Is this normal? Shouldn't the html code and the white spaces be removed from
> the field?

For indexing purposes, yes.  The stored field you get back will be
unchanged though.
If you want to see what will be indexed, try the analysis debugger in
the admin pages.

-Yonik

