On 11-Jun-07, at 3:54 AM, Thierry Collogne wrote:
Ok. Is it possible to get back the content without the html tags?
Well, the stripped text isn't stored anywhere in Solr. It's best to
think of Lucene/Solr as two systems: the indexer applies a tokenization
transformation to the data and creates an inverted index; the storage
system keeps the data exactly as you gave it, _before_
analysis/tokenization. If there is analysis you'd like applied to the
stored version of the doc as well, it's probably easier to apply it
before passing the data to Solr.
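
A rough sketch of that kind of pre-processing, assuming a Python client
and only the standard library. The name strip_html is made up for
illustration and is not part of Solr or any Solr client; you would send
the returned text to Solr in place of the raw HTML.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops the markup around them."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self._chunks.append(data)

    def text(self):
        # Join the pieces and collapse the whitespace left behind by tags.
        return " ".join(" ".join(self._chunks).split())


def strip_html(raw):
    """Hypothetical helper: return the text content of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


if __name__ == "__main__":
    print(strip_html("<p>Hello <b>Solr</b>   world</p>"))
```

Note that this joins text nodes with a space, so "foo<b>bar</b>" comes
out as two words, and the contents of script or style elements are kept
as text. Adjust if that matters for your data.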
-Mike
On 08/06/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 6/8/07, Thierry Collogne <[EMAIL PROTECTED]> wrote:
> I am trying to use the solr.HTMLStripWhitespaceTokenizerFactory analyzer
> with no luck.
[...]
> Is this normal? Shouldn't the html code and the white spaces be removed
> from the field?
For indexing purposes, yes. The stored field you get back will be
unchanged though.
If you want to see what will be indexed, try the analysis debugger in
the admin pages.
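
To illustrate the indexed/stored split, here is a toy model in Python.
It is only a sketch and not how Solr or Lucene is implemented: ToyIndex
and its methods are invented names, and the regex is a crude stand-in
for the real HTML-stripping tokenizer.

```python
import re


class ToyIndex:
    """Toy model: analysis affects the index, never the stored value."""

    def __init__(self):
        self.stored = {}    # doc id -> original value, kept untouched
        self.inverted = {}  # token -> set of doc ids containing it

    def add(self, doc_id, value):
        # Storage side: keep exactly what was passed in.
        self.stored[doc_id] = value
        # Indexing side: crude tag removal, then whitespace tokenization.
        text = re.sub(r"<[^>]+>", " ", value)
        for token in text.lower().split():
            self.inverted.setdefault(token, set()).add(doc_id)

    def search(self, term):
        # Matching uses the analyzed tokens; results return stored values.
        ids = sorted(self.inverted.get(term.lower(), ()))
        return [self.stored[i] for i in ids]


if __name__ == "__main__":
    idx = ToyIndex()
    idx.add(1, "<p>Hello <b>Solr</b></p>")
    print(idx.search("solr"))  # the hit comes back with its tags intact
    print(idx.search("b"))     # tag names were never indexed
```

Searching for "solr" finds the document because the tags were removed
before tokenization, yet the value returned is the original HTML.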
-Yonik