This is expected behavior, assuming you’re asking for your stored field as part of the “fl” list.
The default behavior is to store the raw input and return it unaltered. The stored data is recorded before _any_ analysis, including charFilters. Otherwise it’d be surprising to see, say, the original text with all the accents removed (to use another CharFilter as an example).

If you want the returned text to exclude the markup, use an UpdateProcessorFactory in your update chain. These modify the input before the data is stored. For instance:
https://lucene.apache.org/solr/7_6_0//solr-core/org/apache/solr/update/processor/HTMLStripFieldUpdateProcessorFactory.html

It’s not obvious from the description, unless you follow the link to the superclass, that you can also specify one or more fields. See:
https://lucene.apache.org/solr/7_6_0//solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html

Best,
Erick

> On Sep 2, 2019, at 9:30 AM, Big Gosh <bigg...@gmail.com> wrote:
>
> Hi,
>
> I've configured a field type in Solr 8.2.0 as follows:
>
> <fieldType name="text_html" class="solr.TextField" positionIncrementGap="100" multiValued="true">
>   <analyzer type="index">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <!-- in this example, we will only use synonyms at query time
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>     <filter class="solr.FlattenGraphFilterFactory"/>
>     -->
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I expected the search to return the field stripped; instead, the HTML tags
> are still in the field.
>
> Is this correct, or did I make a mistake in the configuration?
>
> I'm quite sure I used this approach to strip HTML from text in the past.
>
> Thanks in advance
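For reference, a minimal sketch of the update-chain approach suggested in the reply, for solrconfig.xml. The chain name `strip-html` is hypothetical. The `typeName` selector is one of the field selectors inherited from FieldMutatingUpdateProcessorFactory; here it is assumed you want every field of the `text_html` type from the quoted schema. A `fieldName` selector could list individual fields instead. This has not been checked against 8.2.0 specifically.

```xml
<!-- solrconfig.xml: "strip-html" is a hypothetical chain name -->
<updateRequestProcessorChain name="strip-html">
  <!-- Strip HTML markup from matching field values before they are
       stored and indexed. typeName selects every field of type text_html;
       use fieldName instead to target specific fields. -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="typeName">text_html</str>
  </processor>
  <!-- Usual tail of a custom chain: log the update, then apply it -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain only runs if it is selected, either per request with `update.chain=strip-html` or by adding `default="true"` to the chain element. Documents already in the index keep their old stored values until they are re-sent.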