Thank you for your answer, very clear and precise.
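
For anyone who finds this thread later, here is a minimal sketch of the update-chain approach described in the reply quoted below, as it might appear in solrconfig.xml. The chain name "strip-html" is made up for illustration. Selecting fields by typeName assumes the "text_html" field type from the original question. Treat it as a starting point under those assumptions, not a tested configuration.

```xml
<!-- solrconfig.xml fragment (sketch). The chain name "strip-html" is hypothetical. -->
<updateRequestProcessorChain name="strip-html">
  <!-- Strips HTML markup from the incoming value before it is stored. -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <!-- Field selector inherited from FieldMutatingUpdateProcessorFactory.
         Assumption: apply to every field whose type is "text_html". -->
    <str name="typeName">text_html</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- RunUpdateProcessorFactory must come last so the document is actually indexed. -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain could then be selected per request with update.chain=strip-html, or made the default by adding default="true" to the chain element. Documents that are already indexed keep their stored markup until they are reindexed through the chain.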

On Mon, 2 Sep 2019 at 15:52, Erick Erickson <erickerick...@gmail.com> wrote:

> This is expected behavior, assuming you’re asking for your stored field as
> part of the “fl” list.
>
> The default behavior is to store the raw input and return it unaltered.
> The stored data is recorded before _any_ analysis, including charFilters.
> Otherwise it’d be surprising to see, say, the original text with all the
> accents removed (to use another CharFilter as an example).
>
> If you want the returned text to not include the markup, use an
> UpdateProcessorFactory in your update chain. These modify the input before
> the data is stored. For instance:
>
> https://lucene.apache.org/solr/7_6_0//solr-core/org/apache/solr/update/processor/HTMLStripFieldUpdateProcessorFactory.html
>
> It’s not obvious from the description unless you follow the link to the
> superclass that you can specify one or more fields too, see:
>
> https://lucene.apache.org/solr/7_6_0//solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html
>
> Best,
> Erick
>
> > On Sep 2, 2019, at 9:30 AM, Big Gosh <bigg...@gmail.com> wrote:
> >
> > Hi,
> >
> > I've configured in solr 8.2.0 a field type as follows:
> >
> > <fieldType name="text_html" class="solr.TextField"
> >            positionIncrementGap="100" multiValued="true">
> >   <analyzer type="index">
> >     <charFilter class="solr.HTMLStripCharFilterFactory"/>
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >             words="stopwords.txt" />
> >     <!-- in this example, we will only use synonyms at query time
> >     <filter class="solr.SynonymGraphFilterFactory"
> >             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
> >     <filter class="solr.FlattenGraphFilterFactory"/>
> >     -->
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >             words="stopwords.txt" />
> >     <filter class="solr.SynonymGraphFilterFactory"
> >             synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > I expected the search to return the field with the HTML stripped, but the
> > HTML tags are still in the field.
> >
> > Is this correct, or did I make a mistake in the configuration?
> >
> > I'm quite sure I used this approach in the past to strip HTML from the text.
> >
> > Thanks in advance