Re: solr.HTMLStripCharFilterFactory issue

Erik Hatcher Mon, 02 Sep 2019 06:35:19 -0700

Analysis has no effect on the stored value (what you get back via fl). The 
HTML stripping happens behind the scenes, on the indexed/searchable terms only.
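
If the goal is for the stored value itself to come back without markup, one
option (not mentioned above, so treat this as a sketch rather than the
recommended fix) is an update request processor in solrconfig.xml that strips
HTML from the field value before it is stored. The chain name "strip-html" is
made up for illustration, and selecting fields by typeName "text_html" assumes
the field type from the original message below.

```xml
<!-- Hypothetical chain name; goes in solrconfig.xml -->
<updateRequestProcessorChain name="strip-html">
  <!-- Strips HTML markup from the incoming field values, so the stored
       value (what fl returns) no longer contains tags -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <!-- Assumption: apply to every field that uses the text_html type -->
    <str name="typeName">text_html</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

A chain like this only applies when it is selected for the update request, for
example with update.chain=strip-html, and documents that are already indexed
would need to be reindexed to pick up the change.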


     Erik

> On Sep 2, 2019, at 09:30, Big Gosh <bigg...@gmail.com> wrote:
> 
> Hi,
> 
> In Solr 8.2.0, I've configured a field type as follows:
> 
> <fieldType name="text_html" class="solr.TextField" positionIncrementGap="100" multiValued="true">
>      <analyzer type="index">
>        <charFilter class="solr.HTMLStripCharFilterFactory"/>
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>        <!-- in this example, we will only use synonyms at query time
>        <filter class="solr.SynonymGraphFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>        <filter class="solr.FlattenGraphFilterFactory"/>
>        -->
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>    </fieldType>
> 
> I expected the search to return the field with the HTML stripped; instead,
> the HTML tags are still in the field.
> 
> Is this correct, or did I make a mistake in the configuration?
> 
> I'm quite sure I used this approach in the past to strip HTML from the text.
> 
> Thanks in advance
