Specifically, a custom Update Request Processor chain can be used before
indexing, probably with HTMLStripFieldUpdateProcessorFactory.
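
A minimal sketch of such a chain in solrconfig.xml, assuming the text ends up
in a field named "content" and using the chain name "strip-html" purely for
illustration (both names are assumptions, not taken from your setup):

```xml
<!-- Hypothetical chain name; "content" is an assumed field name. -->
<updateRequestProcessorChain name="strip-html">
  <!-- Strips HTML markup from the selected field before it is indexed and stored. -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">content</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- RunUpdateProcessorFactory must come last so the document is actually indexed. -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain would then be selected with the update.chain request parameter on
the update request. It only strips HTML markup, so it is worth testing against
a few of your EML files to confirm it catches the CSS you are seeing.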
Regards,
     Alex

On Sun, Dec 30, 2018, 9:26 PM Vincenzo D'Amore <v.dam...@gmail.com> wrote:

> Hi,
>
> I think this kind of text manipulation should be done before indexing. If
> you have font-size and font-family in your text, you are very likely
> indexing HTML with CSS.
> If I’m right, you are heading into a mess of markup words that should be
> removed from your text.
>
> On the other hand, if you have to do this at index time, a quick and dirty
> solution is using the pattern-replace filter.
>
>
> https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#pattern-replace-filter
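>
> A rough sketch of what that could look like in the schema, assuming a
> hypothetical field type named "text_nocss". The filter works on one token
> at a time, so this only blanks tokens such as "FONT-SIZE:" and drops them.
> The values ("9pt;", "arial") would still get through, and the stored
> content is not changed, only the indexed terms.
>
> ```xml
> <!-- Hypothetical field type name; the pattern is only an illustration. -->
> <fieldType name="text_nocss" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- Replace CSS property-name tokens with an empty string. -->
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="(?i)^font-(size|family):$" replacement="" replace="all"/>
>     <!-- Drop the empty tokens left behind by the replacement. -->
>     <filter class="solr.LengthFilterFactory" min="1" max="255"/>
>   </analyzer>
> </fieldType>
> ```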
>
> Ciao,
> Vincenzo
>
> --
> mobile: 3498513251
> skype: free.dev
>
> > On 31 Dec 2018, at 02:47, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >
> > Hi,
> >
> > I noticed that during the indexing of EML files, words like
> > "*FONT-SIZE: 9pt; FONT-FAMILY: arial*" are being indexed into the
> > content as well.
> >
> > I would like to check how we can remove those words during indexing.
> >
> > I am using Solr 7.5.0
> >
> > Regards,
> > Edwin
>
