Specifically, a custom Update Request Processor chain can be used before indexing, probably with HTMLStripFieldUpdateProcessorFactory.

Regards,
Alex

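To illustrate what stripping the markup before indexing amounts to, here is a minimal client-side sketch in Python. It is not Solr's implementation and not the processor factory itself; the names `TextExtractor` and `strip_html` are hypothetical, and it assumes the EML body has already been extracted as an HTML string.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the bodies of <style> and <script>."""

    SKIPPED = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Attributes such as style="FONT-SIZE: 9pt" are dropped with the tag.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space, then collapse runs of whitespace.
        # Note: this also separates text split by inline tags.
        return " ".join(" ".join(self._chunks).split())


def strip_html(markup):
    """Return the plain text of an HTML fragment (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()
```

The cleaned string would then be sent to Solr in place of the raw body, so the CSS never reaches the analyzer.
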
On Sun, Dec 30, 2018, 9:26 PM Vincenzo D'Amore <v.dam...@gmail.com> wrote:

> Hi,
>
> I think this kind of text manipulation should be done before indexing. If
> you have font-size and font-family in your text, very likely you're indexing
> HTML with CSS.
> If I'm right, you're entering a hell of words that should be
> removed from your text.
>
> On the other hand, if you have to do this at index time, a quick and dirty
> solution is to use the pattern-replace filter.
>
> https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#pattern-replace-filter
>
> Ciao,
> Vincenzo
>
> --
> mobile: 3498513251
> skype: free.dev
>
> > On 31 Dec 2018, at 02:47, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >
> > Hi,
> >
> > I noticed that during the indexing of EML files, there are words like
> > "*FONT-SIZE: 9pt; FONT-FAMILY: arial*" that are being indexed into the
> > content as well.
> >
> > Would like to check, how are we able to remove those words during the
> > indexing?
> >
> > I am using Solr 7.5.0
> >
> > Regards,
> > Edwin
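
The quick and dirty pattern-replace idea mentioned in the thread can be sketched in Python as a plain regex substitution. This is only an illustration of the approach, not Solr configuration: the pattern and the helper name `remove_css_declarations` are assumptions, it handles only the two properties from the original question, and it treats the value as a single token, so a multi-word value such as a font name with spaces would be removed only partially. It also works on the whole string, which is closer in spirit to a character-level replacement than to a per-token filter.

```python
import re

# Matches "font-size: <value>" or "font-family: <value>", case-insensitively,
# where <value> is one whitespace-free token, plus an optional trailing ";".
CSS_DECLARATION = re.compile(
    r"\bfont-(?:size|family)\s*:\s*[^;\s]+;?",
    re.IGNORECASE,
)


def remove_css_declarations(text):
    """Drop the matched CSS declarations and collapse leftover whitespace."""
    cleaned = CSS_DECLARATION.sub(" ", text)
    return " ".join(cleaned.split())
```

Every additional CSS property would need its own alternative in the pattern, which is why stripping the markup before indexing is likely the more robust route.
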