Hi, I think this kind of text manipulation should be done before indexing. If you have font-size and font-family in your text, you are very likely indexing HTML with CSS. If I'm right, you are heading into a hell of words that will have to be removed from your text.
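To give an idea of what cleaning before indexing could look like, here is a minimal sketch. It assumes the message body has already been extracted from the EML file as an HTML string, and it uses only Python's standard library. The names `TextOnly` and `strip_markup` are made up for this example and are not part of Solr or of any library.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text; drop tags, attributes, and <style>/<script> bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside <style> or <script>
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Attributes (including inline style="...") are simply ignored.
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_markup(html):
    """Return only the visible text of an HTML string (hypothetical helper)."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = (
        "<html><head><style>p { FONT-SIZE: 9pt; FONT-FAMILY: arial }</style></head>"
        '<body><p style="FONT-SIZE: 9pt; FONT-FAMILY: arial">Hello <b>Edwin</b></p>'
        "</body></html>"
    )
    print(strip_markup(sample))  # prints "Hello Edwin"
```

The cleaned string is what you would then send to Solr as the content field. Joining the chunks with a space keeps adjacent elements from running together, at the cost of splitting a word that has markup in the middle of it.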
On the other hand, if you have to do this at index time, a quick and dirty solution is to use the pattern-replace filter:

https://lucene.apache.org/solr/guide/7_5/filter-descriptions.html#pattern-replace-filter

Ciao,
Vincenzo

--
mobile: 3498513251
skype: free.dev

> On 31 Dec 2018, at 02:47, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>
> Hi,
>
> I noticed that during the indexing of EML files, there are words like
> "*FONT-SIZE: 9pt; FONT-FAMILY: arial*" that are being indexed into the
> content as well.
>
> Would like to check, how are we able to remove those words during the
> indexing?
>
> I am using Solr 7.5.0
>
> Regards,
> Edwin