On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote:
> Hi,
> 
> Is there a way to remove special characters like \n during indexing of
> rich text documents?
> 
> I have quite a lot of leading \n \n in front of my indexed content of rich
> text documents due to the spaces and empty lines in the original
> documents, and it's causing the content to be flooded with '\n \n' at the
> start before the actual content comes in. This causes the content to look
> ugly, and also takes up unnecessary bandwidth in the system.

Where is this showing up?

If it is in search results, you need an UpdateProcessor (e.g.
RegexReplaceProcessorFactory), as these run before fields are stored.
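
As a rough sketch of the kind of replacement such a processor would make
on the field value before it is stored: the helper name below is made up
for illustration and is not part of Solr. Solr itself applies Java regular
expressions, but a pattern as simple as ^\s+ should behave the same way
there, so Python is used here only to show the effect.

```python
import re

# Hypothetical helper, not part of Solr: it mimics what a regex-replace
# update processor would do to a field value before the value is stored.
LEADING_WHITESPACE = re.compile(r"^\s+")


def strip_leading_whitespace(content: str) -> str:
    """Remove leading newlines, spaces and tabs; leave the rest untouched."""
    return LEADING_WHITESPACE.sub("", content)


if __name__ == "__main__":
    raw = "\n \n \n Actual content starts here.\nSecond line."
    print(repr(strip_leading_whitespace(raw)))
```

Only the run of whitespace at the very start is dropped; line breaks
inside the content are kept, so the document body keeps its layout.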

If you are concerned about facet results, which are built from the indexed
terms, then you can do it in an analysis chain, for example with a
PatternReplaceFilterFactory.

Upayavira
