The org.apache.solr.analysis.RemoveDuplicatesTokenFilter, as per its
description, "Filters out any tokens which are at the same logical position in
the tokenstream as a previous token with the same text."
A very useful filter would be one which filters out duplicate tokens throughout
the field, irrespective of the logical position of the token. Does something
like this already exist, or is it planned for inclusion in an upcoming
release?
I have an implementation of this in one of my projects and can contribute it if
the community finds it useful as well.
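
To make the idea concrete, here is a minimal sketch of the behaviour I mean.
It is plain Java over a list of token strings, not the Lucene TokenFilter API
and not my actual implementation; the class and method names
(FieldDedupSketch, removeDuplicates) are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FieldDedupSketch {
    // Hypothetical helper: keeps the first occurrence of each token text and
    // drops every later repeat, regardless of where in the field it occurs.
    public static List<String> removeDuplicates(List<String> tokens) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            // Set.add returns false when this text has already been seen.
            if (seen.add(token)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "the", "fox", "quick");
        System.out.println(removeDuplicates(tokens)); // prints [the, quick, fox]
    }
}
```

A real filter would also have to decide how to handle position increments for
the dropped tokens, which this sketch ignores.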
Best,
Varun