I'm using a combination of an update processor and a token filter. The token filter is still necessary to remove the duplicates that only appear after stemming has occurred.
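A rough sketch of the idea in plain Java, without any Lucene/Solr classes: one "seen" set is shared across all values of the multivalued field, and a term is kept only the first time its stemmed form appears. The class name, the `uniqueTokens` helper, the whitespace tokenizing, and the toy `stem` method are all assumptions made up for illustration; they stand in for the real analysis chain and are not the actual filter or update processor code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CrossValueDedup {

    // Hypothetical stand-in for the real stemmer in the analysis chain:
    // lowercases the token and strips a trailing "s" from longer words.
    static String stem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        return (t.length() > 3 && t.endsWith("s")) ? t.substring(0, t.length() - 1) : t;
    }

    // For each field value, returns the stemmed terms that have not already
    // been seen in an earlier value (or earlier in the same value).
    static List<List<String>> uniqueTokens(List<String> fieldValues) {
        Set<String> seen = new HashSet<>();   // shared across ALL values
        List<List<String>> result = new ArrayList<>();
        for (String value : fieldValues) {
            List<String> kept = new ArrayList<>();
            for (String raw : value.trim().split("\\s+")) {
                if (raw.isEmpty()) {
                    continue;                 // blank value yields no tokens
                }
                String term = stem(raw);
                if (seen.add(term)) {         // add() is false for duplicates
                    kept.add(term);
                }
            }
            result.add(kept);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("red cars", "Red car trucks", "truck");
        System.out.println(uniqueTokens(values));
    }
}
```

Using a set instead of an ArrayList keeps the duplicate check cheap, and the important part is that the set lives for the whole document rather than for one value.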
David

2009/6/4 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>:
> isn't it better to use an UpdateProcessor for this?
>
> On Thu, Jun 4, 2009 at 1:52 AM, Otis Gospodnetic
> <otis_gospodne...@yahoo.com> wrote:
>>
>> Hello,
>>
>> It's ugly, but the first thing that came to mind was ThreadLocal.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message ----
>>> From: David Giffin <da...@giffin.org>
>>> To: solr-user@lucene.apache.org
>>> Sent: Wednesday, June 3, 2009 1:57:42 PM
>>> Subject: Token filter on multivalue field
>>>
>>> Hi There,
>>>
>>> I'm working on a unique token filter, to eliminate duplicates on a
>>> multivalue field. My filter works properly for a single value field.
>>> It seems that a new TokenFilter is created for each value in the
>>> multivalue field. I need to maintain an array of used tokens across
>>> all of the values in the multivalue field. Is there a good way to do
>>> this? Here is my current code:
>>>
>>> public class UniqueTokenFilter extends TokenFilter {
>>>
>>>     private ArrayList words;
>>>
>>>     public UniqueTokenFilter(TokenStream input) {
>>>         super(input);
>>>         this.words = new ArrayList();
>>>     }
>>>
>>>     @Override
>>>     public final Token next(Token in) throws IOException {
>>>         for (Token token=input.next(in); token!=null; token=input.next(in)) {
>>>             if ( !words.contains(token.term()) ) {
>>>                 words.add(token.term());
>>>                 return token;
>>>             }
>>>         }
>>>         return null;
>>>     }
>>> }
>>>
>>> Thanks,
>>> David
>
> --
> -----------------------------------------------------
> Noble Paul | Principal Engineer| AOL | http://aol.com