Re: Howto concatenate tokens at index time (without spaces)

Batzenmann Wed, 01 Oct 2008 01:16:33 -0700


Otis Gospodnetic wrote:
> 
> I haven't used the German analyzer (either Snowball or the one we have in
> Lucene's contrib), but have you checked if that does the trick of keeping
> words together?
> 
I'm not sure how that would work for words that are separated by spaces,
especially since we use a WhitespaceTokenizer first in the filter chain, so
the words are already split before any later filter sees them.


I solved the problem for now by applying the following filter:

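import java.io.IOException;
import java.util.LinkedList;
import java.util.Queue;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

/**
 * Passes every input token through unchanged and, once the input is
 * exhausted, emits the concatenation of each pair of adjacent tokens.
 */
public class ConcatFilter extends TokenFilter {
    // Previous input token, used to build the next concatenation.
    private Token _last;
    // Concatenated tokens, held back until the input is exhausted.
    private Queue<Token> _concatVersions = new LinkedList<Token>();

    public ConcatFilter(TokenStream input) {
        super(input);
    }

    @Override
    public Token next() throws IOException {
        final Token next = input.next();
        if ( next != null ) {
            if ( _last != null ) {
                final String concatStr = _last.termText() + next.termText();
                // Offsets run from the start of the first token to the end
                // of the second, rather than 0..length of the new term.
                _concatVersions.add(new Token(concatStr,
                        _last.startOffset(), next.endOffset()));
            }
            _last = next;
            return next;
        } else if ( ! _concatVersions.isEmpty() ) {
            // Input is exhausted: drain the queued concatenations.
            return _concatVersions.poll();
        }
        return null;
    }
}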
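
To make the emission order of the filter easier to see, here is a minimal sketch that works on plain strings instead of Lucene Token objects. ConcatSketch and concatAdjacent are hypothetical names used only for illustration (they are not part of Lucene or Solr), and the sample input is made up. It assumes the input has already been split on whitespace, as the WhitespaceTokenizer would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene or Solr: mirrors the order in
// which ConcatFilter emits terms, using plain strings instead of Tokens.
class ConcatSketch {

    // Returns the original tokens followed by the concatenation of each
    // pair of adjacent tokens.
    static List<String> concatAdjacent(List<String> tokens) {
        List<String> out = new ArrayList<String>(tokens); // originals first
        for (int i = 1; i < tokens.size(); i++) {
            out.add(tokens.get(i - 1) + tokens.get(i));   // then the pairs
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample input, split on whitespace like the tokenizer would.
        List<String> tokens = Arrays.asList("wasser hahn kaputt".split("\\s+"));
        System.out.println(concatAdjacent(tokens));
        // prints [wasser, hahn, kaputt, wasserhahn, hahnkaputt]
    }
}
```

Note that only directly adjacent pairs are joined, and the joined terms come out after all of the original terms rather than next to the tokens they were built from.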
-- 
View this message in context: 
http://www.nabble.com/Howto-concatenate-tokens-at-index-time-%28without-spaces%29-tp19740271p19756337.html
Sent from the Solr - User mailing list archive at Nabble.com.
