Hi, I want to create synonyms for a token by applying a regular expression to that token and emitting the result as a synonym of it.
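To make the intent concrete, here is a minimal sketch of the substitution step on its own, in plain Java with no Lucene involved. The class name `SubstitutionSketch` and the method `synonymsFor` are hypothetical names used only for illustration; the rule x -> sc is the one from my example further down.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: applies each rule to a single term.
class SubstitutionSketch {

    // Returns one synonym for every rule whose pattern occurs in the term.
    static Set<String> synonymsFor(String term, Map<Pattern, String> rules) {
        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            Matcher m = rule.getKey().matcher(term);
            if (m.find()) {
                // replaceAll resets the matcher first, so reusing it after find() is safe
                result.add(m.replaceAll(rule.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Pattern, String> rules = new LinkedHashMap<>();
        rules.put(Pattern.compile("x"), "sc");
        System.out.println(synonymsFor("taxi", rules)); // [tasci]
        System.out.println(synonymsFor("sun", rules));  // []
    }
}
```

So for the search term 'taxi sun' I would expect only one extra token, 'tasci', stacked on 'taxi', and nothing extra for 'sun'.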
I tried the following code:

    public PatternRulesFilter(TokenStream input, Map<Pattern, String> substitutions) {
        super(input);
        this.substitutions = substitutions;
        this.charTermAttr = addAttribute(CharTermAttribute.class);
        this.posIncAttr = addAttribute(PositionIncrementAttribute.class);
        this.offsetAttr = addAttribute(OffsetAttribute.class);
        this.terms = new LinkedList<>();
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!terms.isEmpty()) {
            String buffer = terms.poll();
            charTermAttr.setEmpty();
            maxLen = Math.max(maxLen, buffer.length());
            charTermAttr.copyBuffer(buffer.toCharArray(), 0, buffer.length());
            offsetAttr.setOffset(start, start + buffer.length());
            posIncAttr.setPositionIncrement(0);
            log.info("new attr: {}", String.valueOf(buffer));
            return true;
        }
        if (input.incrementToken()) {
            // we add the new substitutions
            String buffer = String.valueOf(charTermAttr.buffer()).trim();
            start = maxLen;
            maxLen = buffer.length();
            terms.addAll(substitutions.entrySet().stream()
                    .filter(e -> e.getKey().matcher(buffer).find())
                    .map(e -> e.getKey().matcher(buffer).replaceAll(e.getValue()))
                    .collect(Collectors.toSet()));
            // we return true and leave the original token unchanged
            return true;
        }
        return false;
    }

When I use search terms with two or more words, the second token is overlapped by the substitution results from the first token. For example, if I have a rule x -> sc and the search term 'taxi sun', I get tokens like:

    taxi sun tasci sunci

Any ideas why? If you know of a token filter that already does this, I would not mind using it at all.

Thanks,
Bruno

-- 
Bruno René Santos
<http://about.me/brunorene>