Hi,

I want to create synonyms for a token by applying a regular expression to
that token and using the result as the synonym.

I tried the following code:


public PatternRulesFilter(TokenStream input, Map<Pattern, String> substitutions)
{
    super(input);
    this.substitutions = substitutions;
    this.charTermAttr = addAttribute(CharTermAttribute.class);
    this.posIncAttr = addAttribute(PositionIncrementAttribute.class);
    this.offsetAttr = addAttribute(OffsetAttribute.class);
    this.terms = new LinkedList<>();
}

@Override
public boolean incrementToken() throws IOException
{
    if (!terms.isEmpty())
    {
        String buffer = terms.poll();
        charTermAttr.setEmpty();
        maxLen = Math.max(maxLen, buffer.length());
        charTermAttr.copyBuffer(buffer.toCharArray(), 0, buffer.length());
        offsetAttr.setOffset(start, start + buffer.length());
        posIncAttr.setPositionIncrement(0);
        log.info("new attr: {}", String.valueOf(buffer));
        return true;
    }
    if (input.incrementToken())
    {
        // we add the new substitutions
        String buffer = String.valueOf(charTermAttr.buffer()).trim();
        start = maxLen;
        maxLen = buffer.length();
        terms.addAll(substitutions.entrySet().stream()
            .filter(e -> e.getKey().matcher(buffer).find())
            .map(e -> e.getKey().matcher(buffer).replaceAll(e.getValue()))
            .collect(Collectors.toSet()));
        // we return true and leave the original token unchanged
        return true;
    }
    return false;
}

When I use search terms with two or more words, the second token is overlaid
with the substitution results from the first token. For example, if I have
the rule x -> sc and the search term 'taxi sun', I get tokens like:

taxi sun
tasci sunci
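For comparison, here is the substitution step on its own, as a plain-Java sketch with no Lucene dependency (the class and method names are hypothetical). Applied to each token in isolation, the x -> sc rule should rewrite 'taxi' to 'tasci' and give 'sun' no synonym at all, which suggests 'sunci' does not come from the rule itself:

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the same logic as the stream in incrementToken(),
// but operating on a plain String instead of the term attribute.
class SubstitutionSketch {

    // Applies every rule whose pattern occurs in the term and collects the rewritten strings.
    static Set<String> synonyms(String term, Map<Pattern, String> rules) {
        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            Matcher m = rule.getKey().matcher(term);
            if (m.find()) {
                // replaceAll() resets the matcher first, so the find() above does not interfere
                result.add(m.replaceAll(rule.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Pattern, String> rules = new LinkedHashMap<>();
        rules.put(Pattern.compile("x"), "sc"); // the x -> sc rule from the example
        for (String token : new String[] {"taxi", "sun"}) {
            System.out.println(token + " -> " + synonyms(token, rules));
        }
        // prints:
        // taxi -> [tasci]
        // sun -> []
    }
}
```

One thing to keep in mind: Matcher.replaceAll interprets $ and \ in the replacement value (for group references such as $1), so a literal dollar sign in a rule would need Matcher.quoteReplacement.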

Any ideas why? If you know of a token filter that already does this, I
wouldn't mind using it at all.
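A likely cause, assuming the tokenizer writes every token into the same CharTermAttribute array (which is how Lucene reuses attributes): charTermAttr.buffer() returns the whole backing array, and only its first charTermAttr.length() characters belong to the current token. After 'tasci' has been copied in, the tokenizer writes 'sun' over the first three characters, so String.valueOf(buffer).trim() reads 'sunci'. The class below is a hypothetical, Lucene-free stand-in that reproduces just that effect:

```java
// Hypothetical stand-in for a reused term attribute: a fixed char[] plus a valid length.
// As with the real attribute, writing a shorter term does not clear the old characters.
class StaleBufferSketch {

    private final char[] buffer = new char[10];
    private int length;

    void setTerm(String term) {
        term.getChars(0, term.length(), buffer, 0); // overwrite only the first term.length() chars
        length = term.length();
    }

    // What the filter does now: read the whole array; trim() only strips the trailing '\0's.
    String readWholeBuffer() {
        return String.valueOf(buffer).trim();
    }

    // Read only the valid part, which is what charTermAttr.toString() gives you.
    String readValidPart() {
        return new String(buffer, 0, length);
    }

    public static void main(String[] args) {
        StaleBufferSketch attr = new StaleBufferSketch();
        attr.setTerm("tasci"); // the synonym emitted for "taxi"
        attr.setTerm("sun");   // the next token is written into the same array
        System.out.println(attr.readWholeBuffer()); // sunci
        System.out.println(attr.readValidPart());   // sun
    }
}
```

If this is the cause, reading the term with charTermAttr.toString() (or new String(charTermAttr.buffer(), 0, charTermAttr.length())) instead of the trimmed buffer should stop the overlap. The start/maxLen bookkeeping is a separate issue: remembering offsetAttr.startOffset() and offsetAttr.endOffset() of the original token (or saving the whole token with captureState() and calling restoreState() before emitting each synonym) would keep the synonyms aligned with the word they came from. Clearing the pending terms queue in an overridden reset() is probably also worth doing.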

Thanx
Bruno

-- 
<http://about.me/brunorene>
Bruno René Santos
about.me/brunorene