Will adding the RemoveDuplicatesTokenFilter(Factory) do the trick here?

Erik

On Apr 2, 2010, at 4:13 PM, Joe Calderon wrote:

hello *,

I have a field that indexes the string "the ex-girlfriend" as these tokens: [the, exgirlfriend, ex, girlfriend]. They are then passed to the edgengram filter. This lets me match different user spellings and allows for partial highlighting.

However, a token like 'ex' gets generated twice. That should be fine, except the highlighter seems to highlight that token twice even though both copies have the same offsets (4,6).

Is there a way to make the highlighter not highlight the same token twice, or do I have to create a token filter that would dump tokens with equal text and offsets?

Basically, what's happening now is that if I search 'the e', I get:

'<em>Seinfeld</em>The <em>E</em><em>E</em>x-Girlfriend'

For 'the ex', I get:

'<em>Seinfeld</em>The <em>Ex</em><em>Ex</em>-Girlfriend'

and so on.

thx much
--joe
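
To make the "dump tokens with equal text and offsets" idea concrete, here is a minimal sketch of just the de-duplication logic in plain Java. It is not a Lucene TokenFilter and does not use any Lucene API: `Tok` and `TokenDedupSketch` are hypothetical names made up for illustration, and the sample tokens in the usage note are assumed values, not output from a real analyzer. A real filter would apply the same comparison to the term text and offsets of each token in the stream. As far as I understand it, RemoveDuplicatesTokenFilter compares term text and position rather than offsets, so whether it helps depends on the positions the edge n-gram filter assigns.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a token: term text plus start/end character offsets.
final class Tok {
    final String text;
    final int start;
    final int end;

    Tok(String text, int start, int end) {
        this.text = text;
        this.start = start;
        this.end = end;
    }

    // Two tokens are duplicates when text and both offsets match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Tok)) return false;
        Tok other = (Tok) o;
        return start == other.start && end == other.end && text.equals(other.text);
    }

    @Override
    public int hashCode() {
        return Objects.hash(text, start, end);
    }

    @Override
    public String toString() {
        return text + "(" + start + "," + end + ")";
    }
}

// Hypothetical helper, not part of Lucene or Solr.
final class TokenDedupSketch {
    // Keeps the first occurrence of each (text, start, end) triple and
    // preserves the original order of the remaining tokens.
    static List<Tok> dropDuplicates(List<Tok> tokens) {
        Set<Tok> seen = new HashSet<>();
        List<Tok> kept = new ArrayList<>();
        for (Tok t : tokens) {
            if (seen.add(t)) { // add() returns false if an equal token was already seen
                kept.add(t);
            }
        }
        return kept;
    }
}
```

Calling `TokenDedupSketch.dropDuplicates` on a list holding two copies each of `e(4,5)` and `ex(4,6)` keeps a single copy of each, so a highlighter fed the result would see only one token per offset span.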