: On indexing, these are passed through a synonym filter that has this line:
: saturday night live => snl, saturday night live
: I now end up with four tokens:
: [saturday, 0, 19], [snl, 0, 19], [night, 0, 19], [live, 0,19]
:
: What I want is:
: [saturday, 0,8], [snl, 0,19], [night, 9, 14], [live, 1
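The desired output above can be sketched in plain Java as a minimal illustration, assuming a single multi-word rule whose right-hand side keeps the original phrase and adds one single-word synonym. This is not Lucene's SynonymFilter. The class `PhraseSynonymSketch` and its `expand` helper are hypothetical. The original tokens keep their own offsets, and the injected synonym spans from the first matched token's start offset to the last one's end offset. Position increments, which a real Lucene TokenStream also tracks, are left out.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene's SynonymFilter): original tokens keep their
 * own offsets; a single-word synonym for a multi-word phrase gets the offsets
 * of the whole phrase.
 */
public class PhraseSynonymSketch {

    /** A token with start offset (inclusive) and end offset (exclusive). */
    public record Token(String term, int start, int end) {
        @Override
        public String toString() {
            return "[" + term + ", " + start + ", " + end + "]";
        }
    }

    /**
     * Wherever `phrase` occurs as consecutive tokens, copy those tokens unchanged
     * and inject `synonym` right after the first one, spanning the whole phrase.
     */
    public static List<Token> expand(List<Token> tokens, List<String> phrase, String synonym) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (matchesAt(tokens, i, phrase)) {
                Token first = tokens.get(i);
                Token last = tokens.get(i + phrase.size() - 1);
                out.add(first);
                // The synonym covers the phrase: first token's start to last token's end.
                out.add(new Token(synonym, first.start(), last.end()));
                // The remaining phrase tokens keep their original offsets.
                for (int j = i + 1; j < i + phrase.size(); j++) {
                    out.add(tokens.get(j));
                }
                i += phrase.size();
            } else {
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }

    /** True if the terms starting at index i equal the phrase, term by term. */
    private static boolean matchesAt(List<Token> tokens, int i, List<String> phrase) {
        if (phrase.isEmpty() || i + phrase.size() > tokens.size()) {
            return false;
        }
        for (int k = 0; k < phrase.size(); k++) {
            if (!tokens.get(i + k).term().equals(phrase.get(k))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Token> input = List.of(
                new Token("saturday", 0, 8),
                new Token("night", 9, 14),
                new Token("live", 15, 19));
        // Prints [[saturday, 0, 8], [snl, 0, 19], [night, 9, 14], [live, 15, 19]]
        System.out.println(expand(input, List.of("saturday", "night", "live"), "snl"));
    }
}
```

The input offsets here are the ones the whitespace tokenizer produces for "saturday night live"; only `snl` receives the full 0 to 19 span.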
Hello *, I'm having issues with the synonym filter altering token offsets.
My input text is
"saturday night live"
It is tokenized by the whitespace tokenizer, yielding 3 tokens:
[saturday, 0,8], [night, 9, 14], [live, 15,19]
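For reference, the [term, start, end] triples above can be reproduced with a minimal plain-Java sketch, assuming end offsets are exclusive (as in the listing). This is not Lucene's WhitespaceTokenizer; the class `WhitespaceOffsets` is hypothetical and only shows where each offset comes from.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene's WhitespaceTokenizer): split on whitespace
 * and record each token's character offsets into the original text.
 */
public class WhitespaceOffsets {

    /** A token with start offset (inclusive) and end offset (exclusive). */
    public record Token(String term, int start, int end) {
        @Override
        public String toString() {
            return "[" + term + ", " + start + ", " + end + "]";
        }
    }

    public static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip any run of whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume characters up to the next whitespace or the end of input.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new Token(text.substring(start, i), start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [[saturday, 0, 8], [night, 9, 14], [live, 15, 19]]
        System.out.println(tokenize("saturday night live"));
    }
}
```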