The KeywordTokenizer doesn't do anything to break up the input stream;
it just treats the whole input to the field as a single token. So I don't think
you'll be able to "extract" anything starting with that tokenizer.
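
To make the difference concrete, here is a minimal Python sketch of the
behaviour described above. It is not Solr code: keyword_tokenize and
whitespace_tokenize are hypothetical helpers that only imitate the idea of
"whole input as one token" versus "split on whitespace".

```python
def keyword_tokenize(text):
    """Imitate a keyword-style tokenizer: the whole input is one token."""
    return [text]


def whitespace_tokenize(text):
    """Imitate a whitespace-style tokenizer: split on runs of whitespace."""
    return text.split()


if __name__ == "__main__":
    value = "quick brown fox"
    print(keyword_tokenize(value))     # a list holding the unsplit input
    print(whitespace_tokenize(value))  # a list of the individual words
```

If the field values are already short and phrase-like, the single token that
comes out will look like an extracted phrase, even though nothing was split or
recognized.
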
Look at the admin/analysis page to see a step-by-step breakdown of what
your ana
Erick,
I totally understand that BUT the keyword tokenizer factory does a really
good job extracting phrases (or what look like phrases) from my data. I
don't know why exactly but it does do it. I am going to continue working
through it to see if I can't figure it out ;-)
Adam
On Thu, Jun 9
The problem here is that none of the built-in filters or tokenizers have a
prayer of recognizing what #you# think are phrases, since it'll be unique to
your situation.
If you have a list of phrases you care about, you could substitute a single
token for the phrases you care about...
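
The substitution idea can be sketched in a few lines of Python, assuming you
already have your phrase list. This is only an illustration of the approach,
not how Solr does it: substitute_phrases is a hypothetical helper, and joining
the words with an underscore is an arbitrary choice made for the example.

```python
import re


def substitute_phrases(text, phrases):
    """Replace each known phrase with a single underscore-joined token."""
    if not phrases:
        return text
    # Try longer phrases first so "new york city" wins over "new york".
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b",
        re.IGNORECASE,
    )
    # Lowercase the match and join its words into one token.
    return pattern.sub(lambda m: m.group(0).lower().replace(" ", "_"), text)


if __name__ == "__main__":
    known = ["new york", "new york city"]
    print(substitute_phrases("I love New York City pizza", known))
```

After a pass like this, any tokenizer that splits on whitespace would keep each
known phrase together as one token.
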
But the overr