Re: [Free Text] Field Tokenizing

2011-06-09 Thread Erick Erickson
The KeywordTokenizer doesn't do anything to break up the input stream; it just treats the whole input to the field as a single token. So I don't think you'll be able to "extract" anything starting with that tokenizer. Look at the admin/analysis page to see a step-by-step breakdown of what your ana
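
To illustrate the point about the whole field value becoming one token, here is a rough sketch in plain Python. It is not Solr or Lucene code; `keyword_tokenize` and `whitespace_tokenize` are hypothetical helper names that only imitate the behavior described above, with a whitespace split shown for contrast.

```python
def keyword_tokenize(text):
    # Imitates the behavior described above: no splitting at all,
    # the entire field value comes back as a single token.
    return [text]


def whitespace_tokenize(text):
    # For contrast only: split the same input on runs of whitespace.
    return text.split()


sample = "New York City subway map"
print(keyword_tokenize(sample))     # one token holding the whole string
print(whitespace_tokenize(sample))  # one token per word
```

Under this model there is nothing for a later step to "extract" from the keyword output, because the field never gets divided into smaller pieces in the first place.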

Re: [Free Text] Field Tokenizing

2011-06-09 Thread Adam Estrada
Erick, I totally understand that BUT the keyword tokenizer factory does a really good job extracting phrases (or what look like phrases) from my data. I don't know why exactly but it does do it. I am going to continue working through it to see if I can't figure it out ;-) Adam On Thu, Jun 9

Re: [Free Text] Field Tokenizing

2011-06-09 Thread Erick Erickson
The problem here is that none of the built-in filters or tokenizers have a prayer of recognizing what #you# think are phrases, since that will be unique to your situation. If you have a list of phrases you care about, you could substitute a single token for each of them... But the overr
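
The single-token substitution idea can be sketched roughly as follows, in plain Python rather than with any Solr filter. Everything here is an assumption made for illustration: the function name `substitute_phrases`, the lowercasing, the whitespace split, and the underscore-joined replacement token are not from the thread.

```python
def substitute_phrases(text, phrases):
    # Lowercase and split on whitespace (an assumption for this sketch).
    tokens = text.lower().split()
    # Try longer phrases first so "new york city" wins over "new york".
    phrase_tokens = sorted(
        (p.lower().split() for p in phrases), key=len, reverse=True
    )
    out = []
    i = 0
    while i < len(tokens):
        for p in phrase_tokens:
            if p and tokens[i:i + len(p)] == p:
                # Collapse the whole phrase into one token.
                out.append("_".join(p))
                i += len(p)
                break
        else:
            # No known phrase starts here; keep the word as its own token.
            out.append(tokens[i])
            i += 1
    return out


phrases = ["new york", "new york city"]
print(substitute_phrases("I took the New York City subway", phrases))
```

The phrase list has to come from you, which is the point made above: nothing in the sketch can guess which word sequences count as phrases.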