Re: Solr/Lucene Tokenizers - cannot get the behavior I need

2012-11-17 Thread Shawn Heisey
On 11/16/2012 12:30 PM, Shawn Heisey wrote:
I am extremely interested in the Unicode behavior of ICUTokenizer, but I cannot disable the punctuation-splitting behavior and let WDF handle it properly, which causes recall problems. There is no filter that I can run after tokenization, either. Lo

Re: Solr/Lucene Tokenizers - cannot get the behavior I need

2012-11-17 Thread Shawn Heisey
On 11/16/2012 12:52 PM, Shawn Heisey wrote:
On 11/16/2012 12:36 PM, Jack Krupansky wrote:
Generally, you don't need the preserveOriginal attribute for WDF. Generate both the word parts and the concatenated terms, and queries should work fine without the original. The separated terms will be in

Re: Solr/Lucene Tokenizers - cannot get the behavior I need

2012-11-16 Thread Shawn Heisey
On 11/16/2012 12:36 PM, Jack Krupansky wrote:
Generally, you don't need the preserveOriginal attribute for WDF. Generate both the word parts and the concatenated terms, and queries should work fine without the original. The separated terms will be indexed as a sequence, and the split/separated

Re: Solr/Lucene Tokenizers - cannot get the behavior I need

2012-11-16 Thread Jack Krupansky
Generally, you don't need the preserveOriginal attribute for WDF. Generate both the word parts and the concatenated terms, and queries should work fine without the original. The separated terms will be indexed as a sequence, and the split/separated terms will generate a phrase query that matches
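
For readers following the thread, here is a rough illustration of the advice above: emit the word parts and the concatenated term for a punctuated token, and skip the original. This is only a toy sketch in Python, not Lucene's WordDelimiterFilter. The function name and its keyword arguments are hypothetical stand-ins that loosely mirror the generateWordParts, catenateWords, and preserveOriginal attributes, and it assumes ASCII input and splits only on punctuation (the real filter can also split on case changes and letter/digit transitions).

```python
import re


def word_delimiter_sketch(token, generate_word_parts=True,
                          catenate_words=True, preserve_original=False):
    """Toy approximation of word-delimiter splitting (hypothetical helper)."""
    # Split on any run of characters that are not ASCII letters or digits.
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    terms = []
    if preserve_original:
        # Keep the untouched token, punctuation included.
        terms.append(token)
    if generate_word_parts:
        # Emit each separated piece as its own term.
        terms.extend(parts)
    if catenate_words and len(parts) > 1:
        # Emit the pieces joined together with the delimiters removed.
        terms.append("".join(parts))
    return terms


if __name__ == "__main__":
    print(word_delimiter_sketch("wi-fi"))
```

With the defaults, "wi-fi" yields the two parts plus the joined form, so under this simplified model a query for either "wi fi" or "wifi" has a term to match even though the original token is not kept.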