Re: Issues with whitespace tokenization in QueryParser

2012-06-12 Thread John Berryman
Robert Muir told me that there is a partial workaround for this when using defType=lucene: escape every whitespace character with a backslash. So instead of searching for *new dress shoes*, search for *new\ dress\ shoes*. Of course, you lose the ability to use normal Lucene syntax. I was hoping that this workaround would al…
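The escaping step can be done client-side before the query is sent. Below is a minimal sketch, assuming the query text contains no other Lucene syntax that needs escaping. The class and method names (`WhitespaceEscaper`, `escapeWhitespace`) are hypothetical and not part of Lucene or Solr.

```java
public class WhitespaceEscaper {

    /**
     * Hypothetical helper: prefixes every whitespace character with a backslash
     * so the classic (defType=lucene) query parser does not split the text into
     * separate clauses. Other special characters are NOT escaped here.
     */
    public static String escapeWhitespace(String query) {
        StringBuilder sb = new StringBuilder(query.length() * 2);
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (Character.isWhitespace(c)) {
                sb.append('\\'); // escape the whitespace character that follows
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: new\ dress\ shoes
        System.out.println(escapeWhitespace("new dress shoes"));
    }
}
```

Because every space is escaped, operators such as `AND` or field prefixes typed by the user are no longer separated from the surrounding words, which is the loss of normal Lucene syntax mentioned above.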

Re: Issues with whitespace tokenization in QueryParser

2012-06-11 Thread Bernd Fehling
Because we often use multi-term search together with synonyms as a thesaurus, we had to develop a solution for this. There is a whole chain of pitfalls throughout the system, and you have to be careful. The thesaurus (synonym.txt) resolves not only single terms to multi-terms but also multi-terms t…

Issues with whitespace tokenization in QueryParser

2012-06-10 Thread John Berryman
According to https://issues.apache.org/jira/browse/LUCENE-2605, the Lucene QueryParser tokenizes on whitespace before passing any text to the Analyzer. This makes it impossible to use multi-term synonyms, because the SynonymFilter only receives one word at a time. Resolution to this would really he…
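To see why pre-splitting breaks multi-term synonyms, here is a toy simulation. It is not Lucene code: the synonym table, the `formal_footwear` replacement token, and the greedy matcher are all hypothetical stand-ins for synonym.txt and the SynonymFilter, and for simplicity the matcher replaces the matched words instead of injecting synonyms alongside them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class WhitespaceSplitDemo {

    // Hypothetical multi-word synonym rule: "dress shoes" -> "formal_footwear".
    static final Map<List<String>, String> SYNONYMS =
            Map.of(List.of("dress", "shoes"), "formal_footwear");

    // Toy synonym filter: greedy match of any rule against the token stream.
    static List<String> applySynonyms(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            boolean matched = false;
            for (Map.Entry<List<String>, String> rule : SYNONYMS.entrySet()) {
                List<String> key = rule.getKey();
                if (i + key.size() <= tokens.size()
                        && tokens.subList(i, i + key.size()).equals(key)) {
                    out.add(rule.getValue());
                    i += key.size();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }

    // The analyzer sees the whole query text at once.
    static List<String> analyzeWhole(String query) {
        return applySynonyms(Arrays.asList(query.split("\\s+")));
    }

    // The parser splits on whitespace first, so the filter sees one word at a time.
    static List<String> analyzePreSplit(String query) {
        List<String> out = new ArrayList<>();
        for (String word : query.split("\\s+")) {
            out.addAll(applySynonyms(List.of(word)));
        }
        return out;
    }

    public static void main(String[] args) {
        String q = "new dress shoes";
        System.out.println(analyzeWhole(q));    // [new, formal_footwear]
        System.out.println(analyzePreSplit(q)); // [new, dress, shoes]
    }
}
```

A two-word rule can never fire when each call to the filter receives a single-word stream, which is the behavior described in LUCENE-2605.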