Solr _does_ have a query parser that doesn't suffer from this problem -- SimpleQParser, selected by the name "simple".
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-SimpleQueryParser

See its "WHITESPACE" operator, which can be toggled. Configure whitespace to _not_ be an operator so that it is passed through to the underlying Analyzer, which then gets to do proper multi-word handling. This is a very fine query parser, IMO; much simpler than any other that has its feature set. You still might need dismax/edismax, though.
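A rough sketch of what such a request could look like, assuming the parser's q.operators parameter takes a comma-separated list of the operators to enable (so WHITESPACE is disabled by leaving it out). The host, the collection name "mycollection", and the field name "text" are hypothetical placeholders, not anything from this thread:

```python
from urllib.parse import urlencode

# Operators of Solr's Simple Query Parser that stay enabled.
# WHITESPACE is deliberately left out so that whitespace is not treated
# as an operator and multi-word input reaches the Analyzer intact.
ENABLED_OPERATORS = [
    "AND", "OR", "NOT", "PREFIX", "PHRASE",
    "PRECEDENCE", "ESCAPE", "FUZZY", "NEAR",
]


def build_simple_query_url(base_url, user_query, field):
    """Build a /select URL that uses the "simple" query parser."""
    params = {
        "defType": "simple",  # choose SimpleQParser by name
        "q": user_query,
        "qf": field,          # hypothetical field to search
        "q.operators": ",".join(ENABLED_OPERATORS),
    }
    return base_url + "/select?" + urlencode(params)


# Hypothetical host and collection, for illustration only.
url = build_simple_query_url(
    "http://localhost:8983/solr/mycollection", "United States", "text"
)
print(url)
```

Whether this is enough depends on the field's query-time analysis chain; the synonym filter still has to be configured on the field type being searched.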
On Thu, Feb 2, 2017 at 1:17 PM Cliff Dickinson <cliff.dickin...@gmail.com> wrote:

> Steve and Shawn, thanks for your replies/explanations!
>
> I eagerly await the completion of the Solr JIRA ticket referenced above in
> a future release. Many thanks for addressing this challenge that has had
> me banging my head against my desk off and on for the last couple years!
>
> Cliff
>
> On Thu, Feb 2, 2017 at 1:01 PM, Steve Rowe <sar...@gmail.com> wrote:
>
> > Hi Cliff,
> >
> > The Solr query parsers (standard/“Lucene” and e/dismax anyway) have a
> > problem that prevents SynonymGraphFilter from working: the text fed to
> > your query analyzer is first split on whitespace. So e.g. a query
> > containing “United States” will never match multi-word synonym
> > “United States”->“US”, since the analyzer will first see “United” and
> > then, separately, “States”.
> >
> > I fixed the whitespace splitting problem in the classic Lucene query
> > parser in <https://issues.apache.org/jira/browse/LUCENE-2605>. (Note
> > that this is *not* the same as Solr’s standard/“Lucene” query parser,
> > which is actually a fork of Lucene’s query parser with added
> > functionality.)
> >
> > There is a Solr JIRA I’m working on to fix the whitespace splitting
> > problem: <https://issues.apache.org/jira/browse/SOLR-9185>. I hope to
> > get it committed in time for inclusion in Solr 6.5.
> >
> > --
> > Steve
> > www.lucidworks.com
> >
> > > On Feb 2, 2017, at 9:50 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> > >
> > > On 2/2/2017 7:36 AM, Cliff Dickinson wrote:
> > >> The SynonymGraphFilter API documentation contains the following
> > >> statement at the end:
> > >>
> > >> "To get fully correct positional queries when your synonym
> > >> replacements are multiple tokens, you should instead apply synonyms
> > >> using this TokenFilter at query time and translate the resulting
> > >> graph to a TermAutomatonQuery e.g. using
> > >> TokenStreamToTermAutomatonQuery."
> > >
> > > Lucene is a programming API for search. That documentation is intended
> > > for people who are writing Lucene programs. Those users would be
> > > constructing query objects in their own code, so they would most likely
> > > know exactly which object needs to be changed to TermAutomatonQuery.
> > >
> > > Solr is a Lucene program ... and an immensely complicated one. Many
> > > Lucene improvements require changes in the end program for full
> > > support. I suspect that Solr's capability has not been updated to use
> > > this new feature in Lucene. I cannot say for sure, I hope someone who
> > > is familiar with this Lucene change and Solr internals can comment.
> > >
> > > Thanks,
> > > Shawn

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com