Hi Doug, nice write-up and 2 questions:

- You write your own QParser plugins. Can one keep the features of edismax for field boosting/phrase-match boosting by subclassing edismax? Assuming yes...
- What do pf2 and pf3 do in the edismax query parser?

Link corrections for the hon-lucene-synonyms plugin:
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
https://github.com/healthonnet/hon-lucene-synonyms

On Wed, Apr 29, 2015 at 9:24 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:

> So Solr has the idea of a query parser. The query parser is a convenient
> way of passing a search string to Solr and having Solr parse it into
> underlying Lucene queries. You can see a list of query parsers here:
> http://wiki.apache.org/solr/QueryParser
>
> What this means is that the query parser does work to pull terms into
> individual clauses *before* analysis is run. It's a parsing layer that sits
> outside the analysis chain. This creates problems like the "sea biscuit"
> problem, whereby we declare "sea biscuit" as a query time synonym of
> "seabiscuit". As you may know, synonyms are checked during analysis.
> However, if the query parser splits up "sea" from "biscuit" before running
> analysis, the query time analyzer will fail to find the synonym. The string
> "sea" is brought by itself to the query time analyzer and of course won't
> match "sea biscuit". Same with the string "biscuit" in isolation. If the
> full string "sea biscuit" were brought to the analyzer, it would see [sea]
> next to [biscuit] and declare it a synonym of seabiscuit. Thanks to the
> query parser, the analyzer has lost the association between the terms,
> because the two terms aren't brought to the analyzer together.
>
> My colleague John Berryman wrote a pretty good blog post on this:
>
> http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so-hard-in-solr/
>
> There are several solutions out there that attempt to address this problem.
> One from Ted Sullivan at Lucidworks:
>
> https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
> Another popular one is the hon-lucene-synonyms plugin:
>
> http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/search/FieldQParserPlugin.html
>
> Yet another work-around is to use the field query parser:
>
> http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/search/FieldQParserPlugin.html
>
> I also tend to write my own query parsers, so on the one hand it's annoying
> that query parsers have the problems above; on the flip side, Solr makes it
> very easy to implement whatever parsing you think is appropriate with a
> small bit of Java/Lucene knowledge.
>
> Hopefully that explanation wasn't too deep, but it's an important thing to
> know about Solr. Are you asking out of curiosity, or do you have a specific
> problem?
>
> Thanks
> -Doug
>
> On Wed, Apr 29, 2015 at 6:32 PM, Steven White <swhite4...@gmail.com> wrote:
>
> > Hi Doug,
> >
> > I don't understand what you mean by the following:
> >
> > > For example, if a user searches for q=hot dogs&defType=edismax&qf=title
> > > body the *query parser* *not* the *analyzer* first turns the query into:
> >
> > If I have indexAnalyzer and queryAnalyzer in a fieldType that are 100%
> > identical, does the example you provided still stand? If so, why? Or do
> > you mean something totally different by "query parser"?
> >
> > Thanks
> >
> > Steve
> >
> > On Wed, Apr 29, 2015 at 4:18 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
> >
> > > *> 1) If the content of indexAnalyzer and queryAnalyzer are exactly the
> > > same, that's the same as if I have an analyzer only, right?*
> > > 1) Yes
> > >
> > > *> 2) Under the hood, all three are the same thing when it comes to what
> > > kind of data and configuration attributes they can take, right?*
> > > 2) Yes. Both take in text and output a token stream.
> > >
> > > *> What I'm trying to figure out is this: besides being able to configure
> > > a fieldType to have different analyzer settings at index and query time,
> > > there is nothing else that's unique about each.*
> > >
> > > The only thing to look out for in Solr land is the query parser. Most Solr
> > > query parsers treat whitespace as meaningful.
> > >
> > > For example, if a user searches for q=hot dogs&defType=edismax&qf=title
> > > body the *query parser*, *not* the *analyzer*, first turns the query into:
> > >
> > > (title:hot title:dog) | (body:hot body:dog)
> > >
> > > each word of which *then* gets analyzed. This is because the query parser
> > > tries to be smart and turn "hot dog" into hot OR dog, or more specifically
> > > making them two must clauses.
> > >
> > > This trips quite a few folks up; you can use the field query parser, which
> > > uses the field as a phrase query. Hope that helps
> > >
> > > --
> > > *Doug Turnbull* | Search Relevance Consultant | OpenSource Connections,
> > > LLC | 240.476.9983 | http://www.opensourceconnections.com
> > > Author: Taming Search <http://manning.com/turnbull> from Manning
> > > Publications
> > > This e-mail and all contents, including attachments, is considered to be
> > > Company Confidential unless explicitly stated otherwise, regardless
> > > of whether attachments are marked as such.
> > >
> > > On Wed, Apr 29, 2015 at 3:41 PM, Steven White <swhite4...@gmail.com> wrote:
> > >
> > > > Hi Everyone,
> > > >
> > > > Looking at Solr's schema.xml, there are three kinds of analyzers:
> > > > analyzer, indexAnalyzer and queryAnalyzer. I have two questions about
> > > > them:
> > > >
> > > > 1) If the content of indexAnalyzer and queryAnalyzer are exactly the
> > > > same, that's the same as if I have an analyzer only, right?
> > > >
> > > > 2) Under the hood, all three are the same thing when it comes to what
> > > > kind of data and configuration attributes they can take, right?
> > > >
> > > > What I'm trying to figure out is this: besides being able to configure
> > > > a fieldType to have different analyzer settings at index and query time,
> > > > there is nothing else that's unique about each.
> > > >
> > > > Thanks
> > > >
> > > > Steve
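
For anyone reading this thread later, the "sea biscuit" problem described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Solr or Lucene code: SYNONYMS, analyze, whitespace_parser and whole_string_parser are hypothetical names, and the "analyzer" is reduced to lowercasing plus a single two-term synonym lookup. It contrasts a parser that splits on whitespace before analysis with one that hands the whole string to the analyzer in one call.

```python
# Toy model of the "sea biscuit" problem. Nothing here is Solr or Lucene API;
# every name is hypothetical and exists only for illustration.

# Query-time synonym table: a two-term phrase maps to a single term.
SYNONYMS = {("sea", "biscuit"): "seabiscuit"}


def analyze(text):
    """Stand-in analyzer: lowercase, tokenize, then add two-term synonyms."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in SYNONYMS:
            # The analyzer sees both terms side by side, so the synonym fires.
            out.extend(pair)
            out.append(SYNONYMS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def whitespace_parser(query):
    """Split on whitespace first, then analyze each piece in isolation."""
    return [analyze(term) for term in query.split()]


def whole_string_parser(query):
    """Hand the full string to the analyzer in a single call."""
    return [analyze(query)]


print(whitespace_parser("sea biscuit"))    # each term analyzed alone
print(whole_string_parser("sea biscuit"))  # terms analyzed together
```

In this sketch the whitespace-splitting path never produces "seabiscuit", because the analyzer only ever sees "sea" and "biscuit" separately, while the whole-string path does. That mirrors the behavior described in the thread, but the real parsers and synonym filters are considerably more involved.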