> Oh, hang on... If a phrase is defined as multiple tokens, and pf is used for
> phrase boosting, does that mean that even with a regular tokenizer the pf
> won't work for fields that only contain one word? For example if the title of
> one document is "John", and the user searches for 'John' (without any
> surrounding phrase-characters), will edismax not boost this document?

Yes, phrase boost “pf” is only applied if the user enters a phrase. Thus q=john will not trigger pf, since there is no phrase to boost.

My workaround, however, inserts a special token before and after both the indexed field and the query, so there will always be 3 or more tokens, and pf will kick in. You could use variations of this to have single-word queries trigger the pf boost for text in a field even if it is not an exact match.

But I agree with you that it is not very obvious, and could be better documented. A new edismax parameter “pfMinClauseSize” could perhaps also be useful, to force pf on single-token queries without this workaround. But there could be good reasons for the original design choice here that we don’t know about...

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 6 Apr 2016, at 11:22, jimi.hulleg...@svensktnaringsliv.se wrote:
>
> OK, well I'm not sure I agree with you. First of all, you ask me to point my
> "pf" towards a tokenized field, but I already do that (the fact that all text
> is tokenized into a single token doesn't change that). Also, I don't
> agree with the view that a single-term phrase is never valid/reasonable. In
> this specific case, with a KeywordTokenizer, I see it as very reasonable
> indeed. And I would consider a "single term keyword phrase" solution more
> logical than a workaround using special magical characters inserted in the
> text. Just my two cents... :)
>
> Oh, hang on... If a phrase is defined as multiple tokens, and pf is used for
> phrase boosting, does that mean that even with a regular tokenizer the pf
> won't work for fields that only contain one word? For example if the title of
> one document is "John", and the user searches for 'John' (without any
> surrounding phrase-characters), will edismax not boost this document?
>
> /Jimi
>
> -----Original Message-----
> From: Jan Høydahl [mailto:jan....@cominvent.com]
> Sent: Wednesday, April 6, 2016 10:43 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Can't get phrase field boosting to work using edismax
>
> Hi,
>
> Phrase match via “pf” requires the target field to contain a phrase. A phrase
> is defined as multiple tokens. Yours does not contain a phrase since you use
> the KeywordTokenizer, leaving only one token in the field. eDismax pf will
> thus never kick in. Please point your “pf” towards a tokenized field.
>
> If what you are trying to achieve is to boost only when the whole query
> exactly matches the full content of the field, then have a look at my
> solution here https://github.com/cominvent/exactmatch
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
>> On 5 Apr 2016, at 19:10, jimi.hulleg...@svensktnaringsliv.se wrote:
>>
>> Some more input, before I call it a day. Just for the heck of it, I tried
>> changing minClauseSize to 0 using the Eclipse debugger, so that it didn't
>> return null at line 1203, but instead returned the TermQuery on line 1205.
>> Then everything worked exactly as it should. The matching document got
>> boosted as expected. And in the explain output, this can be seen:
>>
>> [...]
>> 11.274228 = (MATCH) weight(exactTitle:some words^100.0 in 172) [DefaultSimilarity], result of:
>> [...]
>>
>> So, in my case, having minClauseSize=2 on line 550 (line 565 for Solr 5.5.0)
>> is the culprit. Is this a bug, or am I using pf in the wrong way? Can
>> someone explain why minClauseSize can't be set to 0 here? The comment simply
>> states "we need at least two or there shouldn't be a boost", but gives no
>> explanation of *why* at least two are needed.
>>
>> Regards
>> /Jimi
>>
>> -----Original Message-----
>> From: jimi.hulleg...@svensktnaringsliv.se [mailto:jimi.hulleg...@svensktnaringsliv.se]
>> Sent: Tuesday, April 5, 2016 6:51 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Can't get phrase field boosting to work using edismax
>>
>> I have now used the Eclipse debugger to try to understand what is
>> happening, and it seems like the ExtendedDismaxQParser simply ignores my pf
>> parameter, since it doesn't interpret it as a phrase query.
>>
>> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.6.0/solr/core/src/java/org/apache/solr/search/ExtendedDismaxQParser.java
>>
>> On line 1180 I get a query object of type TermQuery (with the term
>> "exactTitle:some words"). And in the if statements that follow, it is
>> quite clear that if it is not a PhraseQuery or a MultiPhraseQuery, or if the
>> minClauseSize > 1 (and it is set to 2 on line 550) the method simply returns
>> null (ie ignoring my pf parameter). Why is this happening?
>>
>> I use Solr 4.6 by the way... I forgot to mention that in my original message.
>>
>>
>> -----Original Message-----
>> From: jimi.hulleg...@svensktnaringsliv.se [mailto:jimi.hulleg...@svensktnaringsliv.se]
>> Sent: Tuesday, April 5, 2016 5:36 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Can't get phrase field boosting to work using edismax
>>
>> OK. Interesting. But... I added a solr.TrimFilterFactory at the end of my
>> analyzer definition. Shouldn't that take care of the added space at the end?
>> The admin analysis page indicates that it works as it should, but I still
>> can't get edismax to boost.
>>
>> -----Original Message-----
>> From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
>> Sent: Tuesday, April 5, 2016 4:42 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Can't get phrase field boosting to work using edismax
>>
>> It looks like the code constructing the boost phrase for pf will always add
>> a trailing blank, which is never a problem when a normal tokenizer is used
>> that removes white space, but the keyword tokenizer will preserve that extra
>> space, which prevents an exact match.
>>
>> See line 531:
>> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/solr/core/src/java/org/apache/solr/search/ExtendedDismaxQParser.java
>>
>> I'd say it's a bug, but more a narrow use case that wasn't considered or
>> tested.
>>
>> -- Jack Krupansky
>>
>> On Tue, Apr 5, 2016 at 7:50 AM, <jimi.hulleg...@svensktnaringsliv.se> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to boost documents using a phrase field boosting (ie the
>>> pf parameter for edismax), but I can't get it to work (ie boosting
>>> documents where the pf field match the query as a phrase).
>>>
>>> As far as I can tell, solr, or more specifically the edismax handler,
>>> does *something* when I add this parameter. I know this because the QTime
>>> increases from around 5-10ms to around 30-40 ms, and the score
>>> explain structure is *slightly* modified (though with the same final
>>> score for all documents). But nowhere in the explain structure can I
>>> see anything about the pf. And I can't understand that. Shouldn't it
>>> be included in the explain? If not, is there any way to force it to be
>>> included somehow?
>>>
>>> The query looks something like this:
>>>
>>> ?q=some+words&rows=10&sort=score+desc&debugQuery=true&fl=objectid,exactTitle,score%2C%5Bexplain+style%3Dtext%5D&qf=title%5E2&qf=swedishText1%5E1&defType=edismax&pf=exactTitle%5E5&wt=xml&indent=true
>>>
>>> I have one document that has the title "some words", and when I do a
>>> simple query filter with exactTitle:"some words" I get a match for
>>> that document. So then I would expect that the query above would
>>> boost this document, and include information about this in the
>>> explain. But nothing like this happens, and I can't understand why.
>>>
>>> The field looks like this:
>>>
>>> <field name="exactTitle" type="keywordText" indexed="true" stored="true"
>>>        required="false" multiValued="false" />
>>>
>>> And the fieldType looks like this:
>>>
>>> <fieldType name="keywordText" class="solr.TextField" positionIncrementGap="100">
>>>   <analyzer>
>>>     <charFilter class="solr.HTMLStripCharFilterFactory" />
>>>     <tokenizer class="solr.KeywordTokenizerFactory" />
>>>     <filter class="solr.LowerCaseFilterFactory" />
>>>   </analyzer>
>>> </fieldType>
>>>
>>> I have also tried boosting this document using a boost query, ie
>>> bq=exactTitle:"some words", and this works as expected. The document
>>> score is boosted, and the explain states this very clearly, with this
>>> segment:
>>>
>>> [...]
>>> 9.870669 = (MATCH) weight(exactTitle:some words^5.0 in 12) [DefaultSimilarity], result of:
>>> [...]
>>>
>>> Why is this working, but q=some+words&pf=exactTitle^5 not? Shouldn't
>>> edismax rewrite my "pf query" into something very similar to the "bq query"?
>>>
>>> Regards
>>> /Jimi
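
For anyone reading this thread later, the behaviour discussed above can be approximated with a small toy model. This is only a sketch and not Solr's code: the function names, the `min_clause_size` parameter and the `__START__`/`__END__` sentinel strings are all hypothetical, chosen for illustration. It assumes that the phrase boost is skipped whenever the analyzed query yields fewer tokens than a minimum clause size of 2 (as described for ExtendedDismaxQParser in the thread), and that wrapping both the field and the query in sentinel tokens, the general idea behind the workaround mentioned above, lifts a single-word query over that threshold.

```python
# Toy model of the pf behaviour discussed in this thread.
# NOT Solr code: all names and the sentinel strings are hypothetical.

START, END = "__START__", "__END__"  # made-up sentinel tokens


def analyze_keyword(text):
    """Mimic a keyword-style analyzer: the whole input is one lowercased token."""
    return [text.strip().lower()]


def analyze_whitespace(text):
    """Mimic a simple tokenizing analyzer: split on whitespace, lowercase."""
    return text.lower().split()


def with_sentinels(tokens):
    """Wrap a token list in the sentinel tokens."""
    return [START, *tokens, END]


def contains_phrase(field_tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside field_tokens."""
    n = len(phrase_tokens)
    return any(
        field_tokens[i:i + n] == phrase_tokens
        for i in range(len(field_tokens) - n + 1)
    )


def pf_boost_applies(query_tokens, field_tokens, min_clause_size=2):
    """Toy rule: no boost at all if the query has too few tokens."""
    if len(query_tokens) < min_clause_size:
        return False
    return contains_phrase(field_tokens, query_tokens)


if __name__ == "__main__":
    # Keyword-analyzed field: "some words" is a single token, so no boost.
    print(pf_boost_applies(analyze_keyword("some words"),
                           analyze_keyword("Some Words")))
    # Same case with the minimum lowered to 0, as in the debugger experiment.
    print(pf_boost_applies(analyze_keyword("some words"),
                           analyze_keyword("Some Words"),
                           min_clause_size=0))
    # Single-word query with sentinels on both sides: three tokens, boost applies.
    print(pf_boost_applies(with_sentinels(analyze_whitespace("john")),
                           with_sentinels(analyze_whitespace("John"))))
```

In this model the sentinels also make the match exact: a query for "john" wrapped in sentinels does not match a field containing "John Smith", because the end sentinel no longer follows "john" directly.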