Let me check that I understood your problem. From your first e-mail, I think you are worried about the order in which Solr returns documents. Is that correct? If so, as I said before, boosting is not the only thing that influences that order. Term frequency and IDF (inverse document frequency), among other factors, also contribute to the score. If I understood your first e-mail correctly, you are interested in getting rid of IDF. For that, you can create a NoIDFSimilarity class that overrides the default similarity.
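
To make the IDF point concrete, here is a rough sketch in plain Java with no Lucene dependency. It assumes the classic formula documented for Lucene's DefaultSimilarity, idf = 1 + ln(numDocs / (docFreq + 1)). The class name IdfSketch, the method names and the document counts are made up for illustration. A real NoIDFSimilarity would instead subclass the default similarity and override its idf method, whose exact signature depends on your Lucene version.

```java
// Illustrative only: models the idf factor, it is not a Lucene Similarity.
final class IdfSketch {

    // Classic idf, assumed here to match Lucene's DefaultSimilarity:
    // rare terms get a large value, very common terms approach 1.
    static double classicIdf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    // What a "no IDF" similarity would return: the same constant for every term.
    static double noIdf(long docFreq, long numDocs) {
        return 1.0;
    }

    public static void main(String[] args) {
        long numDocs = 100_000;       // hypothetical index size
        long rareDocFreq = 50;        // hypothetical: a rare term such as "wonderworld"
        long commonDocFreq = 60_000;  // hypothetical: a common term such as "news"

        System.out.printf("classic idf, rare term:   %.3f%n", classicIdf(rareDocFreq, numDocs));
        System.out.printf("classic idf, common term: %.3f%n", classicIdf(commonDocFreq, numDocs));
        System.out.printf("no idf, rare term:        %.3f%n", noIdf(rareDocFreq, numDocs));
        System.out.printf("no idf, common term:      %.3f%n", noIdf(commonDocFreq, numDocs));
    }
}
```

With these made-up counts, the classic formula weights the rare term several times more heavily than the common one, while the constant version treats both alike. Since document frequency is counted per field, the same word can also get a different idf in series_title than in contribution, which is one way a field with a smaller boost can end up with the larger score.
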
Can you paste the score calculation for one document here?

On Wed, Jan 30, 2013 at 2:06 PM, Sandeep Mestry <sanmes...@gmail.com> wrote:

> (Sorry for incomplete reply in my previous mail, didn't know Ctrl F sends
> an email in Gmail.. ;-))
>
> Thanks Felipe, yes I have seen that and my requirement falls for
>
> How can I make exact-case matches score higher
>
> Example: a query of "Penguin" should score documents containing "Penguin"
> higher than docs containing "penguin".
>
> The general strategy is to index the content twice, using different fields
> with different fieldTypes (and different analyzers associated with those
> fieldTypes). One analyzer will contain a lowercase filter for
> case-insensitive matches, and one will preserve case for exact-case
> matches.
>
> Use copyField <http://wiki.apache.org/solr/SchemaXml#copyField> commands in
> the schema to index a single input field multiple times.
>
> Once the content is indexed into multiple fields that are analyzed
> differently, query across both
> fields <http://wiki.apache.org/solr/SolrRelevancyFAQ#multiFieldQuery>.
>
> I have added a case insensitive field too to match the exact matches
> higher, however the result is not even considering the matches in field -
> forget the exact matching part.
>
> And I have tried the debugQuery option as mentioned in my previous mail,
> and I have also posted the parsed queries. From the debug query, I see that
> field boosted with lesser factor (contribution) is still resulting higher
> than the one with higher boost factor (series_title).
>
> Thanks,
> Sandeep
>
> On 30 January 2013 16:02, Sandeep Mestry <sanmes...@gmail.com> wrote:
>
> > Thanks Felipe, yes I have seen that and my requirement somewhere falls for
> >
> > On 30 January 2013 15:53, Felipe Lahti <fla...@thoughtworks.com> wrote:
> >
> >> Hi Sandeep,
> >>
> >> Quick answer is that not only the boost that you define in your
> >> requestHandler is taken to calculate the score of each document. There are
> >> others factors that contribute to score calculation. You can take a look
> >> here about http://wiki.apache.org/solr/SolrRelevancyFAQ. Also, you can see
> >> using debugQuery=true the score calculation for each document returned.
> >>
> >> Let me know you need something else.
> >>
> >> On Wed, Jan 30, 2013 at 1:13 PM, Sandeep Mestry <sanmes...@gmail.com>
> >> wrote:
> >>
> >> > Hi All,
> >> >
> >> > I'm facing an issue in relevancy calculation by dismax query parser.
> >> > The boost factor applied does not work as expected in certain cases when
> >> > the keyword is generic and by generic I mean, if the keyword is appearing
> >> > many times in the document as well as in the index.
> >> >
> >> > I have parser configuration as below:
> >> >
> >> > <requestHandler name="querydismax" class="solr.SearchHandler" >
> >> >   <lst name="defaults">
> >> >     <str name="defType">edismax</str>
> >> >     <str name="echoParams">explicit</str>
> >> >     <float name="tie">0.01</float>
> >> >     <str name="qf">series_title^500 title^100 description^15 contribution</str>
> >> >     <str name="pf">series_title^200</str>
> >> >     <int name="ps">0</int>
> >> >     <str name="q.alt">*:*</str>
> >> >   </lst>
> >> > </requestHandler>
> >> >
> >> > As you can see above, I'd expect the documents containing the matches for
> >> > series title should rank higher than the ones in contribution.
> >> >
> >> > This works well, if I type in a query like 'wonderworld' which is a less
> >> > occurring term and the series titles rank higher. But, if I type in a
> >> > keyword like 'news' which is the most common term in the index, I get hits
> >> > in contributions even though I have lots of documents having word news in
> >> > series title.
> >> >
> >> > The field definition is as below:
> >> >
> >> > <field name="series_title" type="text_wc" indexed="true" stored="true"
> >> >        multiValued="false" />
> >> > <field name="title" type="text_wc" indexed="true" stored="true"
> >> >        multiValued="false" />
> >> > <field name="description" type="text_wc" indexed="true" stored="true"
> >> >        multiValued="false" />
> >> > <field name="contribution" type="text" indexed="true" stored="true"
> >> >        multiValued="true" />
> >> >
> >> > <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
> >> >            compressThreshold="10">
> >> >   <analyzer type="index">
> >> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >     <filter class="solr.WordDelimiterFilterFactory"
> >> >             generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >> >             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >> >     <filter class="solr.LowerCaseFilterFactory"/>
> >> >   </analyzer>
> >> >   <analyzer type="query">
> >> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >     <filter class="solr.WordDelimiterFilterFactory"
> >> >             generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >> >             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >> >     <filter class="solr.LowerCaseFilterFactory"/>
> >> >   </analyzer>
> >> > </fieldType>
> >> >
> >> > <fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100" >
> >> >   <analyzer type="index">
> >> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >     <filter class="solr.WordDelimiterFilterFactory"
> >> >             stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> >> >             catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> >> >             splitOnNumerics="0" preserveOriginal="1" />
> >> >     <filter class="solr.LowerCaseFilterFactory"/>
> >> >   </analyzer>
> >> >   <analyzer type="query">
> >> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >     <filter class="solr.WordDelimiterFilterFactory"
> >> >             stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> >> >             catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> >> >             splitOnNumerics="0" preserveOriginal="1" />
> >> >     <filter class="solr.LowerCaseFilterFactory"/>
> >> >   </analyzer>
> >> > </fieldType>
> >> >
> >> > I have tried debugging and when I use query term news, I see that matches
> >> > for contributions are ranked higher than series title. The parsed queries
> >> > look like below:
> >> > (Note that I have edited the query as in reality I have lot of fields that
> >> > are searchable and I have only mentioned the fields containing text data -
> >> > rest all contain uuids)
> >> >
> >> > <str name="parsedquery">
> >> > (+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 |
> >> > contributions:news | series_title:news^500.0)~0.01) () () () () () () () ()
> >> > () () () () () () () () () () () () () () () () () () () () ())/no_coord
> >> > </str>
> >> > <str name="parsedquery_toString">
> >> > +(description:news^15 | title:news^100.0 | contributions:news |
> >> > series_title:news^500.0)~0.01 () () () () () () () () () () () () () () ()
> >> > () () () () () () () () () () () () ()
> >> >
> >> > Could you guide me in right direction please?
> >> >
> >> > Many Thanks,
> >> > Sandeep
> >>
> >> --
> >> Felipe Lahti
> >> Consultant Developer - ThoughtWorks Porto Alegre

--
Felipe Lahti
Consultant Developer - ThoughtWorks Porto Alegre