Why do documents without the search query term rank highest

2015-12-01 Thread Scotten Stuart
Hi All,

I hope this is the way to ask a question - please guide me if there is a 
different protocol

I have a question about results ranking for Solr V4.2 in combination with the 
CMS tool Adobe CQ (V5.6).

Despite trying different ways to configure the ranking of documents I am 
confused why content that does not have even one mention of the search query 
ranks higher than documents that are actually titled with the search query.

For example, searching for 'big' brings back 'Home' as the top result and 'Big 
Mac' as the second result - see here 
http://www-a4.staging.mcdonalds.com/us/en/search/search_results.html?search=simple&queryText=burger&collection=usmcd


Any thoughts would be very welcome

Thanks
Stuart
PMP, Business Technical Analyst | CRS Consultant | Corporate IT Digital | 
McDonald's Corporation
2111 McDonald's Drive | Oak Brook, IL 60523 USA
Office: +1 630.623.5950 | Cell: 301.633.3298 | stuart.scot...@us.mcd.com





The information contained in this e-mail and any accompanying documents is 
confidential, may be privileged, and is intended solely for the person and/or 
entity to whom it is addressed (i.e. those identified in the "To" and "cc" 
box). They are the property of McDonald's Corporation. Unauthorized review, 
use, disclosure, or copying of this communication, or any part thereof, is 
strictly prohibited and may be unlawful. If you have received this e-mail in 
error, please return the e-mail and attachments to the sender and delete the 
e-mail and attachments and any copy from your system. McDonald's thanks you for 
your cooperation.


RE: Why do documents without the search query term rank highest

2015-12-01 Thread Scotten Stuart
Thank you!


Thanks
Stuart
PMP, Business Technical Analyst | CRS Consultant | Corporate IT Digital | 
McDonald's Corporation
2111 McDonald's Drive | Oak Brook, IL 60523 USA
Office: +1 630.623.5950 | Cell: 301.633.3298 | stuart.scot...@us.mcd.com




-----Original Message-----
From: Upayavira [mailto:u...@odoko.co.uk]
Sent: Tuesday, December 01, 2015 10:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Why do documents without the search query term rank highest

I would suggest you ask on a forum related to Adobe CQ. There are many ways in 
which CQ could be issuing queries against Solr, and without insight into that, 
people here aren't that likely to be able to help you
- unless they happen to also use CQ, which probably amounts to a very small 
portion of this community.

Upayavira

On Tue, Dec 1, 2015, at 04:36 PM, Scotten Stuart wrote:
> Hi All,
>
> I hope this is the way to ask a question - please guide me if there is
> a different protocol
>
> I have a question about results ranking for Solr V4.2 in combination
> with the CMS tool Adobe CQ (V5.6).
>
> Despite trying different ways to configure the ranking of documents I
> am confused why content that does not have even one mention of the
> search query ranks higher than documents that are actually titled with
> the search query.
>
> For example, searching for 'big' bring back 'Home' as the top result
> and 'Big Mac as the second result - see here
> http://www-a4.staging.mcdonalds.com/us/en/search/search_results.html?s
> earch=simple&queryText=burger&collection=usmcd
>
>
> Any thoughts would be very welcome
>
> Thanks
> Stuart
> PMP, Business Technical Analyst | CRS Consultant | Corporate IT
> Digital | McDonald's Corporation
> 2111 McDonald's Drive | Oak Brook, IL 60523 USA
> Office: +1 630.623.5950 | Cell: 301.633.3298 |
> stuart.scot...@us.mcd.com





RE: Why do documents without the search query term rank highest

2015-12-01 Thread Scotten Stuart
Thank you for the feedback - I will need some time to put together the response 
to your suggestions - and you're right, I did get the search URL wrong - just a 
beginner at this!


Thanks
Stuart
PMP, Business Technical Analyst | CRS Consultant | Corporate IT Digital | 
McDonald's Corporation
2111 McDonald's Drive | Oak Brook, IL 60523 USA
Office: +1 630.623.5950 | Cell: 301.633.3298 | stuart.scot...@us.mcd.com




-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Tuesday, December 01, 2015 10:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Why do documents without the search query term rank highest


: I would suggest you ask on a forum related to Adobe CQ. There are many
: ways in which CQ could be issuing queries against Solr, and without
: insight into that, people here aren't that likely to be able to help you
: - unless they happen to also use CQ, which probably amounts to a very
: small portion of this community.

If you have access to the Solr logs, and can provide us with the configs, 
schema, and requests being made by your frontend, then folks might be able to 
help explain the results.

In particular, if you can figure out what query the front end is making, then 
make that same query with "debug=true" added to the request, and provide that 
entire output here, then that will help explain everything about the queries 
being executed and why the results are getting the scores they have...

https://wiki.apache.org/solr/UsingMailingLists
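
A minimal sketch of building such a request by hand, assuming a Solr instance at http://localhost:8983/solr/collection1 (the host, core name, and the shortened query are placeholders, not taken from this thread); only the "debug=true" parameter comes from the advice above:

```python
from urllib.parse import urlencode

def build_debug_url(select_url, query, rows=5):
    """Return a Solr select URL that repeats `query` with debug output enabled."""
    params = {
        "q": query,       # the same query string the front end sends
        "rows": rows,
        "wt": "xml",
        "debug": "true",  # asks Solr to include the score explanations
    }
    return select_url + "?" + urlencode(params)

# Hypothetical endpoint and a shortened version of the query from this thread.
url = build_debug_url(
    "http://localhost:8983/solr/collection1/select",
    "title:big OR keywords:big",
)
print(url)
```

Pasting the resulting URL into a browser, or fetching it with any HTTP client, should return the normal results plus a debug section containing one explanation per matched document.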


: > For example, searching for 'big' bring back 'Home' as the top result
: > and 'Big Mac as the second result - see here
: > 
http://www-a4.staging.mcdonalds.com/us/en/search/search_results.html?search=simple&queryText=burger&collection=usmcd

FWIW: That URL doesn't do a search for "big" ... pretty sure you meant...

http://www-a4.staging.mcdonalds.com/us/en/search/search_results.html?search=simple&queryText=big&collection=usmcd



-Hoss
http://www.lucidworks.com/





RE: Why do documents without the search query term rank highest

2015-12-01 Thread Scotten Stuart
Hi

I think this is what you requested - the query 'big' as entered into the Solr 
dashboard with Debug applied - I have also attached the Schema and Config files

Again, my confusion is why the document 'Home' appears ahead of the document 
'Big Mac' in the ranking when the query term 'big' only appears once in 'Home' 
but several times in 'Big Mac'?

Thank you for your thoughts
Stuart





0
4

true
true

description: big OR keywords:big OR url:big OR title:big OR anchor:big OR 
site:big OR type:big

xml
5





Home :: McDonalds.com Menu Burgers & Sandwiches Chicken & Fish Breakfast Salads 
Snacks & Sides Beverages McCafé Desserts & Shakes Full Menu Meal Bundles 
Favorites Under 400 Dollar Menu & More Extra Value Meals Happy Meals Mighty 
Kids Meals Food Quality Nutrition Choices See What We're Made Of Trends & 
Innovation Our Food. Your Questions. View all promotions Our History Our People 
Leadership Our Communities Values In Action Corporate Info News New Search II 
Replacement to New Search Our People Working Here Training & Education Benefits 
Careers for Veterans Gift Cards Free Wi-Fi PlayPlace & Parties Subscriptions 
Home Share My Meal Builder   items Try it now Our food. Your questions. Lovin' 
this way Join our Email list X You're In! Thanks for signing up! Complete Your 
Profile X This is your inside scoop to events, promotions, how we make our food 
and what we're doing to give back. Email required Zipcode required By clicking 
"Subscribe" you agree to receive emails, promotions, and general lovin' 
messages from McDonald's. Subscribe On the list already? Update your profile 
Follow  Corporate * Privacy * Terms & Conditions * Subscriptions * 
©2010-2015  McDonald's. All Rights Reserved


united states, angus, third pounders, happy meal, big mac, mcmuffin, ronald, 
food, franchise, hamburger, cheeseburger, quarter pounder, fry, fries, RMHC, 
ronald mcdonald house charities, stock, mcdonald's corporation, view menu, 
menu, see menu, full menu, meal bundles, Dollar Menu, Extra Value Meals, Happy 
Meals, Mighty Kids Meals, burgers, sandwiches, drinks, beverages, breakfast, 
salads, beverages, McCafe, desserts and shakes, desserts, shakes, food quality, 
food, quality, nutrition, See What We're Made Of, trends, innovation, jobs, 
careers

Home :: McDonalds.com
Home :: McDonalds.com
20151126024653
1.0198051
f6398a900a5d0adbf537b99ef9c6ffcc
2015-11-26T10:47:07.412Z

McDonald's in the USA: Food and nutrition info, franchise opportunities, job 
and career info, restaurant locations, promotional information, history, 
innovation and more.


http://www-a4.staging.mcdonalds.com/us/en/home.html


http://www-a4.staging.mcdonalds.com/us/en/home.html


Home

1518901056269975552



Big Mac :: McDonalds.com Menu Burgers & Sandwiches Chicken & Fish Breakfast 
Salads Snacks & Sides Beverages McCafé Desserts & Shakes Full Menu Meal Bundles 
Favorites Under 400 Dollar Menu & More Extra Value Meals Happy Meals Mighty 
Kids Meals Food Quality Nutrition Choices See What We're Made Of Trends & 
Innovation Our Food. Your Questions. View all promotions Our History Our People 
Leadership Our Communities Values In Action Corporate Info News New Search II 
Replacement to New Search Our People Working Here Training & Education Benefits 
Careers for Veterans Gift Cards Free Wi-Fi PlayPlace & Parties Subscriptions 
Our Story   /   Replacement to New Search   /   Big Mac Share My Meal Builder   
items Join our Email list X You're In! Thanks for signing up! Complete Your 
Profile X This is your inside scoop to events, promotions, how we make our food 
and what we're doing to give back. Email required Zipcode required By clicking 
"Subscribe" you agree to receive emails, promotions, and general lovin' 
messages from McDonald's. Subscribe On the list already? Update your profile 
Join our team Meet our toughest critics See how stars are made Follow  
Corporate * Privacy * Terms & Conditions * Subscriptions * ©2010-2015  
McDonald's. All Rights Reserved


Big Mac :: McDonalds.com
Big Mac :: McDonalds.com
20151126025325
0.017412648
32880656d144194cb1deadde3fbd08a8
2015-11-26T10:58:08.214Z
Big Mac, BigMac, bigmac, biggmack, bigMac, Bigmac

http://www-a4.staging.mcdonalds.com/us/en/our_story/replacement-to-new-search/BigMac.html


http://www-a4.staging.mcdonalds.com/us/en/our_story/replacement-to-new-search/BigMac.html


Big Mac

1518901056324501504



What is in the Big Mac sauce? :: McDonalds.com Menu Burgers & Sandwiches 
Chicken & Fish Breakfast Salads Snacks & Sides Beverages McCafé Desserts & 
Shakes Full Menu Meal Bundles Favorites Under 400 Dollar Menu & More Extra 
Value Meals Happy Meals Mighty Kids Meals Food Quality Nutrition Choices See 
What We're Made Of Trends & Innovation Our Food. Your Questions. View all 
promotions Our History Our People Leadership Our Communities Values In Action 
Corporate Info News New Search II Replacement to New Search Our People Working 
Here Training & Education Benefits Careers for

RE: Why do documents without the search query term rank highest

2015-12-01 Thread Scotten Stuart
WOW!

Thanks Chris - I have read your feedback but I will need to go through it a 
couple more times to get my head around it :) - thanks for taking the time to 
help - much appreciated!



Thanks
Stuart
PMP, Business Technical Analyst | CRS Consultant | Corporate IT Digital | 
McDonald's Corporation
2111 McDonald's Drive | Oak Brook, IL 60523 USA
Office: +1 630.623.5950 | Cell: 301.633.3298 | stuart.scot...@us.mcd.com




-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Tuesday, December 01, 2015 3:52 PM
To: solr-user@lucene.apache.org
Subject: RE: Why do documents without the search query term rank highest


: Again, my confusion is why the document 'Home' appears ahead of the
: document 'Big Mac' in the ranking when the query term 'big' only appears
: once in 'Home' but several times in 'Big Mac'?

The key to understanding how documents are scored is in the query structure and 
the "explain" output.

By default the explain output is a simple string using newlines & whitespace 
indenting for formatting -- something that got lost when you pasted it into 
email -- but i've tried to reformat it below based on educated guesses and lots 
of experience. (FWIW: adding debug.explain.structured=true will use the 
xml/json/whatever response format for structure instead of newlines + indenting)
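
If the plain-string form is all you have, the nesting can be recovered from the indentation. A rough sketch, assuming two spaces of indent per nesting level (the sample lines are adapted from the output discussed in this thread, and parse_explain is a hypothetical helper, not part of Solr):

```python
import re

# One explain line looks like: "<indent><number> = <description>"
LINE = re.compile(r"^(?P<indent> *)(?P<value>[0-9.eE+-]+) = (?P<desc>.*)$")

def parse_explain(text, indent_width=2):
    """Turn explain text into a list of (depth, value, description) rows."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:  # skip blank lines and anything that is not a score line
            depth = len(match.group("indent")) // indent_width
            rows.append((depth, float(match.group("value")), match.group("desc")))
    return rows

sample = "\n".join([
    "0.027089478 = (MATCH) product of:",
    "  0.18962634 = (MATCH) sum of:",
    "  0.14285715 = coord(1/7)",
])
rows = parse_explain(sample)
for depth, value, desc in rows:
    print(depth, value, desc)
```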

http://www-a4.staging.mcdonalds.com/us/en/home.html

0.027089478 = (MATCH) product of:
  0.18962634 = (MATCH) sum of:
    0.18962634 = (MATCH) weight(keywords:big in 78) [DefaultSimilarity]
      0.18962634 = score(doc=78,freq=1.0 = termFreq=1.0 ), product of:
        0.3345638 = queryWeight, product of:
          5.18205 = idf(docFreq=3, maxDocs=262)
          0.06456205 = queryNorm
        0.56678677 = fieldWeight in 78, product of:
          1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
          5.18205 = idf(docFreq=3, maxDocs=262)
          0.109375 = fieldNorm(doc=78)
  0.14285715 = coord(1/7)

So what the above tells us, is that the top scoring document (home.html) 
matched a single clause of the query which was "keywords:big".  The *term* 
"keywords:big" appeared 1 time (freq=1.0) in this document, and is in a total 
of 3 documents (docFreq).

(note that *term* is key here -- the number of times the *word* big appears in 
all fields doesn't matter for score calculations, just that it appears in the 
"keywords" field for a total of 3 documents, and this is one of them)

There were "penalties" to the score for this document based on the "fieldNorm" 
of the keywords field (which comes from index time document & field boosts, as 
well as field length at index time) and because it only matched 1/7 of the 
clauses of the query.
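
The arithmetic behind that explanation can be reproduced in a few lines. This is only a sketch of the classic TF-IDF formulas that DefaultSimilarity uses (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); queryNorm and fieldNorm are copied from the explain output rather than derived, and the function names are made up for illustration:

```python
import math

def idf(doc_freq, max_docs):
    # Rarer terms get a larger inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Term frequency is dampened with a square root.
    return math.sqrt(freq)

def clause_score(freq, doc_freq, max_docs, query_norm, field_norm):
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm             # 0.3345638 in the explain
    field_weight = tf(freq) * term_idf * field_norm  # 0.56678677 in the explain
    return query_weight * field_weight

# keywords:big in home.html, numbers taken from the explain output above.
keywords_clause = clause_score(
    freq=1.0, doc_freq=3, max_docs=262,
    query_norm=0.06456205, field_norm=0.109375,
)
home_score = keywords_clause * (1 / 7)  # coord(1/7): one of seven clauses matched
print(round(keywords_clause, 6), round(home_score, 6))
```

Small differences in the last digits are expected, since Lucene does this math in 32-bit floats.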

Now let's compare with the second match

http://www-a4.staging.mcdonalds.com/us/en/our_story/replacement-to-new-search/BigMac.html

0.0075755017 = (MATCH) product of:
  0.026514255 = (MATCH) sum of:
    0.0146626085 = (MATCH) weight(description:big in 104) [DefaultSimilarity]
      0.0146626085 = score(doc=104,freq=3.0 = termFreq=3.0 ), product of:
        0.3345638 = queryWeight, product of:
          5.18205 = idf(docFreq=3, maxDocs=262)
          0.06456205 = queryNorm
        0.043826047 = fieldWeight in 104, product of:
          1.7320508 = tf(freq=3.0), with freq of: 3.0 = termFreq=3.0
          5.18205 = idf(docFreq=3, maxDocs=262)
          0.0048828125 = fieldNorm(doc=104)
    0.011851646 = (MATCH) weight(title:big in 104) [DefaultSimilarity]
      0.011851646 = score(doc=104,freq=1.0 = termFreq=1.0 ), product of:
        0.3345638 = queryWeight, product of:
          5.18205 = idf(docFreq=3, maxDocs=262)
          0.06456205 = queryNorm
        0.035424173 = fieldWeight in 104, product of:
          1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
          5.18205 = idf(docFreq=3, maxDocs=262)
          0.0068359375 = fieldNorm(doc=104)
  0.2857143 = coord(2/7)

In this case, the document matches two clauses of the query -- 
"description:big" and "title:big".  The term description:big is matched 3 times 
(termFreq) in this document, and evidently exists in only 3 documents in the 
index (docFreq) but the fieldNorm is penalizing the overall scores.  Likewise 
the term title:big is matched 1 time, and exists in only 3 documents in your 
index -- the fieldNorm is slightly higher (probably due to the shorter length 
of the title).  The overall score of the second doc is penalized for only 
matching 2 of the 7 clauses.
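
To see how much the fieldNorm matters, the two documents can be put side by side using only numbers copied from the two explanations. A small sketch (the helper name clause is made up for illustration):

```python
QUERY_WEIGHT = 0.3345638  # idf * queryNorm, identical for every clause here
IDF = 5.18205             # idf(docFreq=3, maxDocs=262)

def clause(tf, field_norm):
    # score of one matching clause = queryWeight * (tf * idf * fieldNorm)
    return QUERY_WEIGHT * tf * IDF * field_norm

# Big Mac page: two matching clauses, coord(2/7)
description = clause(tf=1.7320508, field_norm=0.0048828125)
title = clause(tf=1.0, field_norm=0.0068359375)
big_mac = (description + title) * (2 / 7)

# Home page: one matching clause, coord(1/7)
home = clause(tf=1.0, field_norm=0.109375) * (1 / 7)

print(round(big_mac, 7), round(home, 7))
print("fieldNorm ratio, keywords vs description:", 0.109375 / 0.0048828125)
```

The Big Mac page wins on tf and on coord, but the keywords fieldNorm on the home page is more than 20 times larger than either fieldNorm on the Big Mac page, which is enough to put the home page on top.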

Based on what i'm seeing here, the biggest surprise i have is the fieldNorm 
values you are getting -- they don't make sense given the lengths of the fields 
you showed us in the output unless some index time document (or field) boosts 
are getting applied -- perhaps intended to "promote" the "home.html" page in 
your search results?  My guess is that some setting in your CMS is doing this?  
maybe based on "page depth" or something like that?
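
One way to sanity-check those fieldNorm values: with no index-time boost, DefaultSimilarity's length norm is roughly 1 / sqrt(number of terms in the field), stored in a single lossy byte. Inverting that gives the field length each norm would imply. A rough sketch under that no-boost assumption (implied_length is a hypothetical helper):

```python
def implied_length(field_norm, boost=1.0):
    """Approximate term count implied by a fieldNorm, given an index-time boost."""
    return (boost / field_norm) ** 2

norms = {
    "keywords (home.html)": 0.109375,
    "description (BigMac.html)": 0.0048828125,
    "title (BigMac.html)": 0.0068359375,
}
for field, norm in norms.items():
    print(field, round(implied_length(norm)))
```

A title field implying tens of thousands of terms is not plausible for a title like "Big Mac :: McDonalds.com", which is consistent with the suspicion that a boost well below 1.0 was applied to that document at index time.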

Based on your configs, I'm guessing you're running Solr 4.2 -- So I tried 
loading up copies of those 2 documents using the config+schema you provided, 
and here are the score explanations i got