Re: Trouble boosting a field

2017-01-14 Thread Alan Woodward
http://splainer.io/ from the gents at OpenSourceConnections is pretty good
for this sort of thing, I find…

Alan Woodward
www.flax.co.uk
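
For anyone reproducing this, here is a minimal sketch of the request discussed in the quoted messages below: edismax instead of dismax, a title boost of 8 as suggested, and debugQuery=true so that the response carries the per-document explain output that score-debugging tools such as Splainer present. The field names title and content come from the original post; the id field, the function name, and reusing the same boost for pf are assumptions made purely for illustration. Note also that the pf value in the original post begins with a literal "pf=", which may be a paste slip; the sketch leaves it out.

```python
from urllib.parse import urlencode


def build_title_boost_query(user_query, title_boost=8, debug=True):
    """Return a URL-encoded parameter string for a Solr select request.

    Hypothetical helper: only the title/content field names are taken
    from the thread; everything else is an illustrative assumption.
    """
    params = {
        "q": user_query,
        "defType": "edismax",                   # edismax rather than dismax
        "qf": f"title^{title_boost} content",    # boost title, no boost on content
        "pf": f"title^{title_boost} content",    # phrase boost (assumed same weights)
        "fl": "id,title,score",                  # "id" is an assumed field name
        "wt": "json",
    }
    if debug:
        # Ask Solr to include the score explanation for each document.
        params["debugQuery"] = "true"
    return urlencode(params)


print(build_title_boost_query("connected vehicle"))
```

The resulting string would be appended to a select URL for the collection in question (host and collection name are not given in the thread, so they are left out here).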


> On 13 Jan 2017, at 16:35, Tom Chiverton  wrote:
> 
> Well, I've tried much larger values than 8, and it still doesn't seem to do 
> the job.
> 
> For now, assume my users are searching for exact sub strings of a real title.
> 
> Tom
> 
> 
> On 13/01/17 16:22, Walter Underwood wrote:
>> I use a boost of 8 for title with no boost on the content. Both Infoseek and 
>> Inktomi settled on the 8X boost, getting there with completely different 
>> methodologies.
>> 
>> You might not want the title to completely trump the content. That causes 
>> some odd anomalies. If someone searches for “ice age 2”, do you really want 
>> every title with “2” to come before “ice age two”? Or a search for “steve 
>> jobs” to return every article with “job” or “jobs” in the title first?
>> 
>> Also, use “edismax”, not “dismax”. Dismax was obsolete in Solr 3.x, five 
>> years ago.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Jan 13, 2017, at 7:10 AM, Tom Chiverton  wrote:
>>> 
>>> I have a few hundred documents with title and content fields.
>>> 
>>> I want a match in title to trump matches in content. Essentially, if I 
>>> search for "connected vehicle", a news article that only has the phrase in 
>>> its content shouldn't be ranked higher than the page that has it in the 
>>> title.
>>> 
>>> I have tried dismax with qf=title^2 as well as several other variants with 
>>> the standard query parser (like q=title:"foo"^2 OR content:"foo"), but 
>>> documents without the search term in the title still come out before those 
>>> with the term in the title when ordered by score.
>>> 
>>> Is there something I am missing?
>>> 
>>> From the docs, something like q=title:"connected vehicle"^2 OR 
>>> content:"connected vehicle" should have worked. Even using ^100 didn't 
>>> help.
>>> 
>>> I tried with the dismax parser using
>>> 
>>>   "q": "Connected Vehicle",
>>>   "defType": "dismax",
>>>   "indent": "true",
>>>   "qf": "title^2000 content",
>>>   "pf": "pf=title^4000 content^2",
>>>   "sort": "score desc",
>>>   "wt": "json",
>>> 
>>> but that was not better. If I remove content from pf/qf, then documents 
>>> seem to rank correctly.
>>> Example query and results (content omitted): http://pastebin.com/5EhrRJP8 
>>> with managed-schema: http://pastebin.com/mdraWQWE 
>>> 
>>> -- 
>>> 
>>> 
>>> 
>>> Tom Chiverton
>> 
> 



Re: AW: AW: FacetField-Result on String-Field contains value with count 0?

2017-01-14 Thread Shawn Heisey
On 1/13/2017 7:36 AM, Sebastian Riemer wrote:
> Thanks, that's actually where I come from. But I don't want to exclude values 
> leading to a count of zero.
>
> Background to this: A user searched for mediaType "book", which gave him 10 
> results. Now some other task or routine changes all 10 of those books to 
> ebooks, say, because the type was incorrect. The user refreshes, still 
> filtering on "book", and gets 0 results (which is expected). Because we rule 
> out facet.field values with count 0, I don't get back the selected mediaType 
> "book", and thus I cannot show it as selected in the select-dropdown filter 
> for the mediaType. This confuses the user: he has no results, but doesn't see 
> that it's because the mediaType filter is still set to the value "book", 
> which now leads to 0 results.

Some users are always going to be confused in one way or another when
something behaves in a way that's contrary to their expectations.  If
you plan your interface correctly, you can eliminate the biggest sources
of confusion ... but there's an applicable saying here:  You can never
make things idiot-proof.  There's always a better idiot.

The facet.mincount parameter is the way to deal with this problem, as
Bill Bell already mentioned.  One of the reasons that facet.mincount
exists is to remove terms that have no documents, but still exist in the
index.

If the q parameter was an actual query instead of "all docs" and the
request didn't have facet.mincount, then the facet for that field would
still have thirteen entries, many of which might be zero.
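
One possible way to reconcile facet.mincount with the dropdown concern quoted above is to filter on the client side and always keep whatever value the user currently has selected. The sketch below assumes the default flat facet list that Solr returns for a field ([term, count, term, count, ...]); the helper names and the sample terms other than "book" and "ebook" are made up for illustration, and this is not something proposed in the thread itself.

```python
def parse_flat_facet(flat):
    """Turn a flat [term, count, term, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def visible_facet_values(flat, selected=(), mincount=1):
    """Drop terms below mincount, but always keep the selected terms.

    Hypothetical helper: a selected term that falls below mincount (or is
    missing from the response entirely) is shown with its real count or 0.
    """
    counts = parse_flat_facet(flat)
    visible = {term: n for term, n in counts.items() if n >= mincount}
    for term in selected:
        visible.setdefault(term, counts.get(term, 0))
    return visible


# Invented sample data: "book" now matches nothing after the re-typing.
example = ["ebook", 10, "dvd", 3, "book", 0, "map", 0]
print(visible_facet_values(example, selected=["book"]))
# -> {'ebook': 10, 'dvd': 3, 'book': 0}
```

With this approach the request itself can still go out without facet.mincount (or with it, since a missing selected term is re-added with a count of 0), and the dropdown continues to show the filter that is producing the empty result.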

Thanks,
Shawn