Re: Boost Strangeness

Erick Erickson Wed, 15 Jun 2011 05:17:12 -0700

First off, you didn't "violate group etiquette". In fact, yours was
one of the better first posts in terms of providing enough information
for us to actually help!

A very useful page is the admin/analysis page, which shows how the
analysis chain works. For instance, if you haven't changed the
"text" field type (<fieldType name="text">), you'll see that your
input is being broken up by WordDelimiterFilterFactory. Be sure to
check the "verbose" checkbox and enter text in both the query and
index boxes!
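To make the WordDelimiterFilterFactory point concrete, here's a rough Python sketch of how it tends to split an ID like b007vty6 at letter/digit boundaries. It's an approximation only: it ignores case-change splitting, the catenate* options and the lowercase filter, so the admin/analysis page remains the authority on what your schema actually does. The second ID below is made up for illustration.

```python
import re


def approximate_wdf_split(token: str) -> list[str]:
    """Very rough stand-in for WordDelimiterFilter's default splitting.

    Keeps runs of letters and runs of digits as separate parts and drops
    any other characters. Case-change splitting and catenated tokens
    (catenateWords, catenateNumbers, ...) are deliberately not modelled.
    """
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


query_parts = approximate_wdf_split("b007vty6")
other_parts = approximate_wdf_split("b007xyz6")  # hypothetical ID in another field

print(query_parts)                                  # ['b', '007', 'vty', '6']
print(sorted(set(query_parts) & set(other_parts)))  # ['007', '6', 'b']
```

Those shared fragments are the kind of partial match that can let a field holding a different ID still contribute to the score.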

Here's an invaluable page, though do note that it's not exhaustive:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

But on to your problem:

First, boosting isn't absolute. Boosting terms just tends to
bubble things up, so you have to experiment with various weights.

To get the full comparison for both documents you're curious about,
try using "explainOther". See:
http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_doesn.27t_document_id:juggernaut_appear_in_the_top_10_results_for_my_query

If you use that against the two docs in question, you should
see (although it's a hard read!) the reason the docs got
their relative scores.
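If it helps, here's a small Python sketch that builds such a request by adding explainOther to the dismax query from your earlier mail. The host/port (localhost:8983), the shortened qf list and the target document ID are placeholders; only the parameter names come from your query and the FAQ. The explanation for the other document should appear in the debug section of the response, next to the normal explain output.

```python
from urllib.parse import urlencode


def build_explain_url(base: str, query: str, qf: str, other_id: str) -> str:
    """Build a dismax select URL with debugQuery on and explainOther set,
    so Solr explains the score of `other_id` even if it isn't in the top hits."""
    params = {
        "q": query,
        "defType": "dismax",
        "qf": qf,                            # field^boost list, space separated
        "debugQuery": "on",
        "explainOther": f"id:{other_id}",    # query selecting the doc to explain
        "fl": "id,score",
        "wt": "json",
        "indent": "on",
    }
    # urlencode percent-escapes '^' and ':' and turns spaces into '+'
    return f"{base}/select/?{urlencode(params)}"


url = build_explain_url(
    "http://localhost:8983/solr",   # placeholder host/port
    "b007vty6",
    "id^10 parent_id^9",            # shortened qf, for illustration
    "some_other_doc",               # hypothetical document ID
)
print(url)
```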

Finally, your next e-mail hints at what's happening. If you're
putting multiple tokens in some of these fields, length
normalization may be causing matches in those longer fields to score
lower. You can try disabling those calculations (omitNorms="true" in
your field definition).
See:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr
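For a feel of the size of that effect: Lucene's classic DefaultSimilarity length norm is roughly index-time boost / sqrt(number of terms in the field). This Python sketch computes only that formula; the real value is also squeezed into a single byte at index time, so the fieldNorm you see in the debug output will be a coarser approximation.

```python
import math


def classic_length_norm(num_terms: int, boost: float = 1.0) -> float:
    """Simplified classic Lucene length norm: boost / sqrt(num_terms).

    Ignores the lossy one-byte encoding Lucene applies when storing norms.
    """
    if num_terms <= 0:
        raise ValueError("num_terms must be positive")
    return boost / math.sqrt(num_terms)


print(classic_length_norm(1))  # 1.0  -> single-token field
print(classic_length_norm(4))  # 0.5  -> same match in a four-token field
```

So the same matching term in a four-token field gets half the norm of a one-token field, which is the kind of gap omitNorms="true" removes.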

String types accept spaces just fine, but you might want to define
the fields with multiValued="true" and index each ID as a separate
value (note that won't work for the field that's also your <uniqueKey>).
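Here's a client-side sketch of the multiValued approach, assuming the *_id dynamic field is switched to type="string" with multiValued="true", and that documents go through Solr's JSON update handler using its {"add": {"doc": ...}} command form. The field values are hypothetical; "id" stays single-valued because it's the <uniqueKey>.

```python
import json


def to_solr_doc(doc_id: str, space_separated: dict[str, str]) -> dict:
    """Build one Solr document where each space-separated ID becomes its
    own value in a (multiValued) field. The uniqueKey stays a single value."""
    doc = {"id": doc_id}
    for field, raw in space_separated.items():
        doc[field] = raw.split()  # one list entry per ID
    return doc


def to_add_command(doc: dict) -> str:
    """Wrap a document in a JSON "add" command for the update handler."""
    return json.dumps({"add": {"doc": doc}})


doc = to_solr_doc("doc1", {"brand_container_id": "b007vty6 b00abcd1"})  # hypothetical values
print(to_add_command(doc))
```

With a string field each ID is matched whole, so b007vty6 can no longer partially match a different ID.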

Best
Erick

On Wed, Jun 15, 2011 at 7:16 AM, Judioo <cont...@judioo.com> wrote:
>   <dynamicField name="*_id"  type="text"    indexed="true"  stored="true"/>
>
> so all attributes except 'id' are of type text.
>
> I didn't know that about the string type. So is my problem as described (
> that partial matches are contributing to the calculation ), and does defining
> the field type as string solve this problem?
>
> Or is my understanding completely incorrect?
>
> Thanks in advance
>
> On 15 June 2011 12:08, Ahmet Arslan <iori...@yahoo.com> wrote:
>
>> >
>> > /solr/select/?q=b007vty6&defType=dismax&qf=id^10%20parent_id^9%20brand_container_id^8%20series_container_id^8%20subseries_container_id^8%20clip_container_id^1%20clip_episode_id^1&debugQuery=on&fl=id,parent_id,brand_container_id,series_container_id,subseries_container_id,clip_episode_id,clip_episode_id,score&wt=json&indent=on
>> >
>> >
>> > same result ( just higher scores ). It's almost as if
>> > partial matches on
>> > brand|series_container_id and id are being considered in
>> > the 1st document.
>> > Surely this can't be right / expected?
>>
>> What is your fieldType definition? Don't you think it is better to use
>> string type which is not tokenized?
>>
>
