Re: "index-time" over boosted

remi tassing Wed, 25 Jan 2012 01:26:41 -0800

Hi,

It worked (I'm using Solr 3.4.0, not that it matters)!


I'll try to figure out what went wrong ...with my limited skills.

The omitNorms="true" workaround does the job for now, but I don't consider
it a long-term solution. I still need to figure out how to get all of that
working properly.

Thanks again Jan!!

Remi

On Tue, Jan 24, 2012 at 5:58 PM, Jan Høydahl <[email protected]> wrote:

> Hi,
>
> Well, I think you do it right, but get tricked by either editing the wrong
> file, a typo or browser caching.
> Why not try to start with a fresh Solr3.5.0, start the example app, index
> all exampledocs, search for "Podcasts", you get one hit, in fields "text"
> and "features".
> Then change solr/example/solr/conf/schema.xml and add omitNorms="true" to
> these two fields. Then stop Solr, delete your index, start Solr, re-index
> the docs and try again. fieldNorm is now 1.0. Once you get that working you
> can start debugging where you got it wrong in your own setup.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
>  On 24. jan. 2012, at 14:55, remi tassing wrote:
>
> > Hello,
> >
> > thanks for helping out Jan, I really appreciate that!
> >
> > These are full explains of two results:
> >
> > Result#1.----------
> >
> > 3.0412199E-5 = (MATCH) max of:
> >  3.0412199E-5 = (MATCH) weight(content:"mobil broadband"^0.5 in 19081), product of:
> >    0.13921623 = queryWeight(content:"mobil broadband"^0.5), product of:
> >      0.5 = boost
> >      6.3531075 = idf(content: mobil=5270 broadband=2392)
> >      0.043826185 = queryNorm
> >    2.1845297E-4 = fieldWeight(content:"mobil broadband" in 19081), product of:
> >      3.6055512 = tf(phraseFreq=13.0)
> >      6.3531075 = idf(content: mobil=5270 broadband=2392)
> >      9.536743E-6 = fieldNorm(field=content, doc=19081)
> >
> > Result#2.-------------
> >
> > 2.6991445E-5 = (MATCH) max of:
> >  2.6991445E-5 = (MATCH) weight(content:"mobil broadband"^0.5 in 15306), product of:
> >    0.13921623 = queryWeight(content:"mobil broadband"^0.5), product of:
> >      0.5 = boost
> >      6.3531075 = idf(content: mobil=5270 broadband=2392)
> >      0.043826185 = queryNorm
> >    1.9388145E-4 = fieldWeight(content:"mobil broadband" in 15306), product of:
> >      1.0 = tf(phraseFreq=1.0)
> >      6.3531075 = idf(content: mobil=5270 broadband=2392)
> >      3.0517578E-5 = fieldNorm(field=content, doc=15306)
> >
> > Remi
> >
> >
> > On Tue, Jan 24, 2012 at 3:38 PM, Jan Høydahl <[email protected]> wrote:
> >
> >> That looks right. Can you restart your Solr, do a new search with
> >> &debugQuery=true and copy/paste the full EXPLAIN output for your query?
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >> Solr Training - www.solrtraining.com
> >>
> >> On 24. jan. 2012, at 13:22, remi tassing wrote:
> >>
> >>> Any idea?
> >>>
> >>> This is a snippet of my schema.xml now:
> >>>
> >>> <?xml version="1.0" encoding="UTF-8" ?>
> >>> <!--
> >>> Licensed to the Apache Software Foundation (ASF) under one or more
> >>> ...
> >>>  <!-- fields for index-basic plugin -->
> >>>       <field name="host" type="url" stored="false" indexed="true"/>
> >>>       <field name="site" type="string" stored="false" indexed="true"/>
> >>>       <field name="url" type="url" stored="true" indexed="true"
> >>>           required="true"/>
> >>>       <field name="content" type="text" stored="true" indexed="true"
> >>>           omitNorms="true"/>
> >>>       <field name="cache" type="string" stored="true" indexed="false"/>
> >>>       <field name="tstamp" type="long" stored="true" indexed="false"/>
> >>>  <!-- fields for index-anchor plugin -->
> >>>       <field name="anchor" type="string" stored="true" indexed="true"
> >>>           multiValued="true"/>
> >>>
> >>> ...
> >>>  <!-- uncomment the following to ignore any fields that don't already match an existing
> >>>       field name or dynamic field, rather than reporting them as an error.
> >>>       alternately, change the type="ignored" to some other type e.g. "text" if you want
> >>>       unknown fields indexed and/or stored by default -->
> >>>  <!--dynamicField name="*" type="ignored" multiValued="true" /-->
> >>>
> >>> </fields>
> >>>
> >>> <!-- Field to use to determine and enforce document uniqueness.
> >>>     Unless this field is marked with required="false", it will be a required field
> >>>  -->
> >>> <uniqueKey>id</uniqueKey>
> >>>
> >>> <!-- field for the QueryParser to use when an explicit fieldname is absent
> >>> ...
> >>>
> >>> </schema>
> >>>
> >>>
> >>> Remi
> >>>
> >>> On Sun, Jan 22, 2012 at 6:31 PM, remi tassing <[email protected]> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I got it wrong in the beginning by putting omitNorms in the query URL.
> >>>>
> >>>> Now following your advice, I merged the schema.xml from Nutch and Solr
> >>>> and made sure omitNorms was set to "true" for the content, just as you
> >>>> said.
> >>>>
> >>>> Unfortunately the problem remains :-(
> >>>>
> >>>>
> >>>> On Thursday, January 19, 2012, Jan Høydahl <[email protected]> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> The schema you pasted in your mail is NOT Solr3.5's default example
> >>>>> schema. Did you get it from the Nutch project?
> >>>>>
> >>>>> And the "omitNorms" parameter is supposed to go in the <field> tag in
> >>>>> schema.xml, and the "content" field in the example schema does not
> >>>>> have omitNorms="true". Try to change
> >>>>>
> >>>>>     <field name="content" type="text" stored="false" indexed="true"/>
> >>>>> to
> >>>>>     <field name="content" type="text" stored="false" indexed="true"
> >>>>>         omitNorms="true"/>
> >>>>>
> >>>>> and try again. Please note that you SHOULD customize your schema, there
> >>>>> is really no "default" schema in Solr (or Nutch), it's only an example
> >>>>> or starting point. For your search application to work well you will
> >>>>> have to invest some time in designing a schema, working with your
> >>>>> queries, perhaps exploring DisMax query parser etc etc.
> >>>>>
> >>>>> --
> >>>>> Jan Høydahl, search solution architect
> >>>>> Cominvent AS - www.cominvent.com
> >>>>> Solr Training - www.solrtraining.com
> >>>>>
> >>>>> On 19. jan. 2012, at 13:01, remi tassing wrote:
> >>>>>
> >>>>>> Hello Jan,
> >>>>>>
> >>>>>> My schema wasn't changed from the release 3.5.0. The content can be
> >>>>>> seen below:
> >>>>>>
> >>>>>> <schema name="nutch" version="1.1">
> >>>>>>  <types>
> >>>>>>      <fieldType name="string" class="solr.StrField"
> >>>>>>          sortMissingLast="true" omitNorms="true"/>
> >>>>>>      <fieldType name="long" class="solr.LongField"
> >>>>>>          omitNorms="true"/>
> >>>>>>      <fieldType name="float" class="solr.FloatField"
> >>>>>>          omitNorms="true"/>
> >>>>>>      <fieldType name="text" class="solr.TextField"
> >>>>>>          positionIncrementGap="100">
> >>>>>>          <analyzer>
> >>>>>>              <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>>>>              <filter class="solr.StopFilterFactory"
> >>>>>>                  ignoreCase="true" words="stopwords.txt"/>
> >>>>>>              <filter class="solr.WordDelimiterFilterFactory"
> >>>>>>                  generateWordParts="1" generateNumberParts="1"
> >>>>>>                  catenateWords="1" catenateNumbers="1" catenateAll="0"
> >>>>>>                  splitOnCaseChange="1"/>
> >>>>>>              <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>              <filter class="solr.EnglishPorterFilterFactory"
> >>>>>>                  protected="protwords.txt"/>
> >>>>>>              <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>>>>>          </analyzer>
> >>>>>>      </fieldType>
> >>>>>>      <fieldType name="url" class="solr.TextField"
> >>>>>>          positionIncrementGap="100">
> >>>>>>          <analyzer>
> >>>>>>              <tokenizer class="solr.StandardTokenizerFactory"/>
> >>>>>>              <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>              <filter class="solr.WordDelimiterFilterFactory"
> >>>>>>                  generateWordParts="1" generateNumberParts="1"/>
> >>>>>>              <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>>>>>          </analyzer>
> >>>>>>      </fieldType>
> >>>>>>  </types>
> >>>>>>  <fields>
> >>>>>>      <field name="id" type="string" stored="true" indexed="true"/>
> >>>>>>
> >>>>>>      <!-- core fields -->
> >>>>>>      <field name="segment" type="string" stored="true" indexed="false"/>
> >>>>>>      <field name="digest" type="string" stored="true" indexed="false"/>
> >>>>>>      <field name="boost" type="float" stored="true" indexed="false"/>
> >>>>>>
> >>>>>>      <!-- fields for index-basic plugin -->
> >>>>>>      <field name="host" type="url" stored="false" indexed="true"/>
> >>>>>>      <field name="site" type="string" stored="false" indexed="true"/>
> >>>>>>      <f
> >>>>
> >>
> >>
>
>
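
For readers following the two explain outputs quoted above, the arithmetic
can be replayed by hand. The sketch below is not Solr code; it only
multiplies the factors the way the explain trees list them
(score = queryWeight * fieldWeight, queryWeight = boost * idf * queryNorm,
fieldWeight = tf * idf * fieldNorm). The function and variable names are
made up for illustration. The last column assumes that only fieldNorm
changes to 1.0 once omitNorms="true" takes effect, as Jan describes, and
that every other factor stays the same.

```python
import math

def query_weight(boost, idf, query_norm):
    # queryWeight line of the explain: boost * idf * queryNorm
    return boost * idf * query_norm

def field_weight(tf, idf, field_norm):
    # fieldWeight line of the explain: tf * idf * fieldNorm
    return tf * idf * field_norm

def score(boost, idf, query_norm, tf, field_norm):
    # weight(...) line of the explain: queryWeight * fieldWeight
    return query_weight(boost, idf, query_norm) * field_weight(tf, idf, field_norm)

# Factors copied from the explain output in this thread.
BOOST = 0.5
IDF = 6.3531075
QUERY_NORM = 0.043826185

docs = {
    19081: {"phrase_freq": 13.0, "field_norm": 9.536743e-6},
    15306: {"phrase_freq": 1.0, "field_norm": 3.0517578e-5},
}

for doc_id, d in docs.items():
    # The tf values shown in the explain match sqrt(phraseFreq).
    tf = math.sqrt(d["phrase_freq"])
    with_norm = score(BOOST, IDF, QUERY_NORM, tf, d["field_norm"])
    # Assumption: with omitNorms="true" only fieldNorm changes, to 1.0.
    without_norm = score(BOOST, IDF, QUERY_NORM, tf, 1.0)
    print(f"doc {doc_id}: with norms {with_norm:.7e}, fieldNorm=1.0 {without_norm:.4f}")
```

With these inputs the fieldNorm factor (on the order of 1e-5) is what pulls
both scores down to roughly 3e-5. Under the stated assumption, with
fieldNorm at 1.0 the gap between the two documents comes from tf alone.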
