Re: "index-time" over boosted

remi tassing Tue, 24 Jan 2012 04:23:22 -0800

Any idea?

This is a snippet of my schema.xml now:


<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
...
   <!-- fields for index-basic plugin -->
        <field name="host" type="url" stored="false" indexed="true"/>
        <field name="site" type="string" stored="false" indexed="true"/>
        <field name="url" type="url" stored="true" indexed="true"
            required="true"/>
        <field name="content" type="text" stored="true" indexed="true"
            omitNorms="true"/>
        <field name="cache" type="string" stored="true" indexed="false"/>
        <field name="tstamp" type="long" stored="true" indexed="false"/>
   <!-- fields for index-anchor plugin -->
        <field name="anchor" type="string" stored="true" indexed="true"
            multiValued="true"/>

...
   <!-- uncomment the following to ignore any fields that don't already
        match an existing field name or dynamic field, rather than
        reporting them as an error. alternately, change the type="ignored"
        to some other type e.g. "text" if you want unknown fields indexed
        and/or stored by default -->
   <!--dynamicField name="*" type="ignored" multiValued="true" /-->

 </fields>

 <!-- Field to use to determine and enforce document uniqueness.
      Unless this field is marked with required="false", it will be a
      required field
   -->
 <uniqueKey>id</uniqueKey>

 <!-- field for the QueryParser to use when an explicit fieldname is absent
...

</schema>
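
For reference, a minimal sketch of the content field from the snippet above, written on a single line so that a mail-wrapped attribute cannot be misread. The attribute values are copied from that snippet. The enclosing <fields> element is there only to keep the fragment well-formed, and the re-indexing note in the comment is an assumption, not something confirmed in this thread.

```xml
<!-- Sketch only; values copied from the schema.xml snippet above. -->
<fields>
  <!-- Assumption: norms are written at index time, so documents indexed
       before this change probably keep their old norms until they are
       re-indexed. -->
  <field name="content" type="text" stored="true" indexed="true" omitNorms="true"/>
</fields>
```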


Remi

On Sun, Jan 22, 2012 at 6:31 PM, remi tassing <tassingr...@gmail.com> wrote:

> Hi,
>
> I got it wrong in the beginning by putting omitNorms in the query URL.
>
> Now following your advice, I merged the schema.xml from Nutch and Solr and
> made sure omitNorms was set to "true" for the content, just as you said.
>
> Unfortunately the problem remains :-(
>
>
> On Thursday, January 19, 2012, Jan Høydahl <jan....@cominvent.com> wrote:
> > Hi,
> >
> > The schema you pasted in your mail is NOT Solr 3.5's default example
> > schema. Did you get it from the Nutch project?
> >
> > And the "omitNorms" parameter is supposed to go in the <field> tag in
> > schema.xml, and the "content" field in the example schema does not have
> > omitNorms="true". Try to change
> >
> >       <field name="content" type="text" stored="false" indexed="true"/>
> > to
> >       <field name="content" type="text" stored="false" indexed="true"
> >           omitNorms="true"/>
> >
> > and try again. Please note that you SHOULD customize your schema; there
> > is really no "default" schema in Solr (or Nutch), only an example or
> > starting point. For your search application to work well you will have to
> > invest some time in designing a schema, working with your queries, and
> > perhaps exploring the DisMax query parser.
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> > Solr Training - www.solrtraining.com
> >
> > On 19. jan. 2012, at 13:01, remi tassing wrote:
> >
> >> Hello Jan,
> >>
> >> I have not changed my schema from the 3.5.0 release. Its content is
> >> shown below:
> >>
> >> <schema name="nutch" version="1.1">
> >>    <types>
> >>        <fieldType name="string" class="solr.StrField"
> >>            sortMissingLast="true" omitNorms="true"/>
> >>        <fieldType name="long" class="solr.LongField"
> >>            omitNorms="true"/>
> >>        <fieldType name="float" class="solr.FloatField"
> >>            omitNorms="true"/>
> >>        <fieldType name="text" class="solr.TextField"
> >>            positionIncrementGap="100">
> >>            <analyzer>
> >>                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>                <filter class="solr.StopFilterFactory"
> >>                    ignoreCase="true" words="stopwords.txt"/>
> >>                <filter class="solr.WordDelimiterFilterFactory"
> >>                    generateWordParts="1" generateNumberParts="1"
> >>                    catenateWords="1" catenateNumbers="1" catenateAll="0"
> >>                    splitOnCaseChange="1"/>
> >>                <filter class="solr.LowerCaseFilterFactory"/>
> >>                <filter class="solr.EnglishPorterFilterFactory"
> >>                    protected="protwords.txt"/>
> >>                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>            </analyzer>
> >>        </fieldType>
> >>        <fieldType name="url" class="solr.TextField"
> >>            positionIncrementGap="100">
> >>            <analyzer>
> >>                <tokenizer class="solr.StandardTokenizerFactory"/>
> >>                <filter class="solr.LowerCaseFilterFactory"/>
> >>                <filter class="solr.WordDelimiterFilterFactory"
> >>                    generateWordParts="1" generateNumberParts="1"/>
> >>                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>            </analyzer>
> >>        </fieldType>
> >>    </types>
> >>    <fields>
> >>        <field name="id" type="string" stored="true" indexed="true"/>
> >>
> >>        <!-- core fields -->
> >>        <field name="segment" type="string" stored="true"
> >>            indexed="false"/>
> >>        <field name="digest" type="string" stored="true"
> >>            indexed="false"/>
> >>        <field name="boost" type="float" stored="true" indexed="false"/>
> >>
> >>        <!-- fields for index-basic plugin -->
> >>        <field name="host" type="url" stored="false" indexed="true"/>
> >>        <field name="site" type="string" stored="false" indexed="true"/>
> >>        <f
>
