Hello Jan,

My schema is unchanged from release 3.5.0. Its content is shown below:

<schema name="nutch" version="1.1">
    <types>
        <fieldType name="string" class="solr.StrField"
            sortMissingLast="true" omitNorms="true"/>
        <fieldType name="long" class="solr.LongField"
            omitNorms="true"/>
        <fieldType name="float" class="solr.FloatField"
            omitNorms="true"/>
        <fieldType name="text" class="solr.TextField"
            positionIncrementGap="100">
            <analyzer>
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.StopFilterFactory"
                    ignoreCase="true" words="stopwords.txt"/>
                <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1" generateNumberParts="1"
                    catenateWords="1" catenateNumbers="1" catenateAll="0"
                    splitOnCaseChange="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.EnglishPorterFilterFactory"
                    protected="protwords.txt"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            </analyzer>
        </fieldType>
        <fieldType name="url" class="solr.TextField"
            positionIncrementGap="100">
            <analyzer>
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1" generateNumberParts="1"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            </analyzer>
        </fieldType>
    </types>
    <fields>
        <field name="id" type="string" stored="true" indexed="true"/>

        <!-- core fields -->
        <field name="segment" type="string" stored="true" indexed="false"/>
        <field name="digest" type="string" stored="true" indexed="false"/>
        <field name="boost" type="float" stored="true" indexed="false"/>

        <!-- fields for index-basic plugin -->
        <field name="host" type="url" stored="false" indexed="true"/>
        <field name="site" type="string" stored="false" indexed="true"/>
        <field name="url" type="url" stored="true" indexed="true"
            required="true"/>
        <field name="content" type="text" stored="false" indexed="true"/>
        <field name="title" type="text" stored="true" indexed="true"/>
        <field name="cache" type="string" stored="true" indexed="false"/>
        <field name="tstamp" type="long" stored="true" indexed="false"/>

        <!-- fields for index-anchor plugin -->
        <field name="anchor" type="string" stored="true" indexed="true"
            multiValued="true"/>

        <!-- fields for index-more plugin -->
        <field name="type" type="string" stored="true" indexed="true"
            multiValued="true"/>
        <field name="contentLength" type="long" stored="true"
            indexed="false"/>
        <field name="lastModified" type="long" stored="true"
            indexed="false"/>
        <field name="date" type="string" stored="true" indexed="true"/>

        <!-- fields for languageidentifier plugin -->
        <field name="lang" type="string" stored="true" indexed="true"/>

        <!-- fields for subcollection plugin -->
        <field name="subcollection" type="string" stored="true"
            indexed="true" multiValued="true"/>

        <!-- fields for feed plugin -->
        <field name="author" type="string" stored="true" indexed="true"/>
        <field name="tag" type="string" stored="true" indexed="true"/>
        <field name="feed" type="string" stored="true" indexed="true"/>
        <field name="publishedDate" type="string" stored="true"
            indexed="true"/>
        <field name="updatedDate" type="string" stored="true"
            indexed="true"/>
    </fields>
    <uniqueKey>id</uniqueKey>
    <defaultSearchField>content</defaultSearchField>
    <solrQueryParser defaultOperator="OR"/>
</schema>
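
For reference, here is a minimal sketch in Python (my own arithmetic, not Solr code) that multiplies out the fieldWeight factors from the debug output quoted further down. It assumes Lucene's default tf = sqrt(phraseFreq), which matches the tf values in that output. The names docs and field_weight are made up for illustration.

```python
import math

# Factors copied from the explain output quoted later in this thread.
docs = {
    "A": {"phrase_freq": 2.0, "idf": 6.2444286, "field_norm": 0.0012207031},
    "B": {"phrase_freq": 20.0, "idf": 6.2444286, "field_norm": 3.0517578e-5},
}


def field_weight(phrase_freq, idf, field_norm):
    """fieldWeight = tf * idf * fieldNorm, with tf assumed to be sqrt(freq)."""
    tf = math.sqrt(phrase_freq)
    return tf * idf * field_norm


# queryWeight is 1.0 for both documents, so the score equals fieldWeight.
scores = {name: field_weight(**factors) for name, factors in docs.items()}

# How much larger A's fieldNorm is than B's, and B's tf than A's.
norm_ratio = docs["A"]["field_norm"] / docs["B"]["field_norm"]
tf_ratio = math.sqrt(docs["B"]["phrase_freq"] / docs["A"]["phrase_freq"])

for name, score in scores.items():
    print(f"{name}: fieldWeight = {score:.6g}")
print(f"fieldNorm ratio A/B = {norm_ratio:.1f}")
print(f"tf ratio B/A = {tf_ratio:.2f}")
```

With these numbers, A's fieldNorm is about 40 times B's, which outweighs B's roughly 3.2 times higher tf. That is why A ends up ranked first.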

Remi

On Thu, Jan 19, 2012 at 1:28 PM, Jan Høydahl <jan....@cominvent.com> wrote:

> Hi,
>
> Can you paste exactly both <fieldType> and <field> definitions from your
> schema? omitNorms="true" should kill norms.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 19. jan. 2012, at 08:18, remi tassing wrote:
>
> > Hi,
> >
> > Just some background on my setup: I'm crawling with Nutch-1.2 and have
> > tried both Solr-1.4 and Solr-3.5, with the same result. Solr is still
> > using the default settings.
> >
> > I found this problem just by accident. I queried "mobile broadband": page
> > A has 2 occurrences and scores higher than page B, which has 19
> > occurrences. I found that odd, which is why I started investigating.
> >
> > The debug results are given below and you can see that queryWeight, idf
> > and queryNorm are the same, tf is higher, as expected, in B but what makes
> > the difference is clearly fieldNorm.
> >
> > A: 0.010779975 = (MATCH) weight(content:"mobil broadband" in 18730), product of:
> >   1.0 = queryWeight(content:"mobil broadband"), product of:
> >     6.2444286 = idf(content: mobil=4922 broadband=2290)
> >     0.16014275 = queryNorm
> >   0.010779975 = fieldWeight(content:"mobil broadband" in 18730), product of:
> >     1.4142135 = tf(phraseFreq=2.0)
> >     6.2444286 = idf(content: mobil=4922 broadband=2290)
> >     0.0012207031 = fieldNorm(field=content, doc=18730)
> >
> > B: 8.5223187E-4 = (MATCH) weight(content:"mobil broadband" in 14391), product of:
> >   1.0 = queryWeight(content:"mobil broadband"), product of:
> >     6.2444286 = idf(content: mobil=4922 broadband=2290)
> >     0.16014275 = queryNorm
> >   8.5223187E-4 = fieldWeight(content:"mobil broadband" in 14391), product of:
> >     4.472136 = tf(phraseFreq=20.0)
> >     6.2444286 = idf(content: mobil=4922 broadband=2290)
> >     3.0517578E-5 = fieldNorm(field=content, doc=14391)
> >
> > Remi
> >
> > On Wed, Jan 18, 2012 at 8:52 PM, Jan Høydahl <jan....@cominvent.com> wrote:
> >
> >>> I've come across a problem where newly indexed pages almost always come
> >>> first even when the term frequency is relatively low.
> >>
> >> There is no inherent index-time boost, so this must be something else.
> >> Can you give us an example of a query? Which query parser do you use?
> >>
> >>> I read the posts below on "fieldNorm" and "omitNorms" but setting
> >>> "omitNorms=true" doesn't change anything for me on the calculation of
> >>> fieldNorm.
> >>
> >> Are you sure you have spelled omitNorms="true" correctly, then restarted
> >> Solr (to refresh config)? The effect of Norms on your score will be that
> >> shorter fields score higher than long fields.
> >>
> >> Perhaps you can instead tell us your use case. What kind of ranking
> >> are you trying to achieve? Then we can help suggest how to get there.
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >> Solr Training - www.solrtraining.com
>
>
