Hello Jan,

My schema is unchanged from release 3.5.0. Its content is below:

<schema name="nutch" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="url" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>

    <!-- core fields -->
    <field name="segment" type="string" stored="true" indexed="false"/>
    <field name="digest" type="string" stored="true" indexed="false"/>
    <field name="boost" type="float" stored="true" indexed="false"/>

    <!-- fields for index-basic plugin -->
    <field name="host" type="url" stored="false" indexed="true"/>
    <field name="site" type="string" stored="false" indexed="true"/>
    <field name="url" type="url" stored="true" indexed="true" required="true"/>
    <field name="content" type="text" stored="false" indexed="true"/>
    <field name="title" type="text" stored="true" indexed="true"/>
    <field name="cache" type="string" stored="true" indexed="false"/>
    <field name="tstamp" type="long" stored="true" indexed="false"/>

    <!-- fields for index-anchor plugin -->
    <field name="anchor" type="string" stored="true" indexed="true" multiValued="true"/>

    <!-- fields for index-more plugin -->
    <field name="type" type="string" stored="true" indexed="true" multiValued="true"/>
    <field name="contentLength" type="long" stored="true" indexed="false"/>
    <field name="lastModified" type="long" stored="true" indexed="false"/>
    <field name="date" type="string" stored="true" indexed="true"/>

    <!-- fields for languageidentifier plugin -->
    <field name="lang" type="string" stored="true" indexed="true"/>

    <!-- fields for subcollection plugin -->
    <field name="subcollection" type="string" stored="true" indexed="true" multiValued="true"/>

    <!-- fields for feed plugin -->
    <field name="author" type="string" stored="true" indexed="true"/>
    <field name="tag" type="string" stored="true" indexed="true"/>
    <field name="feed" type="string" stored="true" indexed="true"/>
    <field name="publishedDate" type="string" stored="true" indexed="true"/>
    <field name="updatedDate" type="string" stored="true" indexed="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>content</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>

Remi

On Thu, Jan 19, 2012 at 1:28 PM, Jan Høydahl <jan....@cominvent.com> wrote:

> Hi,
>
> Can you paste exactly both <fieldType> and <field> definitions from your
> schema? omitNorms="true" should kill norms.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 19. jan. 2012, at 08:18, remi tassing wrote:
>
> > Hi,
> >
> > Just some background on my setup: I'm crawling with Nutch-1.2, and I
> > used Solr-1.4 and Solr-3.5, with the same result. Solr is still using
> > the default settings.
> >
> > I found this problem just by accident. I queried "mobile broadband":
> > page A has 2 occurrences and scores higher than page B, which has 19
> > occurrences. I found it weird, and that's why I started investigating.
> >
> > The debug results are given below. You can see that queryWeight, idf
> > and queryNorm are the same and that tf is higher in B, as expected,
> > but what makes the difference is clearly fieldNorm.
> >
> > A: 0.010779975 = (MATCH) weight(content:"mobil broadband" in 18730), product of:
> >      1.0 = queryWeight(content:"mobil broadband"), product of:
> >        6.2444286 = idf(content: mobil=4922 broadband=2290)
> >        0.16014275 = queryNorm
> >      0.010779975 = fieldWeight(content:"mobil broadband" in 18730), product of:
> >        1.4142135 = tf(phraseFreq=2.0)
> >        6.2444286 = idf(content: mobil=4922 broadband=2290)
> >        0.0012207031 = fieldNorm(field=content, doc=18730)
> >
> > B: 8.5223187E-4 = (MATCH) weight(content:"mobil broadband" in 14391), product of:
> >      1.0 = queryWeight(content:"mobil broadband"), product of:
> >        6.2444286 = idf(content: mobil=4922 broadband=2290)
> >        0.16014275 = queryNorm
> >      8.5223187E-4 = fieldWeight(content:"mobil broadband" in 14391), product of:
> >        4.472136 = tf(phraseFreq=20.0)
> >        6.2444286 = idf(content: mobil=4922 broadband=2290)
> >        3.0517578E-5 = fieldNorm(field=content, doc=14391)
> >
> > Remi
> >
> > On Wed, Jan 18, 2012 at 8:52 PM, Jan Høydahl <jan....@cominvent.com> wrote:
> >
> >>> I've come across a problem where newly indexed pages almost always come
> >>> first even when the term frequency is relatively low.
> >>
> >> There is no inherent index-time boost, so this must be something else.
> >> Can you give us an example of a query? Which query parser do you use?
> >>
> >>> I read the posts below on "fieldNorm" and "omitNorms" but setting
> >>> "omitNorms=true" doesn't change anything for me on the calculation of
> >>> fieldNorm.
> >>
> >> Are you sure you have spelled omitNorms="true" correctly, then restarted
> >> Solr (to refresh config)? The effect of Norms on your score will be that
> >> shorter fields score higher than long fields.
> >>
> >> Perhaps you instead can try to tell us your use-case. What kind of
> >> ranking are you trying to achieve? Then we can help suggest how to get
> >> there.
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >> Solr Training - www.solrtraining.com
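
For reference, the two scores in the debug output can be re-derived by hand. The sketch below is only an illustration and assumes the classic Lucene TF-IDF formula that the explain output suggests: tf = sqrt(phraseFreq), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm. The numbers are copied from the debug output; the function and variable names are made up for this example. The last step additionally assumes that a field indexed without norms behaves as if fieldNorm were 1.0.

```python
import math

# Values copied from the debug output in this thread.
IDF = 6.2444286
QUERY_NORM = 0.16014275

# phraseFreq and fieldNorm per document, as reported by debugQuery.
DOCS = {
    "A": {"phrase_freq": 2.0, "field_norm": 0.0012207031},
    "B": {"phrase_freq": 20.0, "field_norm": 3.0517578e-5},
}


def explain_score(phrase_freq, field_norm, idf=IDF, query_norm=QUERY_NORM):
    """Recompute score = queryWeight * fieldWeight (assumed TF-IDF formula)."""
    tf = math.sqrt(phrase_freq)              # tf(phraseFreq)
    query_weight = idf * query_norm          # same for both documents
    field_weight = tf * idf * field_norm     # per-document part
    return query_weight * field_weight


score_a = explain_score(**DOCS["A"])
score_b = explain_score(**DOCS["B"])

# Hypothetical comparison: same documents with fieldNorm fixed at 1.0,
# which is the assumed behaviour when norms are omitted for the field.
no_norm_a = explain_score(DOCS["A"]["phrase_freq"], 1.0)
no_norm_b = explain_score(DOCS["B"]["phrase_freq"], 1.0)

# How much each factor favours one document over the other.
tf_ratio_b_over_a = math.sqrt(20.0) / math.sqrt(2.0)
norm_ratio_a_over_b = DOCS["A"]["field_norm"] / DOCS["B"]["field_norm"]

if __name__ == "__main__":
    print("with norms:    A =", score_a, " B =", score_b)
    print("without norms: A =", no_norm_a, " B =", no_norm_b)
    print("tf favours B by", tf_ratio_b_over_a)
    print("fieldNorm favours A by", norm_ratio_a_over_b)
```

Under these assumptions, tf favours B by a factor of about 3.2, while fieldNorm favours A by a factor of about 40, which is why A ends up on top. With the norm held at 1.0 the order flips and B ranks first.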