And it won't be <G>. Basically, the norms are a lossy approximation (they used to be stored in a single byte), so fields of "close" lengths end up with the same norm value.
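
To make the rounding concrete, here is a rough Python sketch of a one-byte norm encoding. It rests on assumptions that are not stated in this thread: that the length norm is 1/sqrt(number of indexed terms), as in Lucene's classic TF-IDF similarity, and that the byte encoding behaves like the "315" variant of Lucene's SmallFloat, ported here from memory. The function names are made up for illustration, so treat the output as indicative rather than authoritative.

```python
import math
import struct

# Offset between the float32 exponent and the small-float exponent
# (assumed to match the "315" small-float layout).
FZERO = (63 - 15) << 3


def float_to_byte315(f: float) -> int:
    """Squeeze a float into one byte by dropping low mantissa bits."""
    # Reinterpret the float32 bit pattern as a signed 32-bit integer.
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21
    if small <= FZERO:
        return 0 if bits <= 0 else 1  # zero/negative, or underflow
    if small >= FZERO + 0x100:
        return 255  # overflow: clamp to the largest value
    return small - FZERO


def byte315_to_float(b: int) -> float:
    """Expand the stored byte back into the float it stands for."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_norm(num_terms: int) -> float:
    """Length norm after a round trip through the one-byte encoding."""
    exact = 1.0 / math.sqrt(num_terms)  # assumed length-norm formula
    return byte315_to_float(float_to_byte315(exact))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, round(1.0 / math.sqrt(n), 4), stored_norm(n))
```

Under these assumptions, fields of 3 and 4 indexed terms both come back as 0.5, while 2 terms give 0.625 and 5 terms give 0.4375. If the stop filter in the schema quoted below drops "of" (an assumption; it depends on the stop word list actually in use), the two Berkeley names index as 3 and 4 terms, which would be consistent with the identical fieldnorm of .5 reported in the question.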

Why is this an issue? Step back for a second: is a word appearing in a 4-word field really "enough" more important than one appearing in a 5-word field to require a distinction?

Lately you can specify field norms that are longer than a byte, but the overall problem still remains. Frankly, though, I think this is a distraction that users won't notice.

FWIW,
Erick

On Thu, Jul 31, 2014 at 9:56 AM, gorjida <a...@sciencescape.net> wrote:

> I use solr for searching over a collection of institution names... My solr
> DB contains multiple field names such as name, country, city, .... A sample
> document looks like this:
>
> {
>   "solr_id": 130950,
>   "rg_id": 140239,
>   "rg_parent_id": 1438,
>   "name": "University of California Berkeley Research",
>   "ext_name": "",
>   "city": "Berkeley",
>   "country": "US",
>   "state": "CA",
>   "type": "academic/gen",
>   "ext_city": "",
>   "zip": "94720-5100",
>   "_version_": 1474909528315134000
> },
>
> I need to search over this database... My query looks like this:
>
> name: (university of california berkeley)
>
> After running this query, top-2 matches are as follows:
>
> {
>   "solr_id": 130950,
>   "rg_id": 140239,
>   "rg_parent_id": 1438,
>   "name": "University of California Berkeley Research",
>   "ext_name": "",
>   "city": "Berkeley",
>   "country": "US",
>   "state": "CA",
>   "type": "academic/gen",
>   "ext_city": "",
>   "zip": "94720-5100",
>   "_version_": 1474909528315134000,
>   "score": 1.8849033
> },
> {
>   "solr_id": 350,
>   "rg_id": 1438,
>   "rg_parent_id": 1439,
>   "name": "University of California Berkeley",
>   "ext_name": "",
>   "city": "Berkeley",
>   "country": "US",
>   "state": "CA",
>   "type": "academic",
>   "ext_city": "",
>   "zip": "94720",
>   "_version_": 1474909520371122200,
>   "score": 1.8849033
> },
>
> Indeed, both "University of California Berkeley Research" and "University
> of California Berkeley" get the same score (1.8849033)... FYI, my schema
> looks like this:
>
> <fieldType name="text_general" class="solr.TextField" omitNorms="false"
>            autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="false"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="false"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I also checked the debugger and noticed that both documents return the same
> fieldnorm (.5)... The bizarre thing is that solr works fine for these
> queries:
> --- name: (university of toronto)
> --- name: (university of california los angeles)
>
> Indeed, it seems that solr fails once the number of tokens in the documents
> is equal to "4"... For above queries, the first one (university of toronto)
> has three tokens and the second one has 5 tokens... I am totally stuck at
> this point why solr cannot provide different fieldnorms for (University of
> California Berkeley) and (University of California Berkeley Research)...
> Also, I do not understand why it just happens when I have 4 tokens in the
> field? I would appreciate if anyone can share the feedback...
>
> PS. I have also tested "solr.StopFilterFactory" ignoreCase="true" and the
> problem is not still resolved...
>
> Regards,
>
> Ali
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-gives-the-same-fieldnorm-for-two-different-size-fields-tp4150418.html
> Sent from the Solr - User mailing list archive at Nabble.com.