Solr gives the same fieldnorm for two different-size fields

2014-07-31 Thread gorjida
I use Solr for searching over a collection of institution names... My Solr DB
contains multiple fields such as name, country, and city. A sample
document looks like this:

{
  "solr_id": 130950,
  "rg_id": 140239,
  "rg_parent_id": 1438,
  "name": "University of California Berkeley Research",
  "ext_name": "",
  "city": "Berkeley",
  "country": "US",
  "state": "CA",
  "type": "academic/gen",
  "ext_city": "",
  "zip": "94720-5100",
  "_version_": 1474909528315134000
},

I need to search over this database... My query looks like this:

name: (university of california berkeley)

After running this query, the top two matches are as follows:

{
  "solr_id": 130950,
  "rg_id": 140239,
  "rg_parent_id": 1438,
  "name": "University of California Berkeley Research",
  "ext_name": "",
  "city": "Berkeley",
  "country": "US",
  "state": "CA",
  "type": "academic/gen",
  "ext_city": "",
  "zip": "94720-5100",
  "_version_": 1474909528315134000,
  "score": 1.8849033
},
{
  "solr_id": 350,
  "rg_id": 1438,
  "rg_parent_id": 1439,
  "name": "University of California Berkeley",
  "ext_name": "",
  "city": "Berkeley",
  "country": "US",
  "state": "CA",
  "type": "academic",
  "ext_city": "",
  "zip": "94720",
  "_version_": 1474909520371122200,
  "score": 1.8849033
},

Indeed, both "University of California Berkeley Research" and "University of
California Berkeley" get the same score (1.8849033)... FYI, my schema looks
like this:

<fieldType name="text_general" class="solr.TextField" omitNorms="false"
autoGeneratePhraseQueries="true">

I also checked the debugger and noticed that both documents return the same
fieldnorm (0.5)... The bizarre thing is that Solr works fine for these
queries:
--- name: (university of toronto)
--- name: (university of california los angeles)

Indeed, it seems that Solr fails once the number of tokens in the document
field is equal to 4... For the above queries, the first one (university of
toronto) has 3 tokens and the second one has 5 tokens... I am totally stuck
on why Solr cannot provide different fieldnorms for (University of
California Berkeley) and (University of California Berkeley Research)...
Also, I do not understand why this only happens when the field has 4
tokens. I would appreciate any feedback...

PS. I have also tested "solr.StopFilterFactory" with ignoreCase="true" and
the problem is still not resolved...
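
A note on where identical norms can come from, offered as a sketch and not as a diagnosis of this index: with Lucene's DefaultSimilarity (the default in Solr 4.x), the length norm is 1/sqrt(number of terms) and is stored in a single byte, so nearby field lengths can decode to the same fieldNorm. The Python below is an unofficial port of the SmallFloat.floatToByte315 / byte315ToFloat round trip as I understand it. The helper name field_norm is made up for illustration, and the sketch assumes no index-time boost.

```python
import math
import struct


def float_to_byte315(f: float) -> int:
    """Unofficial port of Lucene's SmallFloat.floatToByte315 (result in 0..255)."""
    # Raw IEEE-754 single-precision bits, read as a signed 32-bit int.
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> (24 - 3)
    fzero = (63 - 15) << 3
    if small <= fzero:
        # Zero/negative inputs map to 0, tiny positives to the smallest code.
        return 0 if bits <= 0 else 1
    if small >= fzero + 0x100:
        return 255  # overflow: largest representable code
    return small - fzero


def byte315_to_float(b: int) -> float:
    """Unofficial port of Lucene's SmallFloat.byte315ToFloat."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def field_norm(num_terms: int) -> float:
    """Hypothetical helper: length norm after the one-byte round trip.

    Assumes lengthNorm = 1 / sqrt(num_terms) and a boost of 1.0.
    """
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_terms)))


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, 1.0 / math.sqrt(n), field_norm(n))
```

Under this encoding, 3-term and 4-term fields both decode to 0.5, while a 5-term field decodes to 0.4375. A tie at 0.5 between the two names above would therefore be consistent with them being indexed as 3 and 4 terms (for example, if "of" were removed as a stop word), but that is an assumption about the analyzer, not something the output shown here confirms.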

Regards,

Ali



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-gives-the-same-fieldnorm-for-two-different-size-fields-tp4150418.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr gives the same fieldnorm for two different-size fields

2014-07-31 Thread gorjida
Thanks so much for your reply... In my case, it really matters because I
need to find the correct institution match for an affiliation string... For
example, if an author belongs to the "university of Toronto", his/her
affiliation should be normalized against the Solr index... In this case,
"University of California Berkeley Research" is a different place from
"university of california berkeley"... I see the top matches are tied in
score for this specific example... I can break the tie using other
techniques... However, I am keen to know whether this is a common problem in
Solr.
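
As one illustration of breaking such a tie outside Solr, here is a minimal Python sketch. It is a hypothetical approach, not something from this thread: it assumes whitespace tokenization and re-ranks equally scored hits by how close the name's token count is to the query's. The function name break_ties and the trimmed-down hit dictionaries are made up for the example.

```python
def break_ties(query, hits):
    """Sort hits by score (descending); among equal scores, prefer the
    name whose whitespace token count is closest to the query's.

    Hypothetical client-side re-ranking, not a Solr feature.
    """
    q_len = len(query.split())
    return sorted(
        hits,
        key=lambda h: (-h["score"], abs(len(h["name"].split()) - q_len)),
    )


# The two tied hits from the original post, reduced to the fields used here.
hits = [
    {"name": "University of California Berkeley Research", "score": 1.8849033},
    {"name": "University of California Berkeley", "score": 1.8849033},
]

ranked = break_ties("university of california berkeley", hits)
for hit in ranked:
    print(hit["name"], hit["score"])
```

With the two tied hits above, the exact-length name is ranked first.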

Regards,

Ali  


