Hi,

We are using Solr for our movie title search.
Since this is a title search, it should be treated differently from a normal document search. Hence, we use a modified version of TFIDFSimilarity with the following changes:

- TF and IDF are disabled and always return 1.
- Norms are disabled by setting omitNorms to true for all the fields.

There are 6 fields with different analyzers, and we use different weights in edismax's qf and pf parameters to match tokens and boost phrases.

However, movies can have aliases and multiple titles, so we made the fields multivalued.

Now, consider the following four documents:

1> "Beauty and the Beast"
2> "The Real Beauty and the Beast"
3> "Beauty and the Beast", "La bella y la bestia"
4> "Beauty and the Beast"

Note: Document 3 has two titles in it.

For the query "Beauty and the Beast" with the above configuration, all the documents receive the same score. But documents 1, 3 and 4 should get the same score, and document 2 should score lower than the others.

To solve this, we followed what is suggested in the following thread:
http://lucene.472066.n3.nabble.com/Influencing-scores-on-values-in-multiValue-fields-td1791651.html

Now the fields used for boosting have norms enabled, while the fields used for matching keep norms disabled. This is to make sure that exact and near-exact matches are rewarded.

But for the same query, we get the following results.

query: "Beauty & the Beast"

Search Results:
1> "Beauty and the Beast"
4> "Beauty and the Beast"
2> "The Real Beauty and the Beast"
3> "Beauty and the Beast", "La bella y la bestia"

Clearly, the changes have solved only part of the problem: document 3 should be ranked/scored higher than document 2. It ends up lower because Lucene considers the total field length across all the values in a multivalued field for normalization.

How do we handle this scenario and make sure that normalization is handled correctly for multivalued fields?

--
Regards,
Sravan
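
P.S. A rough sketch of the arithmetic I believe is behind the ordering above. It is not our Solr configuration; it assumes the classic length norm of 1/sqrt(number of tokens), plain whitespace tokenization, no stop-word removal, and it ignores the lossy byte encoding of norms. The helper names (length_norm, field_length) are made up for illustration.

```python
import math

def length_norm(num_tokens):
    # Assumed classic length normalization: 1 / sqrt(number of tokens in the field).
    return 1.0 / math.sqrt(num_tokens)

def field_length(titles):
    # A multivalued field is measured as ONE field: token counts of all values are summed.
    return sum(len(title.split()) for title in titles)

docs = {
    1: ["Beauty and the Beast"],
    2: ["The Real Beauty and the Beast"],
    3: ["Beauty and the Beast", "La bella y la bestia"],
    4: ["Beauty and the Beast"],
}

# Norm per document when the whole multivalued field is normalized together.
norms = {doc_id: length_norm(field_length(titles)) for doc_id, titles in docs.items()}

# Highest norm first; ties broken by document id.
ranking = sorted(norms, key=lambda d: (-norms[d], d))

for doc_id in ranking:
    print(doc_id, field_length(docs[doc_id]), round(norms[doc_id], 3))
```

Under these assumptions document 3 is measured at 9 tokens (4 + 5), so its norm falls below that of document 2 (5 tokens), which matches the order we see. What we would like instead is for each title to be normalized on its own length.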