You can use a separate field for title aliases. That is what I did for Netflix search.
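A minimal sketch of what that could look like, in Python. The field names `title` and `title_alias`, the weights, and the helper `build_title_params` are assumptions for illustration, not the actual setup from either system. Because Lucene keeps norms per field, the alias text no longer adds to the length of the primary title field.

```python
from urllib.parse import urlencode

# Document 3 from the quoted message, indexed with the alias in its own field
# (hypothetical field names).
doc3 = {
    "id": "3",
    "title": "Beauty and the Beast",
    "title_alias": ["La bella y la bestia"],
}


def build_title_params(user_query):
    """Build edismax request parameters that search both title fields.

    The weights are placeholders; tune them against real queries.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": "title^10 title_alias^6",   # token matches
        "pf": "title^20 title_alias^12",  # phrase boosts
    }


params = build_title_params("Beauty and the Beast")
query_string = urlencode(params)  # ready to append to a /select URL
```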
Why disable idf? Disabling tf for titles can be a good idea, for example the movie “New York, New York” is not twice as much about New York as some other film that just lists it once.

Also, consider using a popularity score as a boost.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 31, 2018, at 4:38 AM, Sravan Kumar <sra...@caavo.com> wrote:
>
> Hi,
> We are using solr for our movie title search.
>
> As it is "title search", this should be treated different than the normal
> document search.
> Hence, we use a modified version of TFIDFSimilarity with the following
> changes.
> - disabled TF & IDF and will only have 1 as value.
> - disabled norms by specifying omitNorms as true for all the fields.
>
> There are 6 fields with different analyzers and we make use of different
> weights in edismax's qf & pf parameters to match tokens & boost phrases.
>
> But, movies could have aliases and have multiple titles. So, we made the
> fields multivalued.
>
> Now, consider the following four documents
> 1> "Beauty and the Beast"
> 2> "The Real Beauty and the Beast"
> 3> "Beauty and the Beast", "La bella y la bestia"
> 4> "Beauty and the Beast"
>
> Note: Document 3 has two titles in it.
>
> So, for a query "Beauty and the Beast" and with the above configuration all
> the documents receive same score. But 1,3,4 should have got same score and
> document 2 lesser than others.
>
> To solve this, we followed what is suggested in the following thread:
> http://lucene.472066.n3.nabble.com/Influencing-scores-on-values-in-multiValue-fields-td1791651.html
>
> Now, the fields which are used to boost are made to use Norms. And for
> matching norms are disabled. This is to make sure that exact & near exact
> matches are rewarded.
>
> But, for the same query, we get the following results.
> query: "Beauty & the Beast"
> Search Results:
> 1> "Beauty and the Beast"
> 4> "Beauty and the Beast"
> 2> "The Real Beauty and the Beast"
> 3> "Beauty and the Beast", "La bella y la bestia"
>
> Clearly, the changes have solved only a part of the problem. The document 3
> should be ranked/scored higher than document 2.
>
> This is because lucene considers the total field length across all the
> values in a multivalued field for normalization.
>
> How do we handle this scenario and make sure that in multivalued fields the
> normalization is taken care of?
>
> --
> Regards,
> Sravan
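
The ordering reported in the quoted message is consistent with a classic 1/sqrt(length) norm computed over the combined token count of all values in the field. The Python sketch below works through that arithmetic. It assumes plain whitespace tokenization with no stop word removal, and it ignores the lossy compact encoding Lucene applies to stored norms. The analyzers actually used in the thread are not known, so the exact numbers are illustrative only.

```python
import math


def length_norm(num_tokens):
    """Classic TF-IDF style length norm: 1 / sqrt(tokens in the field)."""
    return 1.0 / math.sqrt(num_tokens)


def field_length(values):
    """Total token count across all values of a multivalued field.

    Assumption: whitespace tokenization, no stop words removed.
    """
    return sum(len(value.split()) for value in values)


# The four documents from the quoted message.
docs = {
    1: ["Beauty and the Beast"],
    2: ["The Real Beauty and the Beast"],
    3: ["Beauty and the Beast", "La bella y la bestia"],
    4: ["Beauty and the Beast"],
}

norms = {doc_id: length_norm(field_length(values)) for doc_id, values in docs.items()}

# Higher norm ranks first; ties are broken by document id.
ranking = sorted(norms, key=lambda doc_id: (-norms[doc_id], doc_id))

for doc_id in ranking:
    print(doc_id, field_length(docs[doc_id]), round(norms[doc_id], 3))
```

Under these assumptions document 3 has 9 tokens in total against 6 for document 2, so its norm is lower (about 0.333 versus 0.408) and it sorts last even though one of its values is an exact match.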