Using edismax with a different field for each title will affect the final
scores if the tie parameter is non-zero.
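
To make the tie effect concrete, here is a minimal sketch of how a
disjunction-max query combines per-field scores: the best field score, plus
tie times the sum of the remaining ones. The function name and the sample
scores are made up for illustration; real Solr scores come from the
similarity in use.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores the way a disjunction-max query does:
    the maximum score plus tie times the sum of the other scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Hypothetical scores for one document matched in three title fields.
scores = [2.0, 1.5, 0.5]

print(dismax_score(scores, tie=0.0))  # only the best field counts
print(dismax_score(scores, tie=0.1))  # the other fields leak into the score
```

With tie=0.0 the extra title fields cannot change the result; with any
non-zero tie, every additional matching field adds something.
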
Can we create a separate document for each title? The unique key would then be
per title rather than per movie_id. That way, even while using edismax, the
other titles won't affect the scor
@Walter: Perhaps you are right not to consider stemming. Fuzzy search will
instead cover those variants along with the misspellings.
In the case of symbols, we want the titles matching the symbols ranked higher
than the others. Perhaps we can use this field only for boosting.
Certain movies have around 4-6
I was the first search engineer at Netflix and moved their search from a
home-grown engine to Solr. It worked very well with a single title field and
aliases.
I think your schema is too complicated for movie search.
Stemming is not useful. It doesn’t help search and it can hurt. You don’t want
@Tim Casey: Yeah... TFIDFSimilarity favors shorter documents. This
is done through the fieldNorm component in the class. The issue arises when the
field is multivalued. Consider a field with two strings of 4 tokens each.
The fieldNorm from the lucene TFIDFSimilarity class considers the total su
With short documents, TFIDFSimilarity will favor the shorter ones. Put
another way, if your documents are 5-10 terms long, the 5-term documents are
going to win.
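
A rough sketch of that length effect, assuming the classic TF-IDF length
norm of 1/sqrt(number of tokens) and ignoring the lossy encoding Lucene
applies when it stores norms. The helper names are hypothetical, and
whitespace splitting stands in for a real analyzer. The second function
assumes, as raised earlier in the thread, that a multivalued field is
normalized by the total token count across all its values.

```python
import math


def length_norm(num_tokens):
    """Classic TF-IDF length normalization: 1 / sqrt(token count)."""
    return 1.0 / math.sqrt(num_tokens)


def multivalued_length_norm(values):
    """Assumption: the norm of a multivalued field uses the total number
    of tokens across all values (whitespace tokenization for simplicity)."""
    total = sum(len(value.split()) for value in values)
    return length_norm(total)


# A 5-term title gets a larger norm than a 10-term title.
print(length_norm(5), length_norm(10))

# Two 4-token aliases in one field are normalized like an 8-token field,
# so each alias scores lower than it would as a standalone 4-token title.
aliases = ["beauty and the beast", "la belle et la"]
print(multivalued_length_norm(aliases), length_norm(4))
```
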
You might think about having per token, or token pair, weight. I would be
surprised if there was not something similar out t
@Walter: We have 6 fields declared in schema.xml for the title, each with a
different type of analyzer: one that leaves symbols unprocessed, one that stems,
one that removes symbols, etc. So, if we have separate fields for each alias, it will
be that many times the number of final fields declared in schema
Or use a boost for the phrase, something like
"beauty and the beast"^5
On Wed, Jan 31, 2018 at 8:43 AM, Walter Underwood wrote:
> You can use a separate field for title aliases. That is what I did for
> Netflix search.
>
> Why disable idf? Disabling tf for titles can be a good idea, for example
You can use a separate field for title aliases. That is what I did for Netflix
search.
Why disable idf? Disabling tf for titles can be a good idea, for example the
movie “New York, New York” is not twice as much about New York as some other
film that just lists it once.
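
As a small illustration of the point about tf, assuming the classic TF-IDF
term-frequency factor of sqrt(freq): the function names below are
hypothetical, and "flat" tf simply stands for whatever mechanism is used to
ignore repeat occurrences.

```python
import math


def classic_tf(freq):
    """Classic TF-IDF term-frequency factor: sqrt(freq)."""
    return math.sqrt(freq)


def flat_tf(freq):
    """tf disabled: any number of occurrences counts the same as one."""
    return 1.0 if freq > 0 else 0.0


# "New York, New York" contains "york" twice; another title contains it once.
print(classic_tf(2), classic_tf(1))  # the repeated title gets a higher factor
print(flat_tf(2), flat_tf(1))        # both titles are treated equally
```
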
Also, consider using a