: > However, I am searching for a solution that does something like: "this is my
: > query" and the document has to consist of this query plus maximal - for
: > example - two another terms?
...
: Not quite following. It sounds like you are saying you want to favor
: docs that are shorter, while still maximizing the number of terms that
: match, right?
I'm pretty sure he's looking for more than what Similarity can provide
w/lengthNorms -- note that he specifically wants to eliminate matches that
contain more than X additional terms besides what's included in the query.
(So the doc "how now brown sexy cow" would match a query for q=how+cow&x=3
but it would not match a query for q=how+cow&x=2, because there are more
than 2 "left over" words in the document.)
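
To make that matching rule concrete, here is a minimal sketch in plain
Python (not Lucene/Solr code). The function name and the whitespace
tokenization are assumptions made purely for illustration:

```python
def matches_with_max_extra(doc: str, query: str, x: int) -> bool:
    """Return True if doc contains every query term and at most x other terms.

    Hypothetical helper: tokenization is a simple lowercase whitespace split,
    which is an assumption and not what any particular analyzer would do.
    """
    doc_terms = doc.lower().split()
    query_terms = set(query.lower().split())

    # Every query term must appear somewhere in the document.
    if not query_terms.issubset(doc_terms):
        return False

    # Count the "left over" document tokens that are not query terms.
    leftover = sum(1 for term in doc_terms if term not in query_terms)
    return leftover <= x


doc = "how now brown sexy cow"
print(matches_with_max_extra(doc, "how cow", 3))  # True: 3 left over words
print(matches_with_max_extra(doc, "how cow", 2))  # False: 3 > 2
```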
This sounds a lot like a use case that I mentioned in my "Beyond The Box"
talk at ACUS2008...
http://people.apache.org/~hossman/apachecon2008us/btb/
...take a look at slides 32-35. The first approach is how the person I
spoke to (anonymous) actually solved this problem for their company (note:
it was not actually a movie title domain space, that's my own example) and
the second approach is an example of how I would have probably attempted
to tackle this problem. (Note: in hindsight, you can't have a generic
numeric field with a tokenizer, so that "titleLen" field would need to be
a TextField and you'd have to use old-school zero padding tricks to make
the range query work properly -- but for this type of use case the numbers
aren't likely to ever be more than 100 anyway, so it's not too heinous.)
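
A rough sketch of the zero padding idea, in plain Python that only builds
a query string. The "title" field name, the 3-digit width, and both helper
names are assumptions for illustration; it also assumes "titleLen" holds
the zero-padded number of terms in the title and that the query terms are
distinct:

```python
def pad(n: int, width: int = 3) -> str:
    """Zero-pad n so that string order agrees with numeric order."""
    return str(n).zfill(width)


def build_query(query: str, x: int,
                text_field: str = "title",
                len_field: str = "titleLen") -> str:
    """Require every query term, and allow at most x additional terms.

    Hypothetical helper: a doc with n query terms plus up to x extras has
    a length between n and n + x, so that becomes the range clause.
    """
    terms = query.lower().split()
    required = " ".join(f"+{text_field}:{t}" for t in terms)
    low, high = len(terms), len(terms) + x
    return f"{required} +{len_field}:[{pad(low)} TO {pad(high)}]"


# Without padding, string comparison sorts "10" before "9".
print("10" < "9")            # True
print(pad(9) < pad(10))      # True

print(build_query("how cow", 2))
# +title:how +title:cow +titleLen:[002 TO 004]
```

With q=how+cow&x=2 the five-term doc "how now brown sexy cow" would be
indexed with titleLen=005 and fall outside [002 TO 004], so it is excluded.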
-Hoss