: > However, I am searching for a solution that does something like: "this is my
: > query" and the document has to consist of this query plus at most, for
: > example, two other terms?
        ...
: Not quite following.  It sounds like you are saying you want to favor 
: docs that are shorter, while still maximizing the number of terms that 
: match, right?

I'm pretty sure he's looking for more than what Similarity can provide
with lengthNorms. Note that he specifically wants to eliminate matches that
contain more than X additional terms besides what's included in the query
(so the doc "how now brown sexy cow" would match a query for q=how+cow&x=3
but it would not match a query for q=how+cow&x=2, because there are more
than 2 "left over" words in the document).
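To make that rule concrete, here is a minimal brute-force sketch in plain Java. It is not Lucene code and not either of the approaches on the slides; it only illustrates the filter being asked for. It assumes a naive lowercase/whitespace tokenizer, assumes every query term must appear in the doc, and then counts the remaining doc tokens against x. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LeftoverTermFilter {

    // Naive tokenizer (assumption): lowercase, then split on whitespace.
    static List<String> tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // True if the doc contains every query term (assumption) and has at most
    // maxExtra tokens that are not query terms ("left over" words).
    static boolean matches(String doc, String query, int maxExtra) {
        List<String> docTokens = tokenize(doc);
        Set<String> queryTerms = new HashSet<>(tokenize(query));
        if (!new HashSet<>(docTokens).containsAll(queryTerms)) {
            return false;
        }
        long leftover = docTokens.stream()
                .filter(t -> !queryTerms.contains(t))
                .count();
        return leftover <= maxExtra;
    }

    public static void main(String[] args) {
        String doc = "how now brown sexy cow";
        System.out.println(matches(doc, "how cow", 3)); // true: now, brown, sexy = 3 left over
        System.out.println(matches(doc, "how cow", 2)); // false: 3 > 2
    }
}
```

In a real index you would not scan every doc like this; the point is only to pin down what "x additional terms" means.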

This sounds a lot like a use case that I mentioned in my "Beyond The Box"
talk at ACUS2008...
   http://people.apache.org/~hossman/apachecon2008us/btb/

...take a look at slides 32-35. The first approach is how the person I
spoke to (anonymous) actually solved this problem for their company (note:
it was not actually a movie-title domain; that's my own example), and
the second approach is an example of how I would probably have attempted
to tackle this problem. (Note: in hindsight, you can't have a generic
numeric field with a tokenizer, so that "titleLen" field would need to be
a TextField, and you'd have to use old-school zero-padding tricks to make
the range query work properly. But for this type of use case the numbers
aren't likely to ever be more than 100 anyway, so it's not too heinous.)
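A minimal sketch of the zero-padding trick in plain Java, assuming a fixed width of 3 digits and plain lexicographic comparison of terms (which is how a range query over a text field orders its terms). The names pad and inRange, and the idea of bounding titleLen by (number of query terms + x), are hypothetical illustrations, not taken from the slides.

```java
public class ZeroPadDemo {

    // Width 3 covers 0..999, comfortably above the ~100 expected here.
    static final int WIDTH = 3;

    // Left-pad a non-negative length with zeros to a fixed width.
    static String pad(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative length: " + n);
        }
        String s = String.format("%0" + WIDTH + "d", n);
        if (s.length() > WIDTH) {
            throw new IllegalArgumentException("too wide for " + WIDTH + " digits: " + n);
        }
        return s;
    }

    // Inclusive lexicographic range check, mimicking a term range query.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded strings sort wrongly: "9" comes after "10".
        System.out.println("9".compareTo("10") > 0);        // true
        // Padded strings sort numerically: "009" < "010".
        System.out.println(pad(9).compareTo(pad(10)) < 0);  // true

        // Hypothetical: 2 query terms, x = 3, so accept titles of at most 5 tokens.
        String upper = pad(2 + 3);
        System.out.println(inRange(pad(4), pad(0), upper));  // true
        System.out.println(inRange(pad(12), pad(0), upper)); // false
    }
}
```

The fixed width is the whole trick: once every value has the same number of digits, string order and numeric order agree, so the range query over the text field behaves as a numeric range.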


-Hoss
