: > However, I am searching for a solution that does something like: "this is my
: > query" and the document has to consist of this query plus maximal - for
: > example - two another terms?
	...
: Not quite following. It sounds like you are saying you want to favor
: docs that are shorter, while still maximizing the number of terms that
: match, right?

I'm pretty sure he's looking for more than what Similarity can provide w/lengthNorms -- note that he specifically wants to eliminate matches that contain more than X additional terms besides what's included in the query. (So the doc "how now brown sexy cow" would match a query for q=how+cow&x=3 but it would not match a query for q=how+cow&x=2, because there are more than 2 "left over" words in the document.)

This sounds a lot like a use case that I mentioned in my "Beyond The Box" talk at ACUS2008...

http://people.apache.org/~hossman/apachecon2008us/btb/

...take a look at slides 32-35. The first approach is how the person I spoke to (anonymous) actually solved this problem for their company (note: it was not actually a movie title domain space, that's my own example) and the second approach is an example of how I would have probably attempted to tackle this problem.

(Note: in hindsight, you can't have a generic numeric field with a tokenizer, so that "titleLen" field would need to be a TextField and you'd have to use old-school zero padding tricks to make the range query work properly -- but for this type of use case the numbers aren't likely to ever be more than 100 anyway so it's not too heinous.)

-Hoss
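
For anyone who wants the matching rule and the zero padding trick spelled out, here is a rough sketch in plain Python. It is not Solr code and not what the slides show; the names `matches`, `pad` and `title_len_bounds` are made up for illustration, terms are split on whitespace, and it assumes neither the query nor the doc repeats a term.

```python
def extra_terms(doc, query):
    """Count doc terms that are not query terms; None if a query term is missing."""
    doc_terms = doc.lower().split()
    query_terms = set(query.lower().split())
    if not query_terms.issubset(doc_terms):
        return None
    return sum(1 for term in doc_terms if term not in query_terms)


def matches(doc, query, x):
    """True if doc contains every query term and at most x "left over" terms."""
    extra = extra_terms(doc, query)
    return extra is not None and extra <= x


def pad(n, width=3):
    """Zero-pad a number so that string order agrees with numeric order."""
    return str(n).zfill(width)


def title_len_bounds(query, x, width=3):
    """Padded lower/upper bounds for a range query on a term-count field
    (hypothetical helper; assumes the field stores pad(number_of_terms))."""
    n = len(query.split())
    return pad(n, width), pad(n + x, width)


print(matches("how now brown sexy cow", "how cow", 3))
print(matches("how now brown sexy cow", "how cow", 2))
print(title_len_bounds("how cow", 3))
```

The padding matters because a text field compares values as strings: unpadded, "10" sorts before "9", while "010" sorts after "009". A fixed width of 3 covers the "never more than 100" case mentioned above.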