Hi,

This use case is similar to the boolean expression matching problem; you can find a recent thread about it. My idea is to introduce a disjunction query with a dynamic mm (the minShouldMatch parameter, http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/BooleanQuery.html#setMinimumNumberShouldMatch(int)), i.e. 'match these clauses disjunctively, but for every document use the value from the field cache of field xxxCount as the minShouldMatch parameter'. Norms could also be used as a source of dynamic mm values.
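
To make the idea concrete, here is a rough Python sketch of the matching rule only. It is not Lucene code: the function name, the toy index and the threshold values are made up for illustration, and the XXX_COUNT dictionary merely stands in for a per-document field cache lookup of xxxCount.

```python
from typing import Dict, List, Set


def match_with_dynamic_mm(
    query_terms: Set[str],
    doc_terms: Dict[int, Set[str]],
    min_should_match: Dict[int, int],
) -> List[int]:
    """Return ids of docs where at least min_should_match[doc] query terms occur."""
    hits = []
    for doc_id, terms in sorted(doc_terms.items()):
        # Number of disjunction clauses (query terms) that hit this document.
        matched = len(query_terms & terms)
        # Per-document threshold, standing in for a field-cache lookup of xxxCount.
        if matched > 0 and matched >= min_should_match[doc_id]:
            hits.append(doc_id)
    return hits


# Toy index: doc 0 is the title "dust in the wind", doc 1 is the title "dust".
DOC_TERMS = {0: {"dust", "in", "the", "wind"}, 1: {"dust"}}
# Hypothetical per-document thresholds, assumed to be written at index time.
XXX_COUNT = {0: 3, 1: 1}

print(match_with_dynamic_mm({"dust", "in", "wind"}, DOC_TERMS, XXX_COUNT))
print(match_with_dynamic_mm({"dust"}, DOC_TERMS, XXX_COUNT))
```

With these assumed thresholds, the query "dust in wind" still matches "dust in the wind", while the query "dust" alone does not. How xxxCount should be derived from the field length is left open here.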

Wdyt?

On Wed, Apr 11, 2012 at 10:08 AM, Li Li <fancye...@gmail.com> wrote:

> It's not possible now because Lucene doesn't support this.
> When doing a disjunction query, it only records how many terms match this
> document.
> I think this is a common requirement for many users.
> I suggest Lucene should divide the scorer into a matcher and a scorer.
> The matcher just returns which doc is matched and why/how the doc is
> matched.
> Especially for a disjunction query, it should tell which terms match, and
> possibly other information such as tf/idf and the distance between terms
> (to support proximity search).
> That's the matcher's job. Then the scorer (a ranking algorithm) uses a
> flexible algorithm to score this document, and the collector can collect it.
>
> On Wed, Apr 11, 2012 at 10:28 AM, Chris Book <chrisb...@gmail.com> wrote:
>
> > Hello, I have a solr index running that is working very well as a search.
> > But I want to add the ability (if possible) to use it to do matching. The
> > problem is that by default it is only looking for all the input terms to be
> > present, and it doesn't give me any indication as to how many terms in the
> > target field were not specified by the input.
> >
> > For example, if I'm trying to match to the song title "dust in the wind",
> > I'm correctly getting a match if the input query is "dust in wind". But I
> > don't want to get a match if the input is just "dust". Although as a
> > search "dust" should return this result, I'm looking for some way to filter
> > this out based on some indication that the input isn't close enough to the
> > output. Perhaps if I could get information that the number of input
> > terms is much less than the number of terms in the field. Or something
> > else along those lines?
> >
> > I realize that this isn't the typical use case for a search, but I'm just
> > looking for some suggestions as to how I could improve the above example a
> > bit.
> >
> > Thanks,
> > Chris

--
Sincerely yours
Mikhail Khludnev
ge...@yandex.ru

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>