Hi Daniel,

You could also consider negative boosts. Solr does not support them directly, but you can get a similar effect by boosting the documents that do not match the metadata fields.
This might do what you want:

-metadata1:(term1 AND ... AND termN)^2
-metadata2:(term1 AND ... AND termN)^2
.....
-metadataN:(term1 AND ... AND termN)^2
allMetadatas:(term1 AND ... AND termN)^0.5

Franck Brisbart

On Wednesday, 22 January 2014 at 19:38 +0000, Petersen, Robert wrote:
> Hi Daniel,
>
> How about trying something like this (you'll have to play with the boosts
> to tune it): search all the fields with all the terms using edismax and
> the minimum should match parameter, but require all terms to match in the
> allMetadatas field.
> https://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29
>
> The Lucene query syntax below gives you the general idea, but this query
> would require all terms to be in one of the metadata fields to get the
> boost.
>
> metadata1:(term1 AND ... AND termN)^2
> metadata2:(term1 AND ... AND termN)^2
> .....
> metadataN:(term1 AND ... AND termN)^2
> allMetadatas:(term1 AND ... AND termN)^0.5
>
> That should do approximately what you want,
> Robi
>
> -----Original Message-----
> From: Daniel Shane [mailto:sha...@lexum.com]
> Sent: Tuesday, January 21, 2014 8:42 AM
> To: solr-user@lucene.apache.org
> Subject: Interesting search question! How to match documents based on the
> least number of fields that match all query terms?
>
> I have an interesting Solr/Lucene question, and it's quite possible that
> some new features in Solr might make this much easier than what I am about
> to try. If anyone has a clever idea on how to do this search, please let
> me know!
>
> Basically, let's say that I have an index in which each document has a
> content field and several metadata fields.
>
> Document Fields:
>
> content
> metadata1
> metadata2
> .....
> metadataN
> allMetadatas (all the terms indexed in metadata1...N are concatenated in
> this field)
>
> Assuming that I am searching for documents that contain a certain number
> of terms (term1 to termN) in their metadata fields, I would like to build
> a search query that will return documents that satisfy these requirements:
>
> a) All search terms must be present in a metadata field. This is quite
> easy: we can simply search in the allMetadatas field and that will work
> fine.
>
> b) Now for the hard part: we prefer documents in which the terms are found
> in the *least number of different fields*. So if one document contains all
> the search terms in 10 different fields, but another document contains all
> the search terms in only 8 fields, we would like the latter to sort first.
>
> My first idea was to index terms in allMetadatas using payloads. Each
> indexed term would also carry the specific metadataN field from which it
> originates. Then I can write a scorer to score based on these payloads.
>
> However, if there is a way to do this without payloads, I'm all ears!
>
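
To make requirements (a) and (b) concrete, here is a minimal Python sketch of the intended ordering, done outside Solr on plain dictionaries. It is only an illustration under stated assumptions: the function name rank_by_fewest_fields and the sample documents are hypothetical, tokenization is a naive lowercase whitespace split, and "number of fields" is taken to mean the number of distinct metadata fields that contain at least one search term.

```python
def rank_by_fewest_fields(docs, terms):
    """Return (doc_id, field_count) pairs, fewest matching fields first.

    docs maps a document id to a dict of metadata field name -> text.
    Hypothetical helper for illustration; not part of Solr or Lucene.
    """
    wanted = {t.lower() for t in terms}
    ranked = []
    for doc_id, fields in docs.items():
        # Assumption: naive lowercase whitespace tokenization per field.
        tokens_by_field = {
            name: set(text.lower().split()) for name, text in fields.items()
        }
        # Requirement (a): every term must occur in some metadata field.
        all_tokens = set().union(*tokens_by_field.values())
        if not wanted <= all_tokens:
            continue
        # Requirement (b): count distinct fields holding at least one term.
        field_count = sum(1 for toks in tokens_by_field.values() if toks & wanted)
        ranked.append((doc_id, field_count))
    # Stable sort, so ties keep their input order.
    ranked.sort(key=lambda pair: pair[1])
    return ranked


# Made-up sample data, for illustration only.
docs = {
    "doc1": {"metadata1": "supreme court", "metadata2": "appeal"},
    "doc2": {"metadata1": "supreme court appeal", "metadata2": "other"},
    "doc3": {"metadata1": "supreme", "metadata2": "unrelated"},
}
result = rank_by_fewest_fields(docs, ["supreme", "court", "appeal"])
print(result)
```

Here doc3 is dropped because it lacks two of the terms, and doc2 sorts ahead of doc1 because its terms sit in one field rather than two. A payload-based scorer or a boosted query would need to reproduce this ordering inside the index.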