> Hi
> 
> How do I restrict hits to documents containing all words
> (regardless of order) of a query in a particular field?
> 
> Suppose I have two documents with a field called name in my
> index:
> 
> doc1 => name: Pink
> doc2 => name: Pink Floyd
> 
> When querying for "Pink" I want only doc1 and when querying
> for "Pink Floyd" or "Floyd Pink" I want doc2.
> 
> Thanks
> 
> - Magnus


I would implement this kind of functionality by preprocessing documents and
queries to calculate the number of unique terms in each document and query
before sending them to Solr. I would add an extra integer field to hold that number.

For example, when indexing the documents:

doc1 => 
name: Pink  
numberOfuniqueTerms: 1

doc2 => 
name: Pink Floyd 
numberOfuniqueTerms: 2
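A minimal sketch of the counting step, assuming a hypothetical helper class (UniqueTermCounter is not part of Solr or Lucene). It approximates the field analyzer with lowercasing plus splitting on anything that is not a letter or digit. A real preprocessor should produce the same terms as the analyzer that schema.xml defines for the name field, otherwise the stored count will not match what is actually indexed.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical preprocessing helper. The tokenization below is only a stand-in
// for the real field analyzer declared in schema.xml.
public class UniqueTermCounter {

    // Lowercases the text, splits on non-letter/non-digit runs, and keeps each
    // distinct token once (in first-seen order).
    static Set<String> uniqueTerms(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // split() can yield a leading empty string
                terms.add(token);
            }
        }
        return terms;
    }

    // The value to store in the extra integer field numberOfuniqueTerms.
    static int countUniqueTerms(String text) {
        return uniqueTerms(text).size();
    }

    public static void main(String[] args) {
        System.out.println("Pink -> " + countUniqueTerms("Pink"));
        System.out.println("Pink Floyd -> " + countUniqueTerms("Pink Floyd"));
    }
}
```

Repeated words are counted once, so "Pink Pink" would be stored as 1, which matches how a term vector reports unique terms.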

Set the query parser's default operator to AND; that guarantees that all query
terms appear in every returned document. The numberOfuniqueTerms criterion then
guarantees that a returned document does not contain any additional terms.

query: Pink        will be expanded as => name:Pink AND numberOfuniqueTerms:1
query: Pink Floyd  will be expanded as => name:(Pink AND Floyd) AND numberOfuniqueTerms:2
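The query-side expansion could be sketched like this, assuming a hypothetical UniqueTermQueryBuilder class and the same stand-in tokenization as on the indexing side. It writes the AND operators explicitly, so it does not rely on the parser's default operator. Because tokens are reduced to letters and digits, query-syntax characters in the user input are dropped rather than escaped.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical query builder; the tokenization must match whatever was used
// to compute numberOfuniqueTerms at index time.
public class UniqueTermQueryBuilder {

    static Set<String> uniqueTerms(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Produces e.g. name:(floyd AND pink) AND numberOfuniqueTerms:2
    static String buildQuery(String field, String userQuery) {
        Set<String> terms = uniqueTerms(userQuery);
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("query has no terms");
        }
        return field + ":(" + String.join(" AND ", terms) + ")"
                + " AND numberOfuniqueTerms:" + terms.size();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("name", "Pink"));
        System.out.println(buildQuery("name", "Floyd Pink"));
    }
}
```

With this, "Pink Floyd" and "Floyd Pink" both end up requiring the same two terms and a count of 2, so only doc2 can match.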


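Your preprocessor program can use the Lucene API's term vectors. Since you are
interested only in the number of unique terms (the size of the term vector),

// assumes "name" is indexed with term vectors enabled (Lucene 2.x/3.x API);
// getTermFreqVector returns null if no term vector was stored for the field
TermFreqVector nameTV = indexSearcher.getIndexReader().getTermFreqVector(docId, "name");
int numberOfuniqueTerms = nameTV.size();

should give you that number.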

However, this requires pre-indexing each document in Lucene with the same
analyzer defined in schema.xml just to get the number of unique terms in it,
which is obviously not the best solution. It also means the preprocessor must
be written in Java.


A second solution, without preprocessing and without adding an integer field:


Since storing term vectors at index time allows you to access them at query
time, there should be an easier way (TermVectorComponent) to get a returned
document's term vector size, but I do not know how to query on that size.

The TermVectorComponent (http://wiki.apache.org/solr/TermVectorComponent) will
give you the unique terms in a particular field of each returned document, but
you will need to iterate over that list to check that it contains all query
terms and nothing else.
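That final check could look roughly like this, assuming the term list has already been pulled out of the TermVectorComponent response for each document (parsing the response is not shown, and ExactTermSetFilter is a hypothetical name). The terms the component returns are in their indexed (analyzed) form, so the query terms must go through the same analysis; the lowercasing here is only a stand-in for that.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical client-side filter applied to each returned document.
public class ExactTermSetFilter {

    // True when the document's unique terms are exactly the query's unique
    // terms: every query term is present and there are no additional terms.
    static boolean matchesExactly(List<String> documentTerms, List<String> queryTerms) {
        Set<String> doc = normalize(documentTerms);
        Set<String> query = normalize(queryTerms);
        return doc.containsAll(query) && doc.size() == query.size();
    }

    // Stand-in for the field analyzer: lowercase and de-duplicate.
    private static Set<String> normalize(List<String> terms) {
        Set<String> out = new HashSet<>();
        for (String term : terms) {
            out.add(term.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> doc1 = Arrays.asList("pink");
        List<String> doc2 = Arrays.asList("pink", "floyd");
        List<String> query = Arrays.asList("Floyd", "Pink");
        System.out.println("doc1: " + matchesExactly(doc1, query));
        System.out.println("doc2: " + matchesExactly(doc2, query));
    }
}
```

Checking containment plus equal size is the same idea as the AND-plus-count query above, just done on the client after the results come back.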


Hope this helps.

