TermsQuery works by pulling the postings lists for each term and OR-ing them
together to create a bitset, which is very memory-efficient but means that you
don't know at doc collection time which term has actually matched.
For your case you probably want to create a SpanOrQuery, and then iterate
over the resulting Spans yourself to see which terms matched in which documents.
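Not from the original mail, but roughly what that looks like against the
Lucene 5.x span API (names such as createWeight/getSpans have shifted between
releases, so treat this as a sketch, not a drop-in). It keeps one SpanTermQuery
per term instead of a single SpanOrQuery so that each match can be attributed
back to the term that produced it; the class and field names are made up:

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.search.spans.SpanWeight;
import org.apache.lucene.search.spans.Spans;

public class MatchedTermCounter {

  /** For every doc matching at least one term, count how many distinct terms matched. */
  public static Map<Integer, Integer> countMatchedTerms(IndexReader reader,
                                                        String field,
                                                        String... terms) throws IOException {
    IndexSearcher searcher = new IndexSearcher(reader);
    Map<Integer, Integer> matchedTermsPerDoc = new LinkedHashMap<>();

    // One pass per term: walk its Spans and bump the per-document counter.
    for (String t : terms) {
      SpanTermQuery stq = new SpanTermQuery(new Term(field, t));
      SpanWeight weight = stq.createWeight(searcher, false); // false = no scores needed
      for (LeafReaderContext ctx : reader.leaves()) {
        Spans spans = weight.getSpans(ctx, SpanWeight.Postings.POSITIONS);
        if (spans == null) {
          continue; // term not present in this segment
        }
        int doc;
        while ((doc = spans.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
          matchedTermsPerDoc.merge(ctx.docBase + doc, 1, Integer::sum);
        }
      }
    }
    return matchedTermsPerDoc;
  }
}

In practice you'd probably fold the counting into a custom Collector rather
than doing a separate pass over the index per term, but the idea is the same.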
Let's say we're trying to do document-to-document matching (not with
MLT). We have a shingling analysis chain. The query is a document, which
is itself shingled. We then look up those shingles in the index. The %
of shingles found is in some sense a marker as to the extent to which
the documents are similar.
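As a loose sketch of that coverage idea (not code from the thread): shingle
the query document with ShingleAnalyzerWrapper, then see what fraction of
those shingles occur anywhere in the index via docFreq. It assumes the indexed
field was analyzed with the same shingle chain; the names are invented for the
example, and a real version would likely score per candidate document rather
than against the whole index:

import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

public class ShingleCoverage {

  /** Fraction of the query document's shingles that occur anywhere in the index. */
  public static double coverage(IndexReader reader, String field, String queryDoc)
      throws IOException {
    // Wrap the base analyzer so it emits word shingles (here: bigrams) in addition to unigrams.
    Analyzer shingler = new ShingleAnalyzerWrapper(new StandardAnalyzer(), 2, 2);

    // Collect the distinct shingle terms produced for the query document.
    Set<String> shingles = new LinkedHashSet<>();
    try (TokenStream ts = shingler.tokenStream(field, queryDoc)) {
      CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        shingles.add(termAtt.toString());
      }
      ts.end();
    }

    if (shingles.isEmpty()) {
      return 0.0;
    }
    // Count how many of those shingles exist in the index at all.
    int found = 0;
    for (String shingle : shingles) {
      if (reader.docFreq(new Term(field, shingle)) > 0) {
        found++;
      }
    }
    return (double) found / shingles.size();
  }
}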
Or a really simple-minded approach: just use the frequency
as a ratio of numFound to estimate terms.
Doesn't work, of course, if you need precise counts.
On Mon, Nov 2, 2015 at 9:50 AM, Doug Turnbull wrote:
How precise do you need to be?
I wonder if you could efficiently approximate "number of matches" by
getting the document frequency of each term. I realize this is an
approximation, but the highest document frequency would be your floor.
Let's say you have terms t1, t2, and t3 ... tn, and t1 has the highest
document frequency.
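A minimal sketch of that floor (not from the thread; the field and terms are
placeholders): look up docFreq for each term and take the maximum, since every
document containing the most frequent term matches at least one term of the OR:

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

public class MatchCountEstimate {

  /**
   * Lower bound on the number of documents matching (t1 OR t2 OR ... OR tn):
   * every document containing the most frequent term is a hit, so the largest
   * per-term docFreq is a floor for the true hit count.
   */
  public static int floorEstimate(IndexReader reader, String field, String... terms)
      throws IOException {
    int floor = 0;
    for (String t : terms) {
      floor = Math.max(floor, reader.docFreq(new Term(field, t)));
    }
    return floor;
  }
}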