Multiple custom Similarity implementations

2016-03-07 Thread Parvesh Garg
Hi,

We have a requirement to run an A/B test over multiple Similarity
implementations. Is it possible to define multiple similarity tags in the
schema.xml file and choose one via a URL parameter? We are using Solr 4.7.

Currently, we are planning to have different cores with different
similarities configured and to split traffic based on core names, but this
leads to index duplication and unnecessary resource usage.

Any help is highly appreciated.

Parvesh Garg,

http://www.zettata.com


Re: Multiple custom Similarity implementations

2016-03-09 Thread Parvesh Garg
Thanks Markus. We will look at other options. May I ask what the reasons
are for never supporting this?


Parvesh Garg,

http://www.zettata.com

On Tue, Mar 8, 2016 at 8:59 PM, Markus Jelsma 
wrote:

> Hello, you cannot change similarities per request, and this is likely
> never going to be supported, for good reasons. You need multiple cores, or
> multiple fields with different similarities defined in the same core.
> Markus
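For the per-field route Markus mentions, a rough schema.xml sketch could look like the following (the field and fieldType names and the DFR parameters are illustrative assumptions, not from the thread):

```xml
<!-- Two copies of the same text, each fieldType with its own Similarity. -->
<fieldType name="text_bm25" class="solr.TextField">
  <analyzer><tokenizer class="solr.StandardTokenizerFactory"/></analyzer>
  <similarity class="solr.BM25SimilarityFactory"/>
</fieldType>
<fieldType name="text_dfr" class="solr.TextField">
  <analyzer><tokenizer class="solr.StandardTokenizerFactory"/></analyzer>
  <similarity class="solr.DFRSimilarityFactory">
    <str name="basicModel">G</str>
    <str name="afterEffect">B</str>
    <str name="normalization">H2</str>
  </similarity>
</fieldType>

<field name="text_bm25" type="text_bm25" indexed="true" stored="false"/>
<field name="text_dfr"  type="text_dfr"  indexed="true" stored="false"/>

<!-- The global similarity must be schema-aware for per-fieldType
     similarities to take effect. -->
<similarity class="solr.SchemaSimilarityFactory"/>
```

A request could then A/B test by selecting qf=text_bm25 or qf=text_dfr, at the cost of indexing the same text twice.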


Re: Multiple custom Similarity implementations

2016-03-10 Thread Parvesh Garg
Hi Ahmet,

Thanks for the pointer. I have similar thoughts on the subject. The risk
described there assumes that people don't test their changes before taking
them in, and that risk applies just as much to any similarity
configuration. And sometimes it may not be possible to use multiple
similarities (custom or otherwise). Overall, it still seems like a nice
feature to have.



Parvesh Garg,
Head of Engineering

http://www.zettata.com

On Thu, Mar 10, 2016 at 3:05 PM, Ahmet Arslan 
wrote:

> Hi Parvesh,
>
> Please see the similar discussion :
> http://search-lucene.com/m/eHNlijx91I7etm1
>
> Ahmet
>


Difference between CustomScoreQuery and RankQuery

2015-09-16 Thread Parvesh Garg
Hi All,

I wanted to understand the difference between CustomScoreQuery and
RankQuery. From the outside, they seem to do the same thing, with RankQuery
offering more functionality.

Am I missing something?


Parvesh Garg


utility methods to get field values from index

2015-05-12 Thread Parvesh Garg
Hi All,

Was wondering if there is any class in Solr that provides utility methods
to fetch indexed field values for documents using a docId. Something simple
like

getMultiLong(String field, int docId)

getLong(String field, int docId)

We have written a Solr component that returns group-level stats (avg
score, max score, etc.) over a large number of documents (say 5000+)
against a query executed using edismax. We need the group id field's value
to do that; this is a single-valued long field.

This component also looks at one more field, a multivalued long field, for
each document and computes a score based on frequency plus document score
for each value.

Currently we are using stored fields and were wondering whether this
approach would be faster.

Apologies if this is too much to ask for.

Parvesh Garg,


Re: utility methods to get field values from index

2015-05-13 Thread Parvesh Garg
Hi Shalin,

Thanks for your answer. I forgot to mention that we are using Solr 4.10.
Also, I tried using docValues and the performance was worse than getting
the values from stored fields: retrieving data for 2000 docs across 2
fields took 120 ms from stored fields versus 230 ms from docValues.

Maybe there is something wrong in my code.

The code used for retrieving docValues is:

  public static long getSingleLong(SolrIndexSearcher searcher, int docId,
      String field) throws IOException {
    NumericDocValues sdv = DocValues.getNumeric(searcher.getAtomicReader(),
        field);
    return sdv.get(docId);
  }

and

  public static List<Long> getMultiLong(SolrIndexSearcher searcher,
      int docId, String field) throws IOException {
    SortedSetDocValues ssdv = DocValues.getSortedSet(
        searcher.getAtomicReader(), field);
    ssdv.setDocument(docId);
    long l;
    List<Long> retval = new ArrayList<Long>(40);
    while ((l = ssdv.nextOrd()) != SortedSetDocValues.NO_MORE_ORDS) {
      BytesRef bytes = ssdv.lookupOrd(l);
      retval.add(NumericUtils.prefixCodedToLong(bytes));
    }
    return retval;
  }



Parvesh Garg

On Wed, May 13, 2015 at 11:36 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> In Solr 5.0+ you can use Lucene's DocValues API to read the indexed
> information. This is a unifying API over field cache and doc values so it
> can be used on all indexed fields.
>
> e.g. for single-valued field use
> searcher.getLeafReader().getSortedDocValues(fieldName);
> and for multi-valued fields
> use searcher.getLeafReader().getSortedSetDocValues(fieldName);
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Compound words

2013-10-28 Thread Parvesh Garg
Hi,

I'm an infant in the Solr/Lucene family, just a couple of months old.

We are trying to find a way to combine words into a single compound word at
index and query time. E.g. if a document has "sea bird" in it, it should be
indexed as seabird, and any query containing sea bird should also look for
seabird, not only in qf but also in the pf, pf2, and pf3 fields. We are
using the edismax query parser.

Our problem is not at index time, which we have solved by writing our own
token filter, but at query time. Our token filter takes a dictionary of
"prefix,suffix" entries from a file and emits both regular and compound
tokens as it encounters them.

We configured our filter at query time as well, but found that individual
clauses like field:sea and field:bird are created first and only then sent
to the analyzer. First of all, can someone please confirm that this part of
my understanding is correct? Because of this, we are forced to emit sea and
bird as individual tokens, since we never see them in sequence.

Is it possible to achieve this by means other than pre-processing the query
before sending it to Solr? Could a CharFilter be used instead; are
CharFilters applied before the query clauses are created?

I can keep providing more details as necessary. This mail has already
crossed TL;DR limits for many :)

Parvesh Garg
http://www.zettata.com
+91 963 222 5540


Re: Compound words

2013-10-28 Thread Parvesh Garg
One more thing, Is there a way to remove my "accidentally sent phone number
in the signature" from the previous mail? aarrrggghhh


Re: Compound words

2013-10-28 Thread Parvesh Garg
Hi Erick,

Thanks for the suggestion. Like I said, I'm an infant.

We tried synonyms both ways, sea biscuit => seabiscuit and seabiscuit =>
sea biscuit, and didn't understand exactly how they worked. But I just
checked the analysis tool, and it seems to work perfectly fine at index
time. Now I can happily discard my own filter and 4 days of work; I'm happy
I got to know a few ways on how/when not to write a Solr filter :)

I tried the string "sea biscuit sea bird" with expand=false, and the tokens
I got were seabiscuit, sea, and bird at positions 1, 2, and 3 respectively.
But at query time, when I enter the same term "sea biscuit sea bird" using
edismax with qf, pf2, and pf3, the parsedQuery looks like this:

+((text:sea) (text:biscuit) (text:sea) (text:bird)) ((text:\"biscuit sea\")
(text:\"sea bird\")) ((text:\"seabiscuit sea\") (text:\"biscuit sea
bird\"))"

What I wanted instead was this:

"+((text:seabiscuit) (text:sea) (text:bird)) ((text:\"seabiscuit sea\")
(text:\"sea bird\")) (text:\"seabiscuit sea bird\")"

Looks like there isn't any way other than to pre-process the query myself
and create the compound word. What do you mean by "just query the raw
string"? Am I still missing something?

Parvesh Garg
http://www.zettata.com
(This time I did remove my phone number :) )

On Mon, Oct 28, 2013 at 4:14 PM, Erick Erickson wrote:

> Why did you reject using synonyms? You can have multi-word
> synonyms just fine at index time, and at query time, since the
> multiple words are already substituted in the index you don't
> need to do the same substitution, just query the raw strings.
>
> I freely acknowledge you may have very good reasons for doing
> this yourself, I'm just making sure you know what's already
> there.
>
> See:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>
> Look particularly at the explanations for "sea biscuit" in that section.
>
> Best,
> Erick
>
>
>
> On Mon, Oct 28, 2013 at 3:47 AM, Parvesh Garg  wrote:
>
> > One more thing, Is there a way to remove my "accidentally sent phone
> number
> > in the signature" from the previous mail? aarrrggghhh
> >
>
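Erick's synonym approach can be wired up along these lines in schema.xml (a sketch; the fieldType name and synonyms file name are assumptions):

```xml
<!-- Map the multi-word form at index time only; with expand="true" and the
     comma-separated form, all variants land in the index. -->
<fieldType name="text_compound" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="compounds.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- compounds.txt entries (comma form, expanded at index time):
       sea bird, seabird
       sea biscuit, seabiscuit
-->
```

With expand=true and the comma form, both the original words and the compound end up in the index, so no query-time substitution is needed.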


Re: Compound words

2013-10-28 Thread Parvesh Garg
Hi Roman, thanks for the link, will go through it.

Erick, will try with expand=true once and check out the results. Will
update this thread with the findings. I remember we rejected expand=true
because of some weird spaghetti problem. Will check it out again.

Thanks,

Parvesh Garg
http://www.zettata.com


On Mon, Oct 28, 2013 at 9:01 PM, Roman Chyla  wrote:

> Hi Parvesh,
> I think you should check the following jira
> https://issues.apache.org/jira/browse/SOLR-5379. You will find there links
> to other possible solutions/problems:-)
> Roman
> On 28 Oct 2013 09:06, "Erick Erickson"  wrote:
>
> > Consider setting expand=true at index time. That
> > puts all the tokens in your index, and then you
> > may not need to have any synonym
> > processing at query time since all the variants will
> > already be in the index.
> >
> > As it is, you've replaced the words in the original with
> > synonyms, essentially collapsed them down to a single
> > word and then you have to do something at query time
> > to get matches. If all the variants are in the index, you
> > shouldn't have to. That's what I meant by "raw".
> >
> > Best,
> > Erick


Re: Compound words

2013-10-29 Thread Parvesh Garg
Hi Erick,

I tried with expand=true and got exactly the same tokens, i.e., seabiscuit,
sea, and bird at positions 1, 2, and 3 respectively. As per the Solr
documentation at
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory,
explicit mappings ignore the expand parameter in the schema.

So the problem of creating compound words at query time remains.


Parvesh Garg
http://www.zettata.com




custom group sort in solr

2013-12-12 Thread Parvesh Garg
Hi,

I want to use Solr/Lucene's grouping feature with some customisations, like

   - sorting the groups based on average scores (or some other computation
   over scores) instead of max scores.
   - grouping articles based on some computation instead of a field value.

So far it seems I have to write some code for this. Can someone please
point me in the right direction?

   - If I have to write a plugin, which files do I need to look at?
   - Which part of the code currently executes the grouping feature? Does
   it happen in Solr or Lucene? Is it SearchHandler?

Parvesh Garg
http://www.zettata.com
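For the first item, sorting groups by average score, the core computation can be sketched in plain Java. This is illustrative code only, not a Solr plugin API; the class and method names are assumptions:

```java
import java.util.*;

public class GroupAvgSort {

    // Given a (groupId -> member doc scores) mapping, e.g. collected by a
    // custom component, return group ids ordered by descending average score.
    static List<String> sortGroupsByAvgScore(Map<String, double[]> groupScores) {
        List<String> ids = new ArrayList<>(groupScores.keySet());
        ids.sort((a, b) -> Double.compare(avg(groupScores.get(b)),
                                          avg(groupScores.get(a))));
        return ids;
    }

    static double avg(double[] scores) {
        double sum = 0;
        for (double s : scores) sum += s;
        return scores.length == 0 ? 0 : sum / scores.length;
    }

    public static void main(String[] args) {
        Map<String, double[]> groups = new HashMap<>();
        groups.put("g1", new double[]{9.0, 1.0}); // max 9.0, avg 5.0
        groups.put("g2", new double[]{6.0, 6.0}); // max 6.0, avg 6.0
        // Max-score ordering would rank g1 first; average-score ordering
        // ranks g2 first.
        System.out.println(sortGroupsByAvgScore(groups)); // [g2, g1]
    }
}
```

The interesting part is only the comparator: the hard work in a real plugin is collecting the per-group scores inside Lucene's grouping collectors.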


Facet counts and RankQuery

2014-10-21 Thread Parvesh Garg
Hi All,

We have written a RankQuery plugin with a custom TopDocsCollector to
suppress documents below a certain threshold w.r.t. the maxScore for that
query. It works fine and is reflected correctly in numFound and the start
parameter.

Our problem lies with facet counts. Even though numFound reports a much
smaller number, the facet counts still come from the unsuppressed query
results.

E.g. in a test with a threshold of 20%, we reduced totalDocs from 46030 to
6080, but the top facet count on a field is still 20500.

The query parameter we are using looks like rq={!threshold value=0.2}

Is there a way to propagate the suppression of results to the
FacetsComponent as well? Can we send the same rq to the FacetsComponent?



Regards,
Parvesh Garg,

http://www.zettata.com
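The suppression rule itself is just arithmetic over the collected hits. A minimal sketch in plain Java (not the actual RankQuery/TopDocsCollector code; the class name is an assumption) of what rq={!threshold value=0.2} expresses:

```java
import java.util.Arrays;

public class ScoreThreshold {

    // Keep only scores >= (fraction * maxScore), mirroring a 20% threshold
    // when fraction = 0.2.
    static double[] filterByThreshold(double[] scores, double fraction) {
        double max = 0;
        for (double s : scores) max = Math.max(max, s);
        double cutoff = fraction * max;
        return Arrays.stream(scores).filter(s -> s >= cutoff).toArray();
    }

    public static void main(String[] args) {
        double[] scores = {10.0, 5.0, 1.9, 0.5};
        // With a 20% threshold the cutoff is 2.0, so 1.9 and 0.5 are
        // suppressed.
        System.out.println(Arrays.toString(filterByThreshold(scores, 0.2))); // [10.0, 5.0]
    }
}
```

Note that this needs the max score before any document can be rejected, which is why facet counting, done in a separate component over the full DocSet, never sees the suppression.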


Re: Facet counts and RankQuery

2014-10-21 Thread Parvesh Garg
Hi Erick,

Thanks for the input. We have other requirements regarding precision and
recall, especially when other sorts are specified, so we need to suppress
docs based on thresholds.



Parvesh Garg,
Founding Architect

http://www.zettata.com

On Tue, Oct 21, 2014 at 8:20 PM, Erick Erickson 
wrote:

> I _very strongly_ recommend that you do _not_ do this.
>
> First, the "problem" of having documents in the results
> list with, say, scores < 20% of the max takes care of itself;
> users stop paging pretty quickly. You're arbitrarily
> denying the users any chance of finding some documents
> that _do_ match their query. A user may know that a
> doc is in the corpus but be unable to find it. Very bad from
> a confidence-building standpoint.
>
> I've seen people put, say, 1-5 stars next to docs in the result
> to give the user some visual cue that they're getting into "less
> good" matches, but even that is of very limited value IMO. The
> stars represent quintiles, 5 stars for docs > 80% of max, 4
> stars between 60% and 80% etc.
>
> If you insist on this, then you'll need to run two passes
> across the data, the first will get the max score and the second
> will have a custom collector that somehow gets this number
> and rejects any docs below the threshold.
>
> Best,
> Erick
>


Re: Facet counts and RankQuery

2014-10-21 Thread Parvesh Garg
Hi Joel,

Thanks for the pointer. Can you point me to an example implementation?


Parvesh Garg,
Founding Architect

http://www.zettata.com

On Tue, Oct 21, 2014 at 9:32 PM, Joel Bernstein  wrote:

> The RankQuery cannot be used as a filter; it is designed for custom
> ordering/ranking of results only. If it is used as a filter, the facet
> counts will not match up. If you need a filtering collector, you need to
> use a PostFilter.
>
> Joel Bernstein
> Search Engineer at Heliosearch
>