Re: CPU Intensive Scoring Alternatives

Doug Turnbull Tue, 21 Feb 2017 10:43:37 -0800

With that many documents, why not start with an AND search and reissue an
OR query if there are no results? My strategy is to prefer an AND for large
collections (or an mm higher than 1) and something closer to an OR for
smaller collections.
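
A minimal sketch of that fallback, assuming eDisMax and using mm=100% as
the AND-like setting and mm=1 as the OR-like one. The `search` callable is
hypothetical: it stands in for whatever client code sends the parameters to
Solr and returns the matching documents.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that sends request parameters to Solr
# and returns the list of matching documents.
SearchFn = Callable[[Dict[str, str]], List[dict]]


def build_params(query: str, mm: str) -> Dict[str, str]:
    """Build eDisMax request parameters with an explicit minimum-should-match."""
    return {"q": query, "defType": "edismax", "mm": mm}


def search_and_then_or(query: str, search: SearchFn) -> List[dict]:
    """Try the strict (AND-like) query first; reissue it OR-like only if empty."""
    docs = search(build_params(query, "100%"))  # every term must match
    if docs:
        return docs
    return search(build_params(query, "1"))  # any single term may match
```

The second request is only paid for when the strict query finds nothing, so
the common case scores the smaller AND result set.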


-Doug

On Tue, Feb 21, 2017 at 1:39 PM Fuad Efendi <f...@efendi.ca> wrote:

> Thank you Ahmet, I will try it; sounds reasonable
>
>
> From: Ahmet Arslan <iori...@yahoo.com.invalid>
> Reply: solr-user@lucene.apache.org, Ahmet Arslan <iori...@yahoo.com>
> Date: February 21, 2017 at 3:02:11 AM
> To: solr-user@lucene.apache.org
> Subject: Re: CPU Intensive Scoring Alternatives
>
> Hi,
>
> The new default similarity is BM25.
> Maybe explicitly set the similarity to TF-IDF and see how it goes?
>
> Ahmet
>
>
> On Tuesday, February 21, 2017 4:28 AM, Fuad Efendi <f...@efendi.ca> wrote:
> Hello,
>
>
> Default TF-IDF performs poorly with 200 million indexed documents.
> The query "Michael Jackson" may run in 300ms, and "Michael The Jackson" in
> over 3 seconds, using eDisMax. Because of the default operator "OR" and the
> stopword "The", we get 50-70 million documents as a query result, and
> scoring is CPU intensive. What can we do? Our typical queries return over a
> million documents, and response times for simple queries range from 50
> milliseconds to 5-10 seconds depending on the size of the result set.
>
> This was just an exaggerated example with the stopword “the”, but even the
> simplest query “Michael Jackson” runs in 300ms instead of 3ms just because
> of the huge number of hits and TF-IDF calculations. This is on Solr 6.3.
>
>
> Thanks,
>
> --
>
> Fuad Efendi
>
> (416) 993-2060
>
> http://www.tokenizer.ca
> Search Relevancy, Recommender Systems
>
