BM25 came out of work on probabilistic engines, but using BM25 in Solr doesn’t
automatically make it probabilistic.
I read a paper once that showed the two models are not that different, maybe by
Karen Spärck Jones.
Still, even with a probabilistic model, relevance cutoffs don’t work. It is [...]
Hi Koji,
Strictly speaking about TF-IDF (and BM25, which is an evolution of that
approach), I would say it is a weighting function/numerical statistic that
can be used in ranking functions and is based on probabilistic concepts
(such as IDF), but it is not a probabilistic function [1].
Indeed a BM25 [...]
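To make the "weighting function, not a probability" point concrete, here is a minimal sketch of a BM25 term weight (this uses a Lucene-style IDF variant; k1 and b are the usual free parameters, and the numbers are illustrative only):

```python
import math

def bm25_weight(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    # The IDF term borrows from probabilistic retrieval, but the product
    # below is just a numerical statistic: unbounded, not a probability.
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # Term-frequency saturation with document-length normalization.
    tf_part = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_part

# Higher tf raises the weight, but with diminishing returns:
w1 = bm25_weight(tf=1, df=10, doc_len=100, avg_doc_len=100, n_docs=10000)
w2 = bm25_weight(tf=2, df=10, doc_len=100, avg_doc_len=100, n_docs=10000)
```

The saturation is the practical difference from plain TF-IDF: doubling the term frequency does not double the weight.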
Hi Walter,
May I ask a tangential question? I'm curious about the following line you wrote:
> Solr is a vector-space engine. Some early engines (Verity VDK) were
> probabilistic engines. Those do give an absolute estimate of the relevance of
> each hit. Unfortunately, the relevance of results is just no [...]
Thank you! That worked.
From: Ahmet Arslan
Date: Wednesday, April 12, 2017 at 3:15 PM
To: "solr-user@lucene.apache.org", David Kramer
Subject: Re: Filtering results by minimum relevancy score
Hi,
I cannot find it. However, it should be something like
q=hello&fq={!frange l=0.5}query($q)
David, I think it can be done, but a score has no real *meaning* to your
domain other than the one you engineer into it. There's no 1-100 scale that
guarantees at 100 that your users will love the results.
Solr isn't really a turnkey solution. It requires you to understand more
deeply what relevance [...]
Hi,
I cannot find it. However, it should be something like
q=hello&fq={!frange l=0.5}query($q)
Ahmet
On Wednesday, April 12, 2017, 10:07:54 PM GMT+3, Ahmet Arslan
wrote:
Hi David,
A function query named "query" returns the score for the given subquery.
Combined with the frange query parser this is possible. [...]
Fine. It can’t be done. If it were easy, Solr/Lucene would already have the
feature, right?
Solr is a vector-space engine. Some early engines (Verity VDK) were
probabilistic engines. Those do give an absolute estimate of the relevance of
each hit. Unfortunately, the relevance of results is just [...]
Hi David,
A function query named "query" returns the score for the given subquery.
Combined with the frange query parser this is possible. I tried it in the past.
I am searching for the original post. I think it was Yonik's post.
https://cwiki.apache.org/confluence/display/solr/Function+Queries
Ahmet
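For reference, the frange trick above can be wired into a request URL like this (a sketch using Python's standard library; the host, core name, and threshold value are placeholders):

```python
from urllib.parse import urlencode

def cutoff_url(base_url, user_query, min_score):
    # {!frange l=...}query($q) re-evaluates the main query as a function
    # and keeps only documents whose score is at least the lower bound l.
    params = {
        "q": user_query,
        "fq": "{!frange l=%s}query($q)" % min_score,
        "fl": "id,score",
    }
    return "%s/select?%s" % (base_url, urlencode(params))

url = cutoff_url("http://localhost:8983/solr/mycore", "hello", 0.5)
```

Note that the filter still scores every matching document; it only hides those below the bound.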
The idea is to not return poorly matching results, not to limit the number of
results returned. One query may have hundreds of excellent matches and another
query may have 7, so cutting off by the number of results is trivial but not
useful.
Again, we are not doing this for performance reasons.
Well, just because ES has it doesn't mean it's A Good Thing. IMO, it's
just a "feel good" kind of thing for people who don't really
understand scoring.
From that page: "Note, most times, this does not make much sense, but
is provided for advanced use cases."
I've written enough weasel-worded caveats [...]
On 4/10/2017 8:59 AM, David Kramer wrote:
> I’ve done quite a bit of searching on this. Pretty much every page I
> find says it’s a bad idea and won’t work well, but I’ve been asked to
> at least try it to reduce the number of completely unrelated results
> returned. We are not trying to normalize [...]
@alessandro
Elasticsearch has it:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-min-score.html
On Wed, Apr 12, 2017 at 1:49 PM, alessandro.benedetti
wrote:
> I am not completely sure that the potential benefit of merging fewer docs in
> sharded pagination overcomes [...]
I am not completely sure that the potential benefit of merging fewer docs in
sharded pagination overcomes the additional time needed to apply the
filtering function query.
I would need to investigate the frange internals in more detail.
Cheers
--
Alessandro Benedetti
Search C [...]
Can't the filter be used when you're paginating in a sharded scenario?
So if you do limit=10, offset=10, each shard will return 20 docs?
While if you do limit=10, _score<=last_page.min_score, then each shard will
return 10 docs? (They will still score all docs, but merging will be
faster.)
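A quick sketch of the arithmetic in that question (assuming a naive merge where, without a cutoff, each shard must return the full offset+limit window; the function and parameter names are mine):

```python
def docs_merged(n_shards, limit, offset, with_cutoff=False):
    # Without a score cutoff, any shard could hold all of the global top
    # offset+limit docs, so each must return offset+limit candidates.
    # With a cutoff carried over from the previous page, the offset docs
    # can be skipped per shard and only `limit` candidates are returned.
    per_shard = limit if with_cutoff else limit + offset
    return n_shards * per_shard

# 4 shards, page 2 (limit=10, offset=10):
merged_plain = docs_merged(n_shards=4, limit=10, offset=10)                     # 80
merged_cutoff = docs_merged(n_shards=4, limit=10, offset=10, with_cutoff=True)  # 40
```

As noted in the thread, every shard still scores all matching docs either way; only the merge gets cheaper.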
Can I ask what the final requirement is here?
What are you trying to do?
- Just display fewer results?
You can easily do that at search-client time, cutting after a certain amount.
- Make search faster by returning fewer results?
This is not going to work, as you need to score all of them, as Erick
explained [...]
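The "cut at the client" option above can be sketched like this (`docs` is assumed to be the parsed response list, already sorted by score descending; the dict keys are illustrative):

```python
def cut_results(docs, max_results=None, min_score=None):
    # Trim an already-ranked result list client-side: drop low scorers
    # and/or cap the number of hits shown. This only changes what is
    # displayed; Solr has still scored every matching document.
    if min_score is not None:
        docs = [d for d in docs if d["score"] >= min_score]
    if max_results is not None:
        docs = docs[:max_results]
    return docs

hits = [{"id": "a", "score": 3.2}, {"id": "b", "score": 1.1}, {"id": "c", "score": 0.4}]
top = cut_results(hits, max_results=2, min_score=1.0)
```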
Here we go:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-FunctionRangeQueryParser
But it's 100% YAGNI. You'd better tweak search to be more precise.
On Mon, Apr 10, 2017 at 7:12 PM, Ahmet Arslan
wrote:
> Hi,
> I remember that this is possible via the frange query parser [...]
Hi,
I remember that this is possible via the frange query parser, but I don't have
the query string at hand.
Ahmet
On Monday, April 10, 2017, 9:00:09 PM GMT+3, David Kramer
wrote:
I’ve done quite a bit of searching on this. Pretty much every page I find says
it’s a bad idea and won’t work well, but [...]
Well, that's rather the point: the low-scoring docs aren't unrelated,
someone just thinks they are.
Flippancy aside, the score is, as you've researched, a bad gauge.
Since Lucene has to compute the score of a doc before it knows the
score, at any point in the collection process you may get a doc that [...]
Good luck, and let us know what the results are. About dropping the cache:
that shouldn't be a problem. It should just be computed when your component
is called the first time, so starting the server (or opening a new searcher)
should re-compute it. Your filters shouldn't be very big, just maxDoc [...]
On Thu, Aug 18, 2011 at 02:32:48PM -0400, Erick Erickson wrote:
> Hmmm, I'm still not getting it...
>
> You have one or more lists. These lists change once a month or so. Are
> you trying
> to include or exclude the documents in these lists?
In our specific case, to include *only* the documents ha[...]
Hmmm, I'm still not getting it...
You have one or more lists. These lists change once a month or so. Are you
trying to include or exclude the documents in these lists? And do the authors
you want to include or exclude change on a per-query basis, or would you be
all set if you just had a filter that [...]
On Thu, Aug 18, 2011 at 08:36:08AM -0400, Erick Erickson wrote:
> How does this list of authors get selected? The reason I'm asking is
> I'm wondering
> if you can "define the problem away". In other words, I'm wondering if this
> is an XY problem (http://people.apache.org/~hossman/#xyproblem).
:-)
How does this list of authors get selected? The reason I'm asking is I'm
wondering if you can "define the problem away". In other words, I'm wondering
if this is an XY problem (http://people.apache.org/~hossman/#xyproblem).
I can't imagine you expect a user to specify up to 2k authors, so there must [...]
On Tue, Aug 16, 2011 at 07:56:51AM +, tomas.zer...@axelspringer.de wrote:
> Hello, Solrs
>
> we are trying to filter out documents written by (one or more of) the authors
> from
> a mediumish list (~2K). The document set itself is in the millions.
[...]
Sorry. Forgot to say that we are using [...]
> As part of Solr results I am able to get the max score. If I want to filter
> the results based on the max score, let's say the max score is 10 and I need
> only the results between the max score and 50% of the max score. This max
> score is going to change dynamically. How can we implement this? Do we [...]
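Since the max score changes per query, one way to get the "between max score and 50% of max" behaviour is client-side, after the response comes back (a sketch; the docs structure and the 0.5 fraction are assumptions taken from the question):

```python
def within_fraction_of_max(docs, fraction=0.5):
    # Keep only hits scoring at least `fraction` of this query's max
    # score. The threshold adapts per query, matching the "max score
    # changes dynamically" requirement in the question above.
    if not docs:
        return []
    max_score = max(d["score"] for d in docs)
    return [d for d in docs if d["score"] >= fraction * max_score]

hits = [{"id": "a", "score": 10.0}, {"id": "b", "score": 6.0}, {"id": "c", "score": 4.9}]
kept = within_fraction_of_max(hits)
```

A server-side frange cutoff needs an absolute bound known before the query runs, so a per-query fraction of the max score is easier to apply on the client.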
> Hi,
> I want to add a filter to my query which takes documents whose "city"
> field has either Bangalore or Cochin or Bombay. How do I do this?
>
> fq=city:bangalore&fq=city:bombay&fq=city:cochin
> will take the intersection. I need the union.
fq=city:(bangalore OR cochin OR bombay)
sam
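A tiny helper for building that union filter when the value list is dynamic (a sketch; the function name is mine):

```python
def union_fq(field, values):
    # A single fq clause with OR yields the union of the values;
    # separate fq parameters are intersected by Solr, which is why
    # the three-fq version above gave the wrong result.
    return "%s:(%s)" % (field, " OR ".join(values))

fq = union_fq("city", ["bangalore", "cochin", "bombay"])
# fq == "city:(bangalore OR cochin OR bombay)"
```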
>> To: solr-user@lucene.apache.org
>> Sent: Thursday, September 18, 2008 7:35:43 PM
>> Subject: Re: Filtering results
>>
>> Otis,
>>
>> Would it be reasonable to run a query like this
>>
>> http://localhost:8280/solr/select/?q=terms_x&version=2.2& [...]
[...] -- Lucene - Solr - Nutch
- Original Message
> From: ristretto.rb <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, September 18, 2008 7:35:43 PM
> Subject: Re: Filtering results
>
> Otis,
>
> Would it be reasonable to run a query like this [...]
[...]l requirements, but running
>> 1+10 queries doesn't sound good to me from a scalability/performance point
>> of view.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>>
>> - Original Message [...]
[...] 1+10 queries doesn't sound good to me from a scalability/performance point of
> view.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message
>> From: ristretto.rb <[EMAIL PROTECTED]>
>> To: solr-user [...]
[...]
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: ristretto.rb <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, September 16, 2008 6:45:02 PM
> Subject: Re: Filtering results
>
> thanks. very interesting [...]
Thanks. Very interesting. The plot thickens. And, yes, I think
field collapsing is exactly what I'm after.
I am considering now trying this patch. I have a Solr 1.2 instance
on Jetty. It looks like I need to install the patch.
Does anyone use that patch? Recommend it? The wiki page
(http:/ [...]
: 1. Identify all records that would match search terms. (Suppose I
: search for 'dog', and get 450,000 matches)
: 2. Of those records, find the distinct list of groups over all the
: matches. (Suppose there are 300.)
: 3. Now get the top ranked record from each group, as if you search
: just [...]
Thanks for the reply, Erik.
Sorry for being vague. To be clear, we have 1-2 million records and
roughly 12,000-14,000 groups.
Each record is in one and only one group.
I see it working something like this:
1. Identify all records that would match search terms. (Suppose I
search for 'dog', and get 450,000 matches) [...]
Personally, I'd send three requests to Solr, one for each group:
&rows=1&fq=category:A ... and so on.
But that'd depend on how many groups you have.
One can always hack custom request handlers to do this sort of thing
all as a single request, but I'd guess it ain't that much slower to
just [...]
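The one-request-per-group idea can be sketched as building one URL per group (rows=1 and the category filter come from the suggestion above; the host, core, and field names are placeholders):

```python
from urllib.parse import urlencode

def top_doc_urls(base_url, query, groups):
    # One request per group: rows=1 returns just the top-ranked hit
    # within each category's filter query.
    urls = []
    for group in groups:
        params = {"q": query, "rows": 1, "fq": "category:%s" % group}
        urls.append("%s/select?%s" % (base_url, urlencode(params)))
    return urls

urls = top_doc_urls("http://localhost:8983/solr/mycore", "dog", ["A", "B", "C"])
```

With thousands of groups this fans out badly, which is why the thread steers toward field collapsing instead.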