Re: SOLR and string comparison functions

Dariusz Wojtas Mon, 18 Sep 2017 15:55:27 -0700

Hi Emir,

I am calculating a "normalized" score because it will later be used by
automatic decisioning processes to decide whether the result found "matches
enough". For example, I might create a rule that treats a result as a match
if its score is higher than 97%, and as noise otherwise.
I've been thinking about the reranking query parser, but was not able to
create a real-life working example, even something that would show the
concept on just 2 fields and then rerank the result.
I'd be happy to see such an example.


I have found an answer to my original question, and it seems to work:
   <str name="q">{!func v=$global_search_function}</str>
   <str name="global_search_function">sum(
      product($firstName.weight, strdist(literal($firstName), firstName, edit)),
      map($id.weight, 0.0001, 1000, product($id.weight, strdist(literal($id), id, edit)), 0),
      map($fullName.weight, 0.0001, 1000, product($fullName.weight, query($fullName_filter)), 0),
     )</str>
   <str name="fullName_filter">{!edismax qf=fullName pf=fullName ps=10 v=$fullName}</str>

Please see the fullName_filter definition and its usage in the query()
call above.
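
To make the arithmetic behind the function above concrete, here is a minimal
Python sketch. It is not Solr code and only mirrors the structure of the
function query. It assumes that strdist(..., edit) behaves like a normalized
Levenshtein similarity (1 minus edit distance divided by the longer length),
and that map(x, min, max, target, default) returns target when x lies in
[min, max] and default otherwise. The helper names (levenshtein, similarity,
solr_map, combined_score) are made up for this illustration. If every
similarity is in [0, 1] and the weights sum to 1, the total cannot exceed 1.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed stand-in for strdist(a, b, edit): 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def solr_map(x: float, lo: float, hi: float, target: float, default: float) -> float:
    """Assumed semantics of map(x, min, max, target, default)."""
    return target if lo <= x <= hi else default


def combined_score(params: dict, weights: dict, doc: dict) -> float:
    """Weighted sum of per-field similarities; zero-weight fields contribute 0."""
    total = 0.0
    for field, weight in weights.items():
        contribution = weight * similarity(params[field], doc.get(field, ""))
        total += solr_map(weight, 0.0001, 1000, contribution, 0.0)
    return total


if __name__ == "__main__":
    # Hypothetical request and document, using only two of the fields.
    weights = {"firstName": 0.5, "id": 0.5}
    params = {"firstName": "John", "id": "Aw34563456WWA"}
    doc = {"firstName": "Jon", "id": "Aw34563456WWA"}
    score = combined_score(params, weights, doc)
    # The decision rule from the start of this message: match if above 97%.
    print(score, "match" if score > 0.97 else "noise")
```

Note that the fullName term in my query uses query() with edismax rather than
strdist, and an edismax score is not bounded by 1, so the sketch above covers
only the strdist-based terms.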

But now I am really worried about the performance, as there may be several
more filter fields that may affect the score.

Best regards,
Dariusz



On Tue, Sep 19, 2017 at 12:33 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Dariusz,
> This seems to me like a misuse/misunderstanding of Solr. As you probably
> noticed, the Solr score is not normalised: you cannot compare the scores of
> two queries and tell whether one result matches its query better than the
> other. There are some techniques to achieve something close, but that is
> not straightforward and might depend on your case.
> In your case, you are trying to use a function query to compute the score,
> and depending on your index size, it might not perform well. You would
> probably be better off with a custom scorer.
> Back to your question: what are you trying to achieve? When do you consider
> two names to match? Or do you expect to calculate a score for each document
> in the index and return the top-scored ones? Such a solution will not scale.
> IMO, it would be best if you rethink your requirement about the score (or
> use the reranking query parser:
> https://cwiki.apache.org/confluence/display/solr/Query+Re-Ranking) and set
> up proper field analysis and the edismax query parser.
> Otherwise, good luck if you have a large index.
>
> Regards,
> Emir
>
> > On 19 Sep 2017, at 00:01, Dariusz Wojtas <dwoj...@gmail.com> wrote:
> >
> > Hi,
> > I am working on an application that searches for entries that may be
> > queried by multiple parameters.
> > These parameters may be sent to SOLR in different sets, each parameter
> > with its own weight.
> >
> > Values for the example below might be as follows:
> > firstName=John&
> > firstName.weight=0.2&
> > id=Aw34563456WWA&
> > id.weight=0.5&
> > fullName=John Adreew Jr. Doe and Partners&
> > fullName.weight=0.3
> >
> >
> > There is one very important requirement.
> > No matter how many parameters are out there, the total result score
> > cannot exceed 1 (100%).
> > In every case I multiply the param weight by the result of the string
> > comparison.
> > A field may be used in the comparison if its weight is greater than 0 (in
> > fact, greater than 0.0001).
> >
> >      <str name="q">{!func v=$global_search_function}</str>
> >      <str name="global_search_function">sum(
> >                product($firstName.weight, strdist(literal($firstName), firstName, edit)),
> >                map($id.weight, 0.0001, 1000, product($id.weight, strdist(literal($id), id, edit)), 0),
> >                map($fullName.weight, 0.0001, 1000, product($fullName.weight, strdist(literal($fullName), fullName, ngram,10)), 0),
> >                )</str>
> >
> > The question is about comparing fullName above.
> > What function should I use for a comparison that works on the fullName
> > field the same way as:
> >   "John Adreew Jr. Doe and Partners"~10^0.3
> > ?
> >
> > What are the functions that compare strings, other than strdist?
> > How do I create a function similar to the "John Andrew ..." example above?
> >
> >
> > Best regards,
> > Dariusz Wojtas
>
>
