Paras,

Thank you for this response! Yes, you are being clear.

Regarding the assumptions you make for MRR, do you have any research papers
confirming that these user behaviors have been observed? I only ask because
this paper, http://yichang-cs.com/yahoo/sigir14_SearchAssist.pdf, discusses
how users often skip the suggestions and go straight to vanilla search even
when their query is displayed at the top of the suggestions list (section 3.2,
"QAC User Behavior Analysis"), among other behaviors that go against general
IR intuition. This is only one paper, of course, but user research on QAC
seems hard to come by otherwise.

So Acceptance Rate = # of suggestions taken / total queries issued?
And Selection to Display = # of suggestions taken (this would only be 1, if
the not-taken suggestions are given 0s) / total suggestions displayed?

If the above is true, wouldn't Selection to Display be binary? I.e., it's
either 1 / # of suggestions displayed (assuming this is a constant) or 0?
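
To make my reading concrete, here's a quick sketch (Python, with hypothetical
field names and one record per issued query - nothing from Solr itself):

    # Acceptance Rate as I understand it: queries resolved via a
    # suggestion, over all queries issued.
    def acceptance_rate(events):
        taken = sum(1 for e in events if e["suggestion_taken"])
        return taken / len(events)

    # And my (possibly wrong) per-query reading of Selection to Display:
    # 1/k if one of the k displayed suggestions was taken, else 0.
    def selection_to_display(suggestion_taken, k_displayed):
        return 1 / k_displayed if suggestion_taken else 0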

Best,
Audrey


________________________________
From: Paras Lehana <paras.leh...@indiamart.com>
Sent: Thursday, February 27, 2020 2:58:25 AM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: Re: Re: Query Autocomplete Evaluation

Hi Audrey,

For MRR, we assume that if a suggestion is selected, it's relevant. It's
also assumed that the user always clicks the highest-ranked relevant
suggestion. Thus, we record the selection position for each selection. If
I'm still not understanding your question correctly, feel free to contact
me personally (Hangouts?).
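
To make the position-selection idea concrete, here's a rough sketch of how
we derive MRR from those counts (Python; the numbers are from my earlier
mail, quoted further down):

    # Each selection at position p contributes a reciprocal rank of 1/p;
    # MRR is the average reciprocal rank over all selections.
    selections = {1: 107699, 2: 58736, 3: 23507, 4: 12250, 5: 7980,
                  6: 5653, 7: 4193, 8: 3511, 9: 2997, 10: 2428}
    total = sum(selections.values())               # 228,954
    mrr = sum(n / p for p, n in selections.items()) / total
    print(round(mrr, 4))                           # ~0.6644, i.e. the ~66.45% below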

> And @Paras, the third and fourth evaluation metrics you listed in your
> first reply seem the same to me. What is the difference between the two?


I was expecting you to ask this - I should have explained a bit more.
Acceptance Rate is the share of all searches that are served through
Auto-Suggest. Selection to Display, on the other hand, is 1 if a selection
is made given that suggestions were displayed, and 0 otherwise; here, the
cases where suggestions are displayed form the universal set. Acceptance
Rate counts a 0 even for searches where no selection was made because no
suggestions were shown, while S/D does not count these - it only counts
cases where suggestions were displayed.
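
For a quick hypothetical example: out of 100 total searches, suppose
suggestions were displayed for 80 of them and a selection was made in 40 of
those. Acceptance Rate = 40/100 = 40%, while S/D = 40/80 = 50%.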

Hope I'm clear. :)

On Tue, 25 Feb 2020 at 21:10, Audrey Lorberfeld - audrey.lorberf...@ibm.com
<audrey.lorberf...@ibm.com> wrote:

> This article
> http://wwwconference.org/proceedings/www2011/proceedings/p107.pdf
>   also
> indicates that MRR needs binary relevance labels, p. 114: "To this end, we
> selected a random sample of 198 (query, context) pairs from the set of
> 7,311 pairs, and manually tagged each of them as related (i.e., the query
> is related to the context; 60% of the pairs) and unrelated (40% of the
> pairs)."
>
> On 2/25/20, 10:25 AM, "Audrey Lorberfeld - audrey.lorberf...@ibm.com" <
> audrey.lorberf...@ibm.com> wrote:
>
>     Thank you, Walter & Paras!
>
>     So, from the MRR equation, I was under the impression the suggestions
> all needed a binary label (0,1) indicating relevance.* But it's great to
> know that you guys use proxies for relevance, such as clicks.
>
>     *The reason I think MRR has to have binary relevance labels is this
> Wikipedia article:
> https://en.wikipedia.org/wiki/Mean_reciprocal_rank
> , where it states below the formula that rank_i = "refers to the rank
> position of the first relevant document for the i-th query." If the
> suggestions are not labeled as relevant (1) or not relevant (0), then how
> do you compute the rank of the first RELEVANT document?
>
>     I'll check out these readings asap, thank you!
>
>     And @Paras, the third and fourth evaluation metrics you listed in your
> first reply seem the same to me. What is the difference between the two?
>
>     Best,
>     Audrey
>
>     On 2/25/20, 1:11 AM, "Walter Underwood" <wun...@wunderwood.org> wrote:
>
>         Here is a blog article with a worked example for MRR based on
> customer clicks.
>
>
> https://observer.wunderwood.org/2016/09/12/measuring-search-relevance-with-mrr/
>
>         At my place of work, we compare the CTR and MRR of queries using
> suggestions to those that do not use suggestions. Solr autosuggest based on
> lexicon of book titles is highly effective for us.
>
>         wunder
>         Walter Underwood
>         wun...@wunderwood.org
>
> http://observer.wunderwood.org/ (my blog)
>
>         > On Feb 24, 2020, at 9:52 PM, Paras Lehana <
> paras.leh...@indiamart.com> wrote:
>         >
>         > Hey Audrey,
>         >
>         > I assume MRR is about the ranking of the intended suggestion.
> For this, no
>         > human judgement is required. We track position selection - the
> position
>         > (1-10) of the selected suggestion. For example, these are our
> recent numbers:
>         >
>         > Position 1 Selected (B3) 107,699
>         > Position 2 Selected (B4) 58,736
>         > Position 3 Selected (B5) 23,507
>         > Position 4 Selected (B6) 12,250
>         > Position 5 Selected (B7) 7,980
>         > Position 6 Selected (B8) 5,653
>         > Position 7 Selected (B9) 4,193
>         > Position 8 Selected (B10) 3,511
>         > Position 9 Selected (B11) 2,997
>         > Position 10 Selected (B12) 2,428
>         > *Total Selections (B13)* *228,954*
>         > MRR = (B3+B4/2+B5/3+B6/4+B7/5+B8/6+B9/7+B10/8+B11/9+B12/10)/B13
> = 66.45%
>         >
>         > Refer here for MRR calculation keeping Auto-Suggest in
> perspective:
>         >
> https://medium.com/@dtunkelang/evaluating-search-measuring-searcher-behavior-5f8347619eb0
>         >
>         > "In practice, this is inverted to obtain the reciprocal rank,
> e.g., if the
>         > searcher clicks on the 4th result, the reciprocal rank is 0.25.
> The average
>         > of these reciprocal ranks is called the mean reciprocal rank
> (MRR)."
>         >
>         > nDCG may require human intervention. Please let me know in case
> I have not
>         > understood your question properly. :)
>         >
>         >
>         >
>         > On Mon, 24 Feb 2020 at 20:49, Audrey Lorberfeld -
> audrey.lorberf...@ibm.com
>         > <audrey.lorberf...@ibm.com> wrote:
>         >
>         >> Hi Paras,
>         >>
>         >> This is SO helpful, thank you. Quick question about your MRR
> metric -- do
>         >> you have binary human judgements for your suggestions? If no,
> how do you
>         >> label suggestions successful or not?
>         >>
>         >> Best,
>         >> Audrey
>         >>
>         >> On 2/24/20, 2:27 AM, "Paras Lehana" <paras.leh...@indiamart.com>
> wrote:
>         >>
>         >>    Hi Audrey,
>         >>
>         >>    I work for Auto-Suggest at IndiaMART. Although we don't use
> the
>         >> Suggester
>         >>    component, I think you need evaluation metrics for
> Auto-Suggest as a
>         >>    business product and not specifically for Solr Suggester
> which is the
>         >>    backend. We use edismax parser with EdgeNGrams Tokenization.
>         >>
>         >>    Every week, as the property owner, I report around 500
> metrics. I would
>         >>    like to mention a few of those:
>         >>
>         >>       1. MRR (Mean Reciprocal Rank): How high the user selection
>         >>       was among the returned results. Ranges from 0 to 1, the
>         >>       higher the better.
>         >>       2. APL (Average Prefix Length): Prefix is the query typed
>         >>       by the user; the lower the better. This reports how little
>         >>       an average user has to type to get the intended suggestion.
>         >>       3. Acceptance Rate or Selection: How many of the total
>         >>       searches are being served from Auto-Suggest. We are
>         >>       around 50%.
>         >>       4. Selection to Display Ratio: Did the user click any of
>         >>       the suggestions when they were displayed?
>         >>       5. Response Time: How fast you are serving your average
>         >>       query.
>         >>
>         >>
>         >>    The Selection and Response Time are our main KPIs. We track
> a lot about
>         >>    Auto-Suggest usage on our platform which becomes apparent if
> you
>         >> observe
>         >>    the URL after clicking a suggestion on dir.indiamart.com.
> However, not
>         >>    everything would benefit you. Do let me know for any related
> query or
>         >>    explanation. Hope this helps. :)
>         >>
>         >>    On Fri, 14 Feb 2020 at 21:23, Audrey Lorberfeld -
>         >> audrey.lorberf...@ibm.com
>         >>    <audrey.lorberf...@ibm.com> wrote:
>         >>
>         >>> Hi all,
>         >>>
>         >>> How do you all evaluate the success of your query autocomplete
> (i.e.
>         >>> suggester) component if you use it?
>         >>>
>         >>> We cannot use MRR for various reasons (I can go into them if
> you're
>         >>> interested), so we're thinking of using nDCG since we already
> use
>         >> that for
>         >>> relevance eval of our system as a whole. I am also interested
> in the
>         >> metric
>         >>> "success at top-k," but I can't find any research papers that
>         >> explicitly
>         >>> define "success" -- I am assuming it's a suggestion (or
> suggestions)
>         >>> labeled "relevant," but maybe it could also simply be the
> suggestion
>         >> that
>         >>> receives a click from the user?
>         >>>
>         >>> Would love to hear from the hive mind!
>         >>>
>         >>> Best,
>         >>> Audrey
>         >>>
>         >>
>         >>    --
>         >>    --
>         >>    Regards,
>         >>
>         >>    *Paras Lehana* [65871]
>         >>    Development Engineer, *Auto-Suggest*,
>         >>    IndiaMART InterMESH Ltd,
>         >>
>         >>    11th Floor, Tower 2, Assotech Business Cresterra,
>         >>    Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
>         >>
>         >>    Mob.: +91-9560911996
>         >>    Work: 0120-4056700 | Extn:
>         >>    *11096*
>         >>
>         >
>         > --
>         > --
>         > Regards,
>         >
>         > *Paras Lehana* [65871]
>         > Development Engineer, *Auto-Suggest*,
>         > IndiaMART InterMESH Ltd,
>         >
>         > 11th Floor, Tower 2, Assotech Business Cresterra,
>         > Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
>         >
>         > Mob.: +91-9560911996
>         > Work: 0120-4056700 | Extn:
>         > *11096*
>         >
>
>
>
>
>
>

--
--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*

