Hi Alessandro,

I'm so happy there is someone who's done extensive work with QAC here! 

Right now, we measure nDCG using a Dynamic Bayesian Network. To break it down,
we:
- use a DBN model to generate a "score" for each query_url pair,
- plug that score into a mathematical formula we found in a research paper
(happy to share the paper if you're interested) to assign graded labels 0-4,
- cross-reference the scored & labeled query_url pairs with 1k of our system's
top queries and 1k of our system's random queries,
- use that dataset as our ground truth, and
- query the live system each day for those 2k queries, label the results, and
compare those labels with our ground truth to get the system's nDCG (a rough
sketch of that comparison step is below).
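
For concreteness, here is a toy Python sketch of what that daily comparison
boils down to. It uses the common exponential-gain form of DCG; the exact
formula from the paper we follow may differ, and the ground-truth label lookup
is glossed over, so treat it as illustrative only:

import math

def dcg(labels):
    # Discounted cumulative gain over graded labels (0-4), in rank order.
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(labels))

def ndcg_at_k(live_labels, k=10):
    # Normalise against the ideal (descending) ordering of the same labels.
    ideal = dcg(sorted(live_labels, reverse=True)[:k])
    return dcg(live_labels[:k]) / ideal if ideal > 0 else 0.0

# Labels for one query's live results, looked up in the ground-truth set,
# in the order the system returned the URLs.
print(round(ndcg_at_k([3, 2, 0, 4]), 3))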

I hope that makes sense! Lots of steps, I know.

Because of computational overhead, we are pretty committed to using an
external file & a separate Solr core for our suggestions. We are also planning
to use the Suggester to add a little human nudge towards "successful" queries.
I'm not sure that's what the Suggester is really meant to do, but we are using
it less as a naïve prefix-matcher and more as a query-suggestion tool. So, if
we know that the query "blue pages" is less successful than the query
"bluepages" (assuming we can identify the user's intent with this query), we
will not show suggestions that match "blue pages"; instead, we will show
suggestions that match "bluepages." Sort of like a query rewrite, except
driven by fuzzy prefix matching rather than the introduction of
synonyms/expansions.
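
To make the external-file piece concrete, here is a toy sketch of how that
dictionary file could be generated from metrics we already track. It assumes
Solr's FileDictionaryFactory, which reads tab-delimited "suggestion<TAB>weight"
lines; the abandonment cutoff and the nDCG-to-weight scaling are invented
purely for illustration:

# Hypothetical per-query metrics: (query, ndcg, abandonment_rate).
metrics = [
    ("bluepages", 0.82, 0.10),
    ("blue pages", 0.31, 0.62),   # would be dropped: high abandonment
    ("cafeteria menu", 0.67, 0.25),
]

def write_suggest_dictionary(rows, path="suggest_dictionary.txt",
                             max_abandonment=0.5):
    # Emit a tab-delimited file usable as a FileDictionaryFactory source.
    # Queries we consider unsuccessful (abandonment above the cutoff) are
    # simply left out, so the suggester never surfaces them; the rest are
    # weighted by nDCG scaled to an integer.
    with open(path, "w", encoding="utf-8") as out:
        for query, ndcg, abandonment in rows:
            if abandonment > max_abandonment:
                continue
            weight = int(ndcg * 1000)   # nDCG in [0, 1] -> integer weight
            out.write(f"{query}\t{weight}\n")

write_suggest_dictionary(metrics)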

What we are currently wrestling with is how to define a "successful" query. We
have signals like abandonment rate, dwell time, etc., but if you have any
advice on other ways to identify successful queries, that'd be great. We want
to stay away from defining success as "popularity," since that would just
create a closed language system where people only issue popular queries, and
those queries stay popular only because people keep issuing them (assuming
people click on the suggestions, of course).
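
Just to illustrate the kind of composite signal we have in mind (the weights
and the dwell cap here are completely made up, not something we've validated),
a toy Python sketch:

def success_score(clicks, sessions, mean_dwell_s, reformulations):
    # Toy composite "success" score for a query; the weights are hypothetical
    # and would need tuning against real relevance judgments.
    if sessions == 0:
        return 0.0
    ctr = clicks / sessions                  # click-through rate
    abandonment = 1.0 - ctr                  # share of sessions with no click
    dwell = min(mean_dwell_s / 60.0, 1.0)    # cap dwell contribution at one minute
    reform_rate = reformulations / sessions  # user immediately retried a new query
    return 0.4 * ctr + 0.3 * dwell - 0.2 * abandonment - 0.1 * reform_rate

# e.g. 200 sessions, 120 with a click, ~50s average dwell, 30 reformulated
print(round(success_score(120, 200, 50, 30), 3))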

Let me know your thoughts!

On 1/23/20, 10:45 AM, "Alessandro Benedetti" <a.benede...@sease.io> wrote:

    I have been working extensively on query autocompletion, these blogs should
    be helpful to you:
    
    
    https://sease.io/2015/07/solr-you-complete-me.html
    
    https://sease.io/2018/06/apache-lucene-blendedinfixsuggester-how-it-works-bugs-and-improvements.html
 
    
    Your idea of using search quality evaluation to drive the autocompletion is
    interesting.
    How do you currently calculate the NDCG for a query? What's your ground
    truth?
    With that approach you will autocomplete favouring query completions that
    your search engine is able to process better, not necessarily ones closer
    to the user's intent; still, it could work.
    
    We should differentiate here between the suggester dictionary (where the
    suggestions come from; in your case it could be your extracted data) and
    the lookup implementation (which in your case could be the free text
    suggester lookup).
    
    Cheers
    --------------------------
    Alessandro Benedetti
    Search Consultant, R&D Software Engineer, Director
    www.sease.io
    
    
    On Mon, 20 Jan 2020 at 17:02, David Hastings <hastings.recurs...@gmail.com>
    wrote:
    
    > Not a bad idea at all; however, I've never used an external file before,
    > just a field in the index, so it's not an area I'm familiar with.
    >
    > On Mon, Jan 20, 2020 at 11:55 AM Audrey Lorberfeld -
    > audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:
    >
    > > David,
    > >
    > > Thank you, that is useful. So, would you recommend using a (clean) field
    > > over an external dictionary file? We have lots of "top queries" and
    > measure
    > > their nDCG. A thought was to programmatically generate an external file
    > > where the weight per query term (or phrase) == its nDCG. Bad idea?
    > >
    > > Best,
    > > Audrey
    > >
    > > On 1/20/20, 11:51 AM, "David Hastings" <hastings.recurs...@gmail.com>
    > > wrote:
    > >
    > >     I've used this quite a bit. My biggest piece of advice is to choose
    > >     a field that you know is clean, with well-defined terms/words. You
    > >     don't want an autocomplete that has a massive dictionary; it will
    > >     also make the start/reload times pretty slow.
    > >
    > >     On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
    > >     audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:
    > >
    > >     > Hi All,
    > >     >
    > >     > We plan to incorporate a query autocomplete functionality into our
    > > search
    > >     > engine (like this:
    > >     > https://lucene.apache.org/solr/guide/8_1/suggester.html
    > >     > ). And I was wondering if anyone has personal experience with this
    > >     > component and would like to share? Basically, we are just looking
    > > for some
    > >     > best practices from more experienced Solr admins so that we have a
    > > starting
    > >     > place to launch this in our beta.
    > >     >
    > >     > Thank you!
    > >     >
    > >     > Best,
    > >     > Audrey
    > >     >
    > >
    > >
    > >
    >
    
