Hi Alessandro, I'm so happy there is someone who's done extensive work with QAC here!
Right now, we measure nDCG via a Dynamic Bayesian Network (DBN). To break it down:

- We use a DBN model to generate a "score" for each query_url pair.
- We then plug that score into a mathematical formula we found in a research paper (happy to share the paper if you're interested) for assigning labels 0-4.
- We then cross-reference the scored & labeled query_url pairs with 1k of our system's top queries and 1k of our system's random queries.
- We use that dataset as our ground truth.
- We then query the system in real time each day for those 2k queries, label them, and compare those labels with our ground truth to get our system's nDCG.

I hope that makes sense! Lots of steps.

Due to computational overhead, we are pretty committed to using an external file & a separate Solr core for our suggestions. We are also planning to use the Suggester to add a little human nudge towards "successful" queries. I'm not sure whether that's what the Suggester is really meant to do, but we are using it less as a naïve prefix-matcher and more as a query-suggestion tool. So, if we know that the query "blue pages" is less successful than the query "bluepages" (assuming we can identify the user's intent with this query), we will not show suggestions that match "blue pages"; instead, we will show suggestions that match "bluepages." Sort of like a query rewrite, except with fuzzy prefix matching, not the introduction of synonyms/expansions.

What we are concerned with currently is how to define a "successful" query. We have things like abandonment rate, dwell time, etc., but if you have any advice on more ways to identify successful queries, that'd be great. We want to stay away from defining success as "popularity," since that will just create a closed language system where people only query popular queries, and those queries stay popular only because people are querying them (assuming people click on the suggestions, of course).

Let me know your thoughts!
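For reference, the final comparison step described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the function names (`dcg`, `ndcg_at_k`), the exponential gain 2^label - 1, the log2 rank discount, and the cutoff k=10 are all assumptions, since nothing here pins down which nDCG variant is in use. Labels are the 0-4 grades mentioned above, and URLs missing from the ground truth are treated as label 0.

```python
import math
from typing import Dict, Sequence


def dcg(labels: Sequence[int]) -> float:
    """Discounted cumulative gain with gain 2^label - 1 (an assumed variant)."""
    return sum(
        (2 ** label - 1) / math.log2(rank + 2)  # rank is 0-based, so +2
        for rank, label in enumerate(labels)
    )


def ndcg_at_k(ranked_urls: Sequence[str],
              ground_truth: Dict[str, int],
              k: int = 10) -> float:
    """nDCG@k for one query.

    ranked_urls:  URLs returned by the system today, best first.
    ground_truth: url -> label (0-4) for this query; unjudged URLs count as 0.
    """
    gains = [ground_truth.get(url, 0) for url in ranked_urls[:k]]
    # Ideal ordering: the judged labels sorted from best to worst.
    ideal = sorted(ground_truth.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    truth = {"a": 4, "b": 2, "c": 0}
    print(ndcg_at_k(["a", "b", "c"], truth))
    print(ndcg_at_k(["c", "b", "a"], truth))
```

Averaging `ndcg_at_k` over the 2k queries would give a single system-level number, and the per-query values are what could feed a suggestion weight.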
On 1/23/20, 10:45 AM, "Alessandro Benedetti" <a.benede...@sease.io> wrote:

I have been working extensively on query autocompletion, these blogs should be helpful to you:

https://urldefense.proofpoint.com/v2/url?u=https-3A__sease.io_2015_07_solr-2Dyou-2Dcomplete-2Dme.html&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=0lExcWXK-kGTAfpnv-kU_LGminLzJjJKv6hYBFQG7iI&s=c149I_QBokd35FBMGaUxoBPMViUXAdZtVnkSKTINndE&e=
https://urldefense.proofpoint.com/v2/url?u=https-3A__sease.io_2018_06_apache-2Dlucene-2Dblendedinfixsuggester-2Dhow-2Dit-2Dworks-2Dbugs-2Dand-2Dimprovements.html&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=0lExcWXK-kGTAfpnv-kU_LGminLzJjJKv6hYBFQG7iI&s=m8s2XvI7tR1t9bNaA4SI-w90MdbLZTYxc0mBMz8RMSw&e=

Your idea of using search quality evaluation to drive the autocompletion is interesting.
How do you currently calculate the NDCG for a query? What's your golden truth?
Using that approach you will autocomplete favouring query completion that your search engine is able to process better, not necessarily closer to the user intent, still it could work.

We should differentiate here between the suggester dictionary (where the suggestions come from, in your case it could be your extracted data) and the kind of suggestion (that in your case could be the free text suggester lookup).

Cheers
--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io

On Mon, 20 Jan 2020 at 17:02, David Hastings <hastings.recurs...@gmail.com> wrote:

> Not a bad idea at all, however I've never used an external file before, just
> a field in the index, so not an area I'm familiar with
>
> On Mon, Jan 20, 2020 at 11:55 AM Audrey Lorberfeld -
> audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:
>
> > David,
> >
> > Thank you, that is useful. So, would you recommend using a (clean) field
> > over an external dictionary file? We have lots of "top queries" and measure
> > their nDCG. A thought was to programmatically generate an external file
> > where the weight per query term (or phrase) == its nDCG. Bad idea?
> >
> > Best,
> > Audrey
> >
> > On 1/20/20, 11:51 AM, "David Hastings" <hastings.recurs...@gmail.com>
> > wrote:
> >
> > I've used this quite a bit, my biggest piece of advice is to choose a field
> > that you know is clean, with well defined terms/words, you don't want an
> > autocomplete that has a massive dictionary, also it will make the
> > start/reload times pretty slow
> >
> > On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
> > audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:
> >
> > > Hi All,
> > >
> > > We plan to incorporate a query autocomplete functionality into our search
> > > engine (like this:
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_solr_guide_8-5F1_suggester.html&d=DwIBaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=L8V-izaMW_v4j-1zvfiXSqm6aAoaRtk-VJXA6okBs_U&s=vnE9KGyF3jky9fSi22XUJEEbKLM1CA7mWAKrl2qhKC0&e=
> > > ). And I was wondering if anyone has personal experience with this
> > > component and would like to share? Basically, we are just looking for some
> > > best practices from more experienced Solr admins so that we have a starting
> > > place to launch this in our beta.
> > >
> > > Thank you!
> > >
> > > Best,
> > > Audrey
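On the idea quoted above of programmatically generating an external file where the weight per query equals its nDCG, here is a minimal sketch under stated assumptions. It writes one `query<TAB>weight` entry per line, which matches the default tab delimiter documented for Solr's FileDictionaryFactory. The helper names (`build_dictionary_lines`, `write_dictionary`) and the scale factor of 1000 are hypothetical. The scaling is there because, as far as I know, Lucene parses file-dictionary weights as whole numbers, so a raw nDCG in [0, 1] might be truncated to 0; that is worth verifying against the Solr version in use.

```python
from typing import Dict, List


def build_dictionary_lines(ndcg_by_query: Dict[str, float],
                           scale: int = 1000) -> List[str]:
    """Turn {query: nDCG} into 'query<TAB>weight' lines, best queries first.

    scale is an assumed factor that maps nDCG in [0, 1] to an integer weight.
    """
    lines = []
    # Sort by descending nDCG, then alphabetically for a stable file.
    for query, score in sorted(ndcg_by_query.items(),
                               key=lambda kv: (-kv[1], kv[0])):
        # Collapse whitespace so a stray tab cannot be read as the delimiter.
        clean = " ".join(query.split())
        if not clean:
            continue
        clamped = max(0.0, min(1.0, score))
        lines.append(f"{clean}\t{round(clamped * scale)}")
    return lines


def write_dictionary(path: str, ndcg_by_query: Dict[str, float]) -> None:
    """Write the suggester dictionary file as UTF-8, one entry per line."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(build_dictionary_lines(ndcg_by_query)) + "\n")


if __name__ == "__main__":
    for line in build_dictionary_lines({"blue pages": 0.42, "bluepages": 0.91}):
        print(line)
```

With weights built this way, "bluepages" would outrank "blue pages" for the same prefix. Dropping low-scoring queries from the file entirely is a separate design choice that this sketch leaves open.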