Hello Doug

I have just watched the Quepid demonstration video, and I strongly agree
with your introduction: it is very hard to involve marketing/business
people in repeated testing sessions, and spreadsheets or other kinds of
files are not the right tool for the job.
Currently I'm quite alone in my tuning task, so a visual approach could
be beneficial for me; you are giving me many good inputs!

I see that kelvin (my scripted tool) and Quepid follow the same path. In
Quepid someone quickly watches the results and applies colours to them;
in kelvin you enter one or more queries (network cable, ethernet cable)
and state that the results must contain "ethernet" in the title, or must
come from a list of product categories.
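To give an idea, here is a rough Python sketch of the kind of check
kelvin applies. This is not kelvin's real format or code; the Solr URL,
field names and category values are only illustrative, assuming a plain
Solr select handler:

import json
import urllib.parse
import urllib.request

def top_docs(solr_url, query, rows=10):
    # Fetch the first page of results for a query from Solr.
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    with urllib.request.urlopen(solr_url + "/select?" + params) as resp:
        return json.load(resp)["response"]["docs"]

# The condition: the title must contain "ethernet", or the product must
# come from one of the allowed categories.
allowed_categories = {"network-cables", "networking"}

def is_good(doc):
    return ("ethernet" in doc.get("title", "").lower()
            or doc.get("category") in allowed_categories)

for q in ("network cable", "ethernet cable"):
    docs = top_docs("http://localhost:8983/solr/products", q)
    bad = sum(1 for d in docs if not is_good(d))
    print("%s: %d bad results out of %d" % (q, bad, len(docs)))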

I also diff the results before and after changes to check what is going
on, but I have to do that in a very Unix-scripted way.
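For reference, this is roughly what the diff does, written in Python
instead of shell; the two hosts and the id field are placeholders, not
my real setup:

import json
import urllib.parse
import urllib.request

def top_ids(solr_url, query, rows=10):
    # Return the ids of the top results, in ranked order.
    params = urllib.parse.urlencode(
        {"q": query, "fl": "id", "rows": rows, "wt": "json"})
    with urllib.request.urlopen(solr_url + "/select?" + params) as resp:
        return [d["id"] for d in json.load(resp)["response"]["docs"]]

before = top_ids("http://solr-old:8983/solr/products", "network cable")
after = top_ids("http://solr-new:8983/solr/products", "network cable")
print("dropped:", [i for i in before if i not in after])
print("entered:", [i for i in after if i not in before])
print("moved:  ", [i for i in before
                   if i in after and before.index(i) != after.index(i)])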

Have you considered placing a counter of total red/bad results in
Quepid? I use this index to get a quick overview of a change's impact
across all queries. I also repeat the tests in production from time to
time, and if I see the "kelvin temperature" rising (the number of errors
going up) I know I have to check what's going on, because new products
may be having a bad impact on the index.

I also keep counters of products with low-quality images (or no images
at all) or too-short listings; they are sometimes useful to better
understand what will happen if you change some bq/fq in the application.
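These counters are just zero-row Solr queries with a filter; something
along these lines, where the field names (image_url, description_length)
are only illustrative and would need to match your own schema:

import json
import urllib.parse
import urllib.request

def count(solr_url, fq):
    # Ask Solr only for the number of matches, not for the documents.
    params = urllib.parse.urlencode(
        {"q": "*:*", "fq": fq, "rows": 0, "wt": "json"})
    with urllib.request.urlopen(solr_url + "/select?" + params) as resp:
        return json.load(resp)["response"]["numFound"]

base = "http://localhost:8983/solr/products"
print("no image:      ", count(base, "-image_url:[* TO *]"))
print("short listings:", count(base, "description_length:[0 TO 30]"))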

I also see that after changes in Quepid someone has to check the "gray"
results and assign them a colour. In kelvin's case the conditions can
sometimes do a bit of magic (new product names still contain SM-G900F),
but they can also introduce false errors (the new product name contains
only "Galaxy 5" and not the product code SM-G900F). So some manual
checks are still needed, but with Quepid everybody can do the check,
while with kelvin you have to change some lines of a script, and not
everybody is able or willing to do that.
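The fix is usually a one-line change to the condition, for example
accepting either the product code or the marketing name in the title;
again, just an illustration, not kelvin's actual code:

def is_good(doc):
    # Accept the old product code or the new marketing name in the title.
    title = doc.get("title", "").lower()
    return "sm-g900f" in title or "galaxy 5" in title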

The idea of a static index is a good suggestion; I will try to set one
up in the next round of search engine improvements.

Thank you Doug!




2014-04-09 17:48 GMT+02:00 Doug Turnbull <
dturnb...@opensourceconnections.com>:

> Hey Giovanni, nice to meet you.
>
> I'm the person that did the Test Driven Relevancy talk. We've got a product
> Quepid (http://quepid.com) that lets you gather good/bad results for
> queries and do a sort of test driven development against search relevancy.
> Sounds similar to your existing scripted approach. Have you considered
> keeping a static catalog for testing purposes? We had a project with a lot
> of updates and date-dependent relevancy. This lets you create some test
> scenarios against a static data set. However, one downside is you can't
> recreate problems in production in your test setup exactly-- you have to
> find a similar issue that reflects what you're seeing.
>
> Cheers,
> -Doug
>
>
> On Wed, Apr 9, 2014 at 10:42 AM, Giovanni Bricconi <
> giovanni.bricc...@banzai.it> wrote:
>
> > Thank you for the links.
> >
> > The book is really useful; I will definitely have to spend some time
> > reformatting the logs to access the number of results found, session
> > ids, and much more.
> >
> > I'm also quite happy that my test cases produce results similar to the
> > precision reports shown at the beginning of the book.
> >
> > Giovanni
> >
> >
> > 2014-04-09 12:59 GMT+02:00 Ahmet Arslan <iori...@yahoo.com>:
> >
> > > Hi Giovanni,
> > >
> > > Here are some relevant pointers :
> > >
> > >
> > > http://www.lucenerevolution.org/2013/Test-Driven-Relevancy-How-to-Work-with-Content-Experts-to-Optimize-and-Maintain-Search-Relevancy
> > >
> > >
> > > http://rosenfeldmedia.com/books/search-analytics/
> > >
> > > http://www.sematext.com/search-analytics/index.html
> > >
> > >
> > > Ahmet
> > >
> > >
> > > On Wednesday, April 9, 2014 12:17 PM, Giovanni Bricconi <
> > > giovanni.bricc...@banzai.it> wrote:
> > > I have been working on an e-commerce site for about a year, and
> > > unfortunately I have no "information retrieval" background, so I am
> > > probably missing some important practices about relevance tuning and
> > > search engines.
> > > During this period I had to fix many "bugs" about bad search results,
> > > which I have sometimes solved by tuning edismax weights, sometimes by
> > > creating ad hoc query filters or query boosting; but I am still not
> > > able to figure out what the correct process to improve search result
> > > relevance should be.
> > >
> > > These are the practices I am following; I would really appreciate any
> > > comments about them and any hints about the practices you follow in
> > > your projects:
> > >
> > > - In order to have a measure of search quality, I have written many
> > > test cases such as "if the user searches for <<nike sport watch>>, the
> > > search results should display at least four <<tom tom>> products with
> > > the words <<nike>> and <<sportwatch>> in the title". I have written a
> > > tool that reads such tests from JSON files, applies them to my
> > > applications, and then counts the number of results that do not match
> > > the criteria stated in the test cases. (For those interested, this
> > > tool is available at https://github.com/gibri/kelvin but it is still
> > > quite a prototype.)
> > >
> > > - I use this count as a quality index. I have tried various times to
> > > change the edismax weights to lower the overall number of errors, or
> > > to add new filters/boosts to the application to try to decrease the
> > > error count.
> > >
> > > - The upside is that at least you have a number to look at, and a
> > > quick way of checking the impact of a modification.
> > >
> > > - The downside is that you have to maintain the test cases: I now
> > > have about 800 tests and my product catalogue changes often, which
> > > means that some products exit the catalogue and some test cases can't
> > > pass anymore.
> > >
> > > - I am populating the test cases using errors reported by users, and
> > > I feel that this is driving the test cases too much toward
> > > pathological cases. Moreover, I don't have many tests for cases that
> > > are working well now.
> > >
> > > I would like to use search logs to drive test generation, but I feel
> > > I haven't picked the right path. Taking the top queries, manually
> > > reviewing the results, and then writing tests is a slow process;
> > > moreover, many top queries are ambiguous or are driven by site ads.
> > >
> > > Many, many queries are unique per user. How do you deal with these
> > > cases?
> > >
> > > How are you using your logs to find test cases to fix? Are you
> > > looking for queries where the user does not "open" any returned
> > > results? Which KPI have you chosen to find queries that are not
> > > providing good results? And what are you using as a KPI for the whole
> > > search, besides the conversion rate?
> > >
> > > Can you suggest any other practices you are using in your projects?
> > >
> > > Thank you very much in advance
> > >
> > > Giovanni
> > >
> > >
> >
>
>
>
> --
> Doug Turnbull
> Search & Big Data Architect
> OpenSource Connections <http://o19s.com>
>
