: I was talking about boosting documents using past popularity. So a user
: searches for X and gets 10 results. This view is recorded for each of the 10
: documents and added to the index later. If a user clicks on result #2, the
: click is recorded for doc #2 and added to the index. We boost using clicks/views.
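As an aside, the clicks/views boost being quoted can be sketched in a few lines. This is my own illustration, not code from the original poster; the smoothing prior and its weight are assumptions I've added so that a doc with 2 clicks out of 2 views doesn't get a wild boost:

```python
def ctr_boost(clicks: int, views: int,
              prior_ctr: float = 0.1, prior_weight: float = 10.0) -> float:
    """Smoothed click-through rate for boosting.

    With no data this returns prior_ctr; as views grow it
    approaches the raw clicks/views ratio.  (prior_ctr and
    prior_weight are illustrative values, not tuned numbers.)
    """
    return (clicks + prior_ctr * prior_weight) / (views + prior_weight)

# A doc rendered 1000 times and clicked 150 times:
print(ctr_boost(150, 1000))   # ~0.1495, close to the raw 0.15
# A doc rendered twice and clicked twice -- raw CTR would be 1.0:
print(ctr_boost(2, 2))        # 0.25, pulled toward the prior
```

The smoothing matters because raw ratios over-reward documents that have barely been shown.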

FWIW: I've observed three problems with this type of metric...

1) "render" vs "view" ... what you are calling a "view" is really a 
"rendering" -- you are sending the data back to include the item in the 
list of 10 items on the page, and the brwoser is rendering it, but that 
doesn't mean the users is actaully "viewing" it -- particularly in a 
webpage type situation where only the first 3-5 results might actually 
appear "above the fold" and the user has to scroll to see the rest.  Even 
in a smaller UI element (like a left or right nav info box, there's no 
garuntee that the user acctually "views" any of the items, which can bias 
things.

2) It doesn't take into account people who click on a result, decide it's 
terrible, hit the back arrow, and click on a different result -- both of 
those clicks wind up scoring "equally".  Some really complex session+click 
analysis can overcome this, but not a lot of people have the resources to 
do that all the time.
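One cheap stand-in for full session analysis -- my own sketch, not something from this thread -- is to discount a click by how quickly the user bounced back to the results page. The 30-second threshold here is an arbitrary assumption:

```python
def click_weight(dwell_seconds: float, min_dwell: float = 30.0) -> float:
    """Scale a click's credit by dwell time, capped at full credit (1.0).

    A click followed by an immediate back-button counts for little;
    a click the user stayed on counts in full.  min_dwell is an
    illustrative threshold, not a measured one.
    """
    return min(dwell_seconds / min_dwell, 1.0)

# User bounced back after 5 seconds: the click earns little credit.
print(click_weight(5.0))    # ~0.167
# User stayed two minutes: full credit.
print(click_weight(120.0))  # 1.0
```

This only needs a timestamp per click rather than full session reconstruction, which is why it's cheaper -- though it obviously misses users who open results in new tabs.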

3) Ignoring #1 and #2 above (because I haven't found many better options), 
you face the popularity problem -- or what my coworkers and I used to call 
the "TRL Problem" back in the 90s.  MTV's Total Request Live was a Top X 
countdown show of videos, featuring the most popular videos of the week 
based on requests -- but it was also the number one show on the network, 
occupying something like 4 of the 24 broadcast hours of every day, when 
there were only about 6 hours a day that actually showed music videos.  So 
for the most part, the only videos people ever saw were on TRL, so those 
were the only videos that ever got requested.

In a nutshell: once something becomes "popular" and is what everybody 
sees, it stays popular, because it's what everybody sees and they don't 
know that there is better stuff out there.

Even if everyone looks at the full list of results and actually reads all 
of the first 10 summaries, in the absence of any other bias their 
inclination is going to be to assume #1 is the best.  So they might click 
on that even if another result on the list appears better in their own 
opinion.

A variation that I did some experiments with, but never really refined 
because I didn't have the time/energy to really go to town on it, is to 
weight the "clicks" based on position: a click on item #1 wouldn't be 
worth anything -- it's the number one result, the expectation is that it 
had better get clicked or something is wrong.  A click on #2 is worth 
something to that item, a click on #3 is worth more to that item, and 
so on ... so that if the #9 item gets a click, that's huge.  To do it 
right, I think what you really want to do is penalize items that get views 
but no clicks -- because if someone loads up results 1-10 and doesn't 
click on any of them, that should be a vote in favor of moving all of them 
"down" and moving item #11 up (even though it got no views or clicks).

But like I said: I never experimented with this idea enough to come up 
with a good formula, or to verify that the idea was sound.

-Hoss
