: I was talking about boosting documents using past popularity. So a user
: searches for X and gets 10 results. This view is recorded for each of the 10
: documents and added to the index later. If a user clicks on result #2, the
: click is recorded for doc #2 and added to index. We boost using clicks/view.
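[For concreteness, a minimal sketch of the clicks-per-view bookkeeping described in the quote above. The function names and the smoothing prior are assumptions for illustration, not anything from the thread.]

    from collections import defaultdict

    # Per-document counters, accumulated from search logs and merged
    # back into the index on whatever schedule the indexer runs.
    views = defaultdict(int)   # times a doc was returned on a result page
    clicks = defaultdict(int)  # times a user clicked through to a doc

    def record_results_page(doc_ids):
        # every document returned on the page counts as one "view"
        for doc_id in doc_ids:
            views[doc_id] += 1

    def record_click(doc_id):
        clicks[doc_id] += 1

    def click_boost(doc_id, prior=20):
        # clicks/view used as a multiplicative boost; the additive prior
        # (an assumption here) keeps a doc with 1 click out of 1 view
        # from out-boosting one with 900 clicks out of 1000 views
        return clicks[doc_id] / (views[doc_id] + prior)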
FWIW: I've observed three problems with this type of metric...

1) "Render" vs "view": what you are calling a "view" is really a "rendering" -- you are sending the data back to include the item in the list of 10 items on the page, and the browser is rendering it, but that doesn't mean the user is actually "viewing" it -- particularly in a webpage-type situation where only the first 3-5 results might actually appear "above the fold" and the user has to scroll to see the rest. Even in a smaller UI element (like a left or right nav info box), there's no guarantee that the user actually "views" any of the items, which can bias things.

2) It doesn't take into account people who click on a result, decide it's terrible, hit the back arrow, and click on a different result -- both of those results wind up scoring "equally". Some really complex session+click analysis can overcome this, but not a lot of people have the resources to do that all the time.

3) Ignoring #1 and #2 above (because I haven't found many better options), you face the popularity problem -- or what my coworkers and I used to call the "TRL Problem" back in the 90s. MTV's Total Request Live was a Top X countdown show featuring the most popular videos of the week based on requests -- but it was also the number one show on the network, occupying something like 4 of the 24 broadcast hours of every day, when there were only about 6 hours a day that actually showed music videos at all. So for the most part, the only videos people ever saw were on TRL, so those were the only videos that ever got requested. In a nutshell: once something becomes "popular" and is what everybody sees, it stays popular, because it's what everybody sees and they don't know that there is better stuff out there.

Even if everyone looks at the full list of results and actually reads all of the first 10 summaries, in the absence of any other bias their inclination is going to be to assume #1 is the best. So they might click on that even if another result on the list appears better in their own opinion.

A variation that I did some experiments with, but never really refined because I didn't have the time/energy to really go to town on it, is to weight the "clicks" based on position: a click on item #1 shouldn't be worth anything -- it's the number one result, and the expectation is that it had better get clicked or something is wrong. A click on #2 is worth something to that item, a click on #3 is worth more to that item, and so on... so that if the #9 item gets a click, that's huge.

To do it right, I think what you really want to do is penalize items that get views but no clicks -- because if someone loads up results 1-10 and doesn't click on any of them, that should be a vote in favor of moving all of them "down" and moving item #11 up (even though it got no views or clicks).

But like I said: I never experimented with this idea enough to come up with a good formula, or to verify that the idea was sound.

-Hoss
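[A minimal sketch of the two ideas above -- position-weighted clicks and penalizing rendered-but-unclicked items. The linear weighting and the penalty constant are assumptions; as the post says, no formula was ever settled on or verified.]

    def position_weight(position, page_size=10):
        # a click on #1 is worth nothing; clicks further down the page
        # count for progressively more (linear purely as an assumption)
        return (position - 1) / max(page_size - 1, 1)

    def update_scores(scores, rendered, clicked_position=None, penalty=0.1):
        # rendered: doc ids in display order (position 1 first)
        # clicked_position: 1-based position the user clicked, or None
        if clicked_position is None:
            # nothing was clicked: nudge every rendered doc down, which
            # implicitly lets unrendered docs (e.g. #11) drift upward
            for doc_id in rendered:
                scores[doc_id] = scores.get(doc_id, 0.0) - penalty
        else:
            doc_id = rendered[clicked_position - 1]
            bump = position_weight(clicked_position, len(rendered))
            scores[doc_id] = scores.get(doc_id, 0.0) + bump
        return scores

    # e.g. a click on the last of three rendered results earns the
    # maximum weight, while a page with no clicks penalizes all three:
    scores = update_scores({}, ["a", "b", "c"], clicked_position=3)
    scores = update_scores(scores, ["a", "b", "c"])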