Re: Boosting documents by categorical preferences

Amit Nithian Mon, 27 Jan 2014 10:40:07 -0800

Hi Chris (and others interested in this),

Sorry for dropping off. I got sidetracked with other work, but I came back
to this and finally got a V1 implemented.


The final process is as follows:
1) Pre-compute the global per-category num_ratings, average and std-dev (so
for Action the average rating may be 3.49 with a std-dev of 0.99).
2) For a given user, retrieve the last X ratings (X is 10 for me) and
compute the user's categorical affinities: take the user's average rating
for all movies in a particular category (e.g. Action), subtract the global
category average and divide by the category std-dev. Then multiply this
z-score by the fraction of the user's ratings that fall in that category.
   -> For example, if a user's last 10 ratings consisted of 9 Drama and
1 Thriller, the z-score for Thriller should be discounted relative to that
for Drama, so that the user's preference (either positive or negative) for
Drama is the more prominent one.
3) Sort by the absolute value of the weighted z-score (thanks Hossman, great
thought).
4) Return the top 3 (an arbitrary number).
5) Modify the query to look like the following:

qq=tom hanks
&q={!boost b=$b defType=edismax v=$qq}
&cat1=category:Children
&cat2=category:Fantasy
&cat3=category:Animation
&b=sum(1,sum(product(query($cat1),0.22267872),product(query($cat2),0.21630952),product(query($cat3),0.21120241)))

Basically, b = 1 + (pref1*query(category:something1) +
pref2*query(category:something2) + pref3*query(category:something3)),
where each prefN is the weighted z-score from step 2.
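
For illustration, steps 2 through 5 can be sketched in Python roughly as
follows. This is a minimal sketch under stated assumptions, not the actual
V1 code: the function names category_affinities and boost_params are
hypothetical, the GLOBAL_STATS values other than Action are made up, and
the flat sum(1,...) form is used as an arithmetically equivalent
simplification of the nested sum in the query above.

```python
from collections import defaultdict
from statistics import mean

# Step 1 output (assumed shape): category -> (global average, global std-dev).
# Only the Action numbers come from the email; the rest are invented.
GLOBAL_STATS = {
    "Action": (3.49, 0.99),
    "Drama": (3.50, 1.00),
    "Thriller": (3.50, 1.00),
}


def category_affinities(recent_ratings, global_stats, top_n=3):
    """recent_ratings: list of (category, rating) for the user's last X ratings.

    Returns up to top_n (category, weighted z-score) pairs, strongest first.
    """
    by_cat = defaultdict(list)
    for category, rating in recent_ratings:
        by_cat[category].append(rating)

    total = len(recent_ratings)
    scores = {}
    for category, ratings in by_cat.items():
        if category not in global_stats:
            continue  # no global stats for this category, skip it
        avg, std = global_stats[category]
        if std == 0:
            continue  # z-score is undefined without spread
        z = (mean(ratings) - avg) / std
        # Discount by the fraction of the user's ratings in this category.
        scores[category] = z * len(ratings) / total

    # Steps 3 and 4: sort by absolute value, keep the top N.
    ranked = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:top_n]


def boost_params(affinities):
    """Step 5: build the catN and b request parameters as a dict."""
    params = {}
    terms = []
    for i, (category, weight) in enumerate(affinities, start=1):
        # Assumes category names need no query escaping or quoting.
        params[f"cat{i}"] = f"category:{category}"
        terms.append(f"product(query($cat{i}),{weight:.8f})")
    joined = ",".join(terms)
    params["b"] = f"sum(1,{joined})" if terms else "1"
    return params


if __name__ == "__main__":
    recent = [("Drama", 5)] * 9 + [("Thriller", 1)]
    print(boost_params(category_affinities(recent, GLOBAL_STATS)))
```

The returned dict would be merged with qq and q={!boost b=$b ...} before
sending the request. Negative affinities pass straight through here, so a
strongly disliked category lowers b below 1.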

The initial results seem fairly promising. Of course, there are many more
optimizations I could make, such as decaying user ratings over time so that
a 5 rating from a year ago doesn't count as much as a 5 rating today.
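
One possible way to express that decay is an exponentially weighted
average, sketched below. The half-life form, the 180-day default and the
function name decayed_average are all assumptions for illustration; nothing
like this exists in the V1 described above.

```python
def decayed_average(ratings, half_life_days=180.0):
    """ratings: list of (rating, age_in_days) pairs for one category.

    Each rating's weight halves every half_life_days, so older ratings
    contribute less to the average than recent ones.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    weights = [0.5 ** (age / half_life_days) for _, age in ratings]
    weighted_sum = sum(w * rating for w, (rating, _) in zip(weights, ratings))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # A 5 rated today and a 1 rated one half-life ago.
    print(decayed_average([(5, 0), (1, 180)]))
```

The result could stand in for the plain per-category average in step 2
before subtracting the global average.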

Hope this helps others. I'll open source what I have soon and post back. If
there is feedback or other thoughts let me know!

Cheers
Amit


On Fri, Nov 22, 2013 at 11:38 AM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:

>
> : I thought about that but my concern/question was how. If I used the pow
> : function then I'm still boosting the bad categories by a small
> : amount..alternatively I could multiply by a negative number but does that
> : work as expected?
>
> I'm not sure i understand your concern: negative powers would give you
> values less than 1, positive powers would give you values greater than 1,
> and then you'd use those values as multiplicative boosts -- so the values
> less than 1 would penalize the scores of existing matching docs in the
> categories the user dislikes.
>
> Oh wait ... i see, in your original email (and in my subsequent suggested
> tweak to use pow()) you were talking about sum()ing up these 3 category
> boosts (and i cut/pasted sum() in my example as well) ... yeah,
> using multiplication there would make more sense if you wanted to do the
> "negative preferences" as well, because then the score of any matching doc
> will be reduced if it matches on an "undesired" category -- and the
> amount it will be reduced will be determined by how strongly it
> matches on that category (ie: the base score returned by the nested
> query() func) and "how negative" the undesired preference value (ie:
> the pow() exponent) is
>
>
> qq=...
> q={!boost b=$b v=$qq}
>
> b=product(pow(query($cat1),$cat1z),pow(query($cat2),$cat2z),pow(query($cat3),$cat3z))
> cat1=...action...
> cat1z=1.48
> cat2=...comedy...
> cat2z=1.33
> cat3=...kids...
> cat3z=-1.7
>
>
> -Hoss
>
