Hey Chris,

Sorry for the delay and thanks for your response. This was inspired by your
talk on boosting and biasing that you presented way back when at a meetup.
I'm glad that my general approach seems to make sense.

My approach was something like:
1) Look at the categories that the user has preferred and compute the
z-score for each
2) Pick the top 3 among those
3) Use those to boost search results.
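
A rough sketch of steps 1 and 2, assuming the per-category user preference,
global mean and global standard deviation are already available as plain
dicts. The function names and all the numbers are made up for illustration;
they are not from the prototype.

```python
from typing import Dict, List, Tuple


def category_z_scores(
    user_pref: Dict[str, float],
    global_mean: Dict[str, float],
    global_std: Dict[str, float],
) -> Dict[str, float]:
    """Standard deviations above (or below) the global mean, per category."""
    scores = {}
    for category, value in user_pref.items():
        std = global_std.get(category, 0.0)
        if std > 0:  # skip categories where a z-score is undefined
            scores[category] = (value - global_mean[category]) / std
    return scores


def top_categories(z_scores: Dict[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """The n categories with the highest z-scores, best first."""
    return sorted(z_scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical numbers, for illustration only.
user_pref = {"Drama": 4.5, "Comedy": 3.0, "Action": 4.0, "Horror": 2.0}
global_mean = {"Drama": 3.5, "Comedy": 3.0, "Action": 3.0, "Horror": 3.0}
global_std = {"Drama": 0.5, "Comedy": 1.0, "Action": 1.0, "Horror": 1.0}

top3 = top_categories(category_z_scores(user_pref, global_mean, global_std))
print(top3)
```

The z-scores in top3 are what would then be used as the weights in step 3.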

I'll look at using the boosts as exponents instead of multipliers, as I
think that would make sense, and it also handles the 0 case.
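
To make the 0 case concrete, here is a small sketch comparing the two ways of
applying a weight to a single category score, plus a helper that assembles a
function string in the same shape as the expression in the quoted message
below. The helper names are hypothetical, and the $cat1, $cat2, ... parameter
naming is assumed from that expression.

```python
from typing import Sequence


def multiplier_boost(score: float, weight: float) -> float:
    """Weight used as a multiplier: a weight of 0 wipes out the score."""
    return score * weight


def exponent_boost(score: float, weight: float) -> float:
    """Weight used as an exponent: a weight of 0 gives a neutral 1."""
    return score ** weight


def build_boost(weights: Sequence[float]) -> str:
    """Assemble sum(pow(query($cat1),w1),pow(query($cat2),w2),...)."""
    terms = [f"pow(query($cat{i}),{w})" for i, w in enumerate(weights, start=1)]
    return "sum(" + ",".join(terms) + ")"


# A perfectly average user (weight 0) with a category score of 2.0:
print(multiplier_boost(2.0, 0.0))  # the score is zeroed
print(exponent_boost(2.0, 0.0))    # the score becomes a neutral factor

# A disliked category (negative weight) turns into a fractional penalty:
print(exponent_boost(2.0, -1.0))

boost = build_boost([1.482, 0.1199, 1.448])
print(boost)
```

This only illustrates the arithmetic; a category score of exactly 0 combined
with a negative weight is not handled here and would need its own treatment.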

This is for a prototype I am doing, but I'll share the results one day at a
meetup as I think it'll be kinda interesting.

Thanks again
Amit


On Thu, Nov 14, 2013 at 11:11 AM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:

>
> : I have a question around boosting. I wanted to use the &boost= to write a
> : nested query that will boost a document based on categorical preferences.
>
> You have no idea how stoked I am to see you working on this in a real
> world application.
>
> : Currently I have the weights set to the z-score equivalent of a user's
> : preference for that category which is simply how many standard deviations
> : above the global average is this user's preference for that movie
> : category.
> :
> : My question though is basically whether or not semantically the equation
> : query(category:Drama)*<some weight> + query(category:Comedy)*<some weight>
> : + query(category:Action)*<some weight> makes sense?
>
> My gut says that your approach makes sense -- but if i'm
> understanding you correctly, i think that you need to add "1" to
> all your weights: the "boost" is a multiplier, so if someone's rating for
> every category is 0 std devs above the average rating (ie: the most
> average person imaginable), you don't want to give every movie in every
> category a score of 0.
>
> Are you picking the "top 3" categories the user prefers as a cut off, or
> are you arbitrarily using N category boosts for however many N categories
> the user is above the global average in their pref for that category?
>
> Are your preferences coming from explicit user feedback on the categories
> (ie: "rate how much you like comedies on a scale of 1-5") or are you
> inferring it from user ratings of the movies themselves? (ie: "rate this
> movie, which happens to be a scifi,action,comedy, on a scale of 1-5") ...
> because if it's the latter you probably want to be careful to also
> normalize based on how many categories the movie is in.
>
> the other thing to consider is whether you want to include "negative
> preferences" (ie: weights less than 1) based on how many std dev the user's
> average is *below* the global average for a category .. in this case i
> *think* you'd want to divide the raw value from -1 to get a useful
> multiplier.
>
> Alternatively: you could experiment with using the weights as exponents
> instead of multipliers...
>
>
> b=sum(pow(query($cat1),1.482),pow(query($cat2),0.1199),pow(query($cat3),1.448))
>
> ...that would simplify the math you'd have to worry about both for the
> "totally boring average user" (x**0 = 1) and for the categories users hate
> (x**-5 = some positive fraction that will act as a penalty) ... but you'd
> definitely need to run some tests to see if it "over boosts" as the std
> dev variations get really high (might want to take a root first before
> using them as the exponent)
>
>
>
> -Hoss
>
