Hey Chris,

Sorry for the delay, and thanks for your response. This was inspired by the talk on boosting and biasing that you gave at a meetup way back when. I'm glad my general approach seems to make sense.
My approach was something like:
1) Look at the categories the user has preferred and compute the z-score for each
2) Pick the top 3 among those
3) Use those to boost search results

I'll look at using the boosts as exponents instead of multipliers. I think that makes sense, and it also handles the 0 case. This is for a prototype I'm building, but I'll share the results at a meetup one day, as I think they'll be kind of interesting.

Thanks again,
Amit

On Thu, Nov 14, 2013 at 11:11 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> : I have a question around boosting. I wanted to use the &boost= to write a
> : nested query that will boost a document based on categorical preferences.
>
> You have no idea how stoked I am to see you working on this in a real
> world application.
>
> : Currently I have the weights set to the z-score equivalent of a user's
> : preference for that category, which is simply how many standard deviations
> : above the global average this user's preference for that movie category is.
> :
> : My question though is basically whether or not, semantically, the equation
> : query(category:Drama)*<some weight> + query(category:Comedy)*<some weight>
> : + query(category:Action)*<some weight> makes sense?
>
> My gut says that your approach makes sense -- but if i'm
> understanding you correctly, i think that you need to add "1" to
> all your weights: the "boost" is a multiplier, so if someone's rating for
> every category is 0 std devs above the average rating (ie: the most
> average person imaginable), you don't want to give every movie in every
> category a score of 0.
>
> Are you picking the "top 3" categories the user prefers as a cut off, or
> are you arbitrarily using N category boosts for however many N categories
> the user is above the global average in their pref for that category?
>
> Are your preferences coming from explicit user feedback on the categories
> (ie: "rate how much you like comedies on a scale of 1-5") or are you
> inferring it from user ratings of the movies themselves? (ie: "rate this
> movie, which happens to be a scifi,action,comedy, on a scale of 1-5") ...
> because if it's the latter you probably want to be careful to also
> normalize based on how many categories the movie is in.
>
> the other thing to consider is whether you want to include "negative
> preferences" (ie: weights less than 1) based on how many std dev the user's
> average is *below* the global average for a category .. in this case i
> *think* you'd want to divide the raw value from -1 to get a useful
> multiplier.
>
> Alternatively: you could experiment with using the weights as exponents
> instead of multipliers...
>
>
> b=sum(pow(query($cat1),1.482),pow(query($cat2),0.1199),pow(query($cat3),1.448))
>
> ...that would simplify the math you'd have to worry about both for the
> "totally boring average user" (x**0 = 1) and for the categories users hate
> (x**-5 = some positive fraction that will act as a penalty) ... but you'd
> definitely need to run some tests to see if it "over boosts" as the std
> dev variations get really high (might want to take a root first before
> using them as the exponent)
>
>
>
> -Hoss
>
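A minimal Python sketch of the arithmetic discussed in this thread, under stated assumptions: preferences are numeric per-category scores, population stats are a plain dict of lists, and every function name is hypothetical. It computes each category's z-score, keeps the top 3, and builds a `b=` function string in the exponent form Hoss suggested. It also shows the "add 1" multiplier and a signed square root as one possible reading of "take a root first". It only does arithmetic and string building and does not talk to Solr; the `category:` field and `$catN` parameter names are copied from the examples above.

```python
import math
import statistics

def category_zscores(user_prefs, population_prefs):
    """Z-score of the user's preference per category.

    user_prefs: {category: this user's preference score}
    population_prefs: {category: [preference scores of all users]}
    (Both shapes are assumptions made for this sketch.)
    """
    zscores = {}
    for category, value in user_prefs.items():
        values = population_prefs[category]
        mean = statistics.mean(values)
        stdev = statistics.pstdev(values)
        # Guard against a category where every user has the same preference.
        zscores[category] = 0.0 if stdev == 0 else (value - mean) / stdev
    return zscores

def top_categories(zscores, n=3):
    """The n categories with the highest z-scores, best first."""
    return sorted(zscores.items(), key=lambda kv: kv[1], reverse=True)[:n]

def multiplier_weight(z):
    """'Add 1' variant: an average user (z = 0) gets a neutral multiplier of 1."""
    return 1.0 + z

def exponent_weight(x, z):
    """Exponent variant: x**0 == 1; for x > 1 a negative z yields a fraction (penalty)."""
    return x ** z

def damped(z):
    """One possible reading of 'take a root first': signed square root of z."""
    return math.copysign(math.sqrt(abs(z)), z)

def build_boost_params(top):
    """Build hypothetical request params shaped like b=sum(pow(query($catN),w),...)."""
    params = {}
    terms = []
    for i, (category, z) in enumerate(top, start=1):
        params[f"cat{i}"] = f"category:{category}"
        terms.append(f"pow(query($cat{i}),{z:.4g})")
    params["b"] = "sum(" + ",".join(terms) + ")"
    return params

# Made-up example data, for illustration only.
population = {
    "Drama": [1, 2, 3, 4, 5],
    "Comedy": [2, 2, 4, 4],
    "Action": [1, 3, 5],
    "Horror": [2, 4],
}
user = {"Drama": 5, "Comedy": 4, "Action": 3, "Horror": 1}

z = category_zscores(user, population)
top = top_categories(z)
params = build_boost_params(top)
print(params["b"])
# prints sum(pow(query($cat1),1.414),pow(query($cat2),1),pow(query($cat3),0))
```

Two things worth checking when testing the exponent form: `x**z` only raises a score when the base `x` is greater than 1 (for a sub-query score below 1, a positive exponent shrinks it), and a document that doesn't match a category gets 0 from `query()` by default, where a negative exponent gives an infinite or undefined value rather than a penalty.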