Re: Boosts for relevancy (shopping products)

Alessandro Benedetti Mon, 21 Mar 2016 03:58:24 -0700

Mmm, maybe I didn't explain properly: all the fields you have in the index
for the products could be used to design features.
Of course my list was only an example, but when processing clicks you should
first take into consideration all the features you can extract that should
affect your ranking algorithm.
If you can give us a glimpse of your schema, we can help by suggesting some
draft features :)


Cheers

On Fri, Mar 18, 2016 at 4:57 PM, Robert Brown <r...@intelcompute.com> wrote:

> Thanks, would be a great idea but unfortunately we don't have that sort of
> granularity of features.
>
> Can definitely use the category of clicked products though, sounds like a
> good enough start.
>
>
>
>
>
> On 03/18/2016 04:36 PM, Alessandro Benedetti wrote:
>
>> Actually, if you are able to collect past (or future) signals like clicks
>> or purchases, I would rather focus on the features of your products
>> than on the products themselves.
>> What will happen is that you will be able to rank products better,
>> based on how their features should affect the score.
>> E.g.
>> after you have trained your model you realize that people searching for
>> computer gadgets are more likely to click and buy products with:
>> specific brands - Apple compatible - low energy consumption - high user
>> rating, etc.
>>
>> At this point even new products that arrive with that set of
>> features are going to be boosted,
>> even if you haven't seen them at all.
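A toy sketch of the idea above, assuming a plain logistic model trained on click signals. The feature names (apple_compatible, low_energy, user_rating) and the click log are made up for illustration, and this is not how the Learning to Rank plugin itself is implemented; it only shows why a product that was never clicked can still score well if it shares the features that were.

```python
import math


def features(product):
    """Turn indexed fields into numeric features (field names are hypothetical)."""
    return [
        1.0,  # bias term
        1.0 if product["apple_compatible"] else 0.0,
        1.0 if product["low_energy"] else 0.0,
        product["user_rating"] / 5.0,  # scale a 0-5 rating to 0-1
    ]


def predict(weights, x):
    """Probability of a click under a logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=500, lr=0.5):
    """Stochastic gradient descent on log loss over (product, clicked) pairs."""
    weights = [0.0] * len(features(samples[0][0]))
    for _ in range(epochs):
        for product, clicked in samples:
            x = features(product)
            error = predict(weights, x) - clicked
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


# Made-up click log for searches in the "computer gadgets" area.
click_log = [
    ({"apple_compatible": True, "low_energy": True, "user_rating": 4.5}, 1),
    ({"apple_compatible": True, "low_energy": False, "user_rating": 4.0}, 1),
    ({"apple_compatible": False, "low_energy": True, "user_rating": 2.0}, 0),
    ({"apple_compatible": False, "low_energy": False, "user_rating": 3.0}, 0),
]
weights = train(click_log)

# Products that never appeared in the click log still get a score.
new_good = {"apple_compatible": True, "low_energy": True, "user_rating": 4.8}
new_poor = {"apple_compatible": False, "low_energy": False, "user_rating": 2.5}
print(predict(weights, features(new_good)), predict(weights, features(new_poor)))
```

In practice the features would come from the indexed fields of each product and the labels from recorded clicks or purchases.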
>>
>> Cheers
>>
>> On Fri, Mar 18, 2016 at 4:21 PM, Robert Brown <r...@intelcompute.com>
>> wrote:
>>
>>> It's also worth mentioning that our platform contains shopping products in
>>> every single category, and will be searched by absolutely anyone, via an
>>> API made available to various websites, some niche, some not.
>>>
>>> If those websites are category-specific, e.g. electrical goods, then we
>>> could boost on certain categories for a given website, but if they're
>>> also broad, is this even possible?
>>>
>>> I guess we could track individual users and build up search histories to
>>> try and guide us, but I don't see many hits being made by repeat users.
>>>
>>> Recording clicks on products could also be used to boost individual
>>> products for specific keywords. I'm beginning to think this is actually
>>> our best hope, e.g. a multi-valued field containing the keywords that
>>> resulted in a click on that product.
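One way the click-keyword field could be fed, sketched as a Solr atomic update that appends a value with the "add" modifier. The field name click_keywords and the product ids are hypothetical, and as far as I know atomic updates need the document's other fields to be stored (or to have docValues) so Solr can rebuild the document.

```python
import json


def click_keyword_update(product_id, query, field="click_keywords"):
    """Build an atomic-update document that appends the normalised query
    to a multi-valued field on one product.

    The field name "click_keywords" is a hypothetical schema field.
    """
    keyword = " ".join(query.lower().split())  # lowercase, collapse whitespace
    return {"id": product_id, field: {"add": keyword}}


# One update per recorded click; the list is the JSON body for the update handler.
clicks = [("sku-1", "Apple  iPod"), ("sku-2", "usb charger")]
body = json.dumps([click_keyword_update(pid, q) for pid, q in clicks])
print(body)
```

The field could then be listed in qf with its own boost, so a query that previously led to a click on a product matches that product again.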
>>>
>>>
>>>
>>>
>>>
>>> On 03/18/2016 04:14 PM, Robert Brown wrote:
>>>
>>> That does sound rather useful!
>>>>
>>>> We currently have it set to 0.1
>>>>
>>>>
>>>>
>>>> On 03/18/2016 04:13 PM, Nick Vasilyev wrote:
>>>>
>>>>> Tie does quite a bit: without it, only the highest-weighted field that
>>>>> has the term will be included in the relevance score. Tie lets you
>>>>> include the other fields that match as well.
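A rough sketch of the tie arithmetic as I understand DisMax scoring: the best-matching field's score, plus tie times the sum of the other matching fields' scores. The per-field scores are made-up numbers, not Solr output.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores the way a disjunction-max query does.

    field_scores: scores of the fields that matched the term (hypothetical values).
    tie: 0.0 keeps only the best field; 1.0 sums all matching fields.
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Hypothetical per-field scores for one term, with qf boosts already applied.
scores = {"name": 5.0, "category": 3.0, "description": 1.0}

no_tie = dismax_score(list(scores.values()), tie=0.0)
with_tie = dismax_score(list(scores.values()), tie=0.1)
print(no_tie, with_tie)
```

With tie at 0.0 a document matching in name, category and description scores the same as one matching in name only; any value above 0.0 lets the extra matches break that tie.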
>>>>> On Mar 18, 2016 10:40 AM, "Robert Brown" <r...@intelcompute.com> wrote:
>>>>>
>>>>>> Thanks for the added input.
>>>>>>
>>>>>> I'll certainly look into the machine learning aspect, will be good to
>>>>>> put some basic knowledge I have into practice.
>>>>>>
>>>>>> I'd been led to believe the tie parameter didn't actually do a lot. :-/
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 03/18/2016 12:07 PM, Nick Vasilyev wrote:
>>>>>>
>>>>>>> I work with a similar catalog, except our data is especially bad.
>>>>>>> We've found that several things helped:
>>>>>>>
>>>>>>> - Item-level grouping (group the same item sold by multiple vendors).
>>>>>>>   Rank items with more vendors a bit higher.
>>>>>>> - Include a boost function for other attributes, such as an original
>>>>>>>   image of the product.
>>>>>>> - Rank items a bit higher if they have data from an external catalog
>>>>>>>   like IceCat.
>>>>>>> - For relevance and performance, we have several fields that we copy
>>>>>>>   data into. High-value fields get copied into a high-weighted field,
>>>>>>>   while lower-value fields like description get copied into a
>>>>>>>   lower-weighted field. These fields are the backbone of our qf
>>>>>>>   parameter, with other fields adding additional boost.
>>>>>>> - Play around with the tie parameter for edismax; we found that it
>>>>>>>   makes quite a big difference.
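The list above might translate into request parameters along these lines. This is only a sketch: defType, qf, tie, mm, ps, bf and bq are regular edismax parameters, but the field names (text_high, text_low, vendor_count, has_original_image) and all boost values are hypothetical, apart from mm and ps, which are the values quoted elsewhere in this thread.

```python
from urllib.parse import urlencode


def build_edismax_params(user_query, tie=0.1):
    """Build request parameters for an edismax query.

    Field names (text_high, text_low, vendor_count, has_original_image)
    are hypothetical; substitute the fields from your own schema.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        # Two catch-all copy fields form the backbone of qf.
        "qf": "text_high^5 text_low^1 brand^2",
        "tie": str(tie),
        "mm": "100%",
        "ps": "5",
        # Additive boost function: favour items sold by more vendors.
        "bf": "log(sum(vendor_count,1))",
        # Boost query: favour items that have an original image.
        "bq": "has_original_image:true^2",
    }


params = build_edismax_params("apple ipod", tie=0.3)
query_string = urlencode(params)
print(query_string)
```

Keeping the parameters in one function makes it easy to rerun the same set of test queries with different tie or boost values and compare the rankings.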
>>>>>>>
>>>>>>> Hope this helps.
>>>>>>>
>>>>>>> On Fri, Mar 18, 2016 at 6:19 AM, Alessandro Benedetti
>>>>>>> <abenede...@apache.org> wrote:
>>>>>>>
>>>>>>>> For a relevancy problem I would repeat what my colleagues have already
>>>>>>>> pointed out:
>>>>>>>> Data is key. We first need to understand our data before we can
>>>>>>>> understand what is relevant and what is not.
>>>>>>>> Then we establish a baseline that makes sense (your basic approach +
>>>>>>>> proper schema configuration as suggested + a properly configured
>>>>>>>> request handler seems a good start to me).
>>>>>>>>
>>>>>>>> At this point, if you are still not happy with the relevancy (i.e. you
>>>>>>>> are not happy with the different boosts you assigned), my strongest
>>>>>>>> suggestion is to move to machine learning.
>>>>>>>> You need a good amount of data to feed the learner and make it your
>>>>>>>> Super Business Expert.
>>>>>>>> I have recently been working with the Learning to Rank Bloomberg
>>>>>>>> plugin [1].
>>>>>>>> In my opinion it will be key for all businesses that have many
>>>>>>>> features in play that can help to evaluate a proper ranking.
>>>>>>>> For that you need to be able to collect and process signals, and you
>>>>>>>> need to carefully tune the features you are interested in.
>>>>>>>> But the results could be surprising.
>>>>>>>>
>>>>>>>> [1] https://issues.apache.org/jira/browse/SOLR-8542
>>>>>>>> [2] Learning to Rank in Solr <https://www.youtube.com/watch?v=M7BKwJoh96s>
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> On Thu, Mar 17, 2016 at 10:15 AM, Robert Brown
>>>>>>>> <r...@intelcompute.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Scott and John,
>>>>>>>>>
>>>>>>>>> As luck would have it, I've got a PhD graduate coming for an
>>>>>>>>> interview today, who just happened to do her research thesis on
>>>>>>>>> information retrieval with quantum theory and machine learning :)
>>>>>>>>>
>>>>>>>>> John, it sounds like you're describing my system! Shopping products
>>>>>>>>> from multiple sources. (De-duplication is going to be fun soon.)
>>>>>>>>>
>>>>>>>>> I already copy fields like merchant, brand and category to string
>>>>>>>>> fields to use them as facets/filters. I was contemplating removing
>>>>>>>>> the description due to the spam issue you mentioned. I didn't know
>>>>>>>>> about the RemoveDuplicatesTokenFilterFactory, so I'm sure that's
>>>>>>>>> going to be a huge help.
>>>>>>>>>
>>>>>>>>> Thanks a lot,
>>>>>>>>> Rob
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/17/2016 10:01 AM, John Smith wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> For once I might be of some help: I've had a similar configuration
>>>>>>>>>> (a large set of products from various sources). It's very difficult
>>>>>>>>>> to find the right balance between all the parameters, and it
>>>>>>>>>> requires a lot of tweaking, most often in the dark unfortunately.
>>>>>>>>>>
>>>>>>>>>> What I've found is that omitNorms=true is a real breakthrough:
>>>>>>>>>> without it, results tend to favor short texts, which is not what's
>>>>>>>>>> wanted for product names. I also added a
>>>>>>>>>> RemoveDuplicatesTokenFilterFactory for the name, as it's common
>>>>>>>>>> practice for spammers to repeat key words in order to be better
>>>>>>>>>> placed in results. Stemming and custom stop words (e.g. "cheap",
>>>>>>>>>> "sale", ...) are other potential ideas.
>>>>>>>>>>
>>>>>>>>>> I've also ended up removing the description field as it's often
>>>>>>>>>> too broad, and name is now the only field left: brand, category and
>>>>>>>>>> merchant (as well as other fields) are offered as additional
>>>>>>>>>> filters using facets. Note that you'd have to re-index them as
>>>>>>>>>> plain strings.
>>>>>>>>>>
>>>>>>>>>> It's more difficult to achieve, but a popularity boost can also be
>>>>>>>>>> useful: you can measure it by sales or by number of clicks. I use a
>>>>>>>>>> combination of both, and store those values using partial updates.
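A sketch of what combining sales and clicks and storing the result through a partial update could look like. The blend (a weighted sum with log dampening), the weight of 5.0 and the field name popularity are illustrative assumptions, not values from this thread; the "set" modifier is the regular atomic-update way of overwriting a single field.

```python
import math


def popularity(sales, clicks, sale_weight=5.0):
    """Blend sales and clicks into one dampened score.

    The weight and the log dampening are illustrative choices, not
    values taken from this thread.
    """
    return math.log1p(sale_weight * sales + clicks)


def popularity_update(product_id, sales, clicks):
    """Atomic-update document that overwrites only the (hypothetical) popularity field."""
    return {"id": product_id, "popularity": {"set": round(popularity(sales, clicks), 4)}}


print(popularity_update("sku-1", sales=3, clicks=40))
```

The dampening keeps a handful of best sellers from drowning out text relevance once the field is used in a boost function.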
>>>>>>>>>>
>>>>>>>>>> Hope it helps,
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 17/03/16 09:36, Robert Brown wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I currently have an index of ~50m docs representing shopping
>>>>>>>>>>> products: name, description, brand, category, etc.
>>>>>>>>>>>
>>>>>>>>>>> Our "qf" is currently set up as:
>>>>>>>>>>>
>>>>>>>>>>> name^5
>>>>>>>>>>> brand^2
>>>>>>>>>>> category^3
>>>>>>>>>>> merchant^2
>>>>>>>>>>> description^1
>>>>>>>>>>>
>>>>>>>>>>> mm: 100%
>>>>>>>>>>> ps: 5
>>>>>>>>>>>
>>>>>>>>>>> I'm getting complaints from the business concerning relevancy, and
>>>>>>>>>>> was hoping to get some constructive ideas/thoughts on whether these
>>>>>>>>>>> boosts look semi-sensible or not. I think they were put in place
>>>>>>>>>>> pretty much at random.
>>>>>>>>>>>
>>>>>>>>>>> I know it's going to be a case of rounds upon rounds of testing, but
>>>>>>>>>>> maybe there's a good starting point that will save me some time?
>>>>>>>>>>>
>>>>>>>>>>> My initial thoughts right now are to actually just search on the
>>>>>>>>>>> name field, and maybe the brand (for things like "Apple Ipod").
>>>>>>>>>>>
>>>>>>>>>>> Has anyone got a similar setup and could share some direction?
>>>>>>>>>>>
>>>>>>>>>>> Many Thanks,
>>>>>>>>>>> Rob
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>> --------------------------
>>>>>>>>
>>>>>>>> Benedetti Alessandro
>>>>>>>> Visiting card : http://about.me/alessandro_benedetti
>>>>>>>>
>>>>>>>> "Tyger, tyger burning bright
>>>>>>>> In the forests of the night,
>>>>>>>> What immortal hand or eye
>>>>>>>> Could frame thy fearful symmetry?"
>>>>>>>>
>>>>>>>> William Blake - Songs of Experience -1794 England
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>
>


