Re: Boosts for relevancy (shopping products)

Robert Brown Sun, 20 Mar 2016 04:25:53 -0700

It's also worth mentioning that our platform contains shopping products in every single category, and will be searched by absolutely anyone, via an API made available to various websites, some niche, some not.

If those websites are category-specific, e.g. electrical goods, then we could boost on certain categories for a given website, but if they're also broad, is this even possible?

I guess we could track individual users and build up search histories to try and guide us, but I don't see many hits being made on repeat users.

Recording clicks on products could also be used to boost individual products for specific keywords - I'm beginning to think this is actually our best hope? E.g. a multi-valued field containing keywords that resulted in a click on that product.
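
A minimal sketch of the click-keyword idea, assuming a hypothetical multi-valued string field called click_keywords and a click log of (product id, search keywords) pairs. It only builds the JSON body for Solr atomic updates using the "add" modifier; sending it to the update handler is left out, and atomic updates generally need the other fields of the document to be stored.

```python
import json
from collections import defaultdict


def build_click_updates(click_log, field="click_keywords"):
    """Turn (product_id, keywords) click events into Solr atomic updates.

    `field` is a hypothetical multi-valued string field on the product doc.
    """
    keywords_by_product = defaultdict(set)
    for product_id, keywords in click_log:
        # Normalise case and whitespace so near-identical queries collapse.
        keywords_by_product[product_id].add(" ".join(keywords.lower().split()))

    # The "add" modifier appends values to a multi-valued field without
    # resending the rest of the document.
    return [
        {"id": product_id, field: {"add": sorted(keywords)}}
        for product_id, keywords in sorted(keywords_by_product.items())
    ]


clicks = [
    ("sku-1", "Apple iPod"),
    ("sku-1", "apple ipod "),
    ("sku-2", "ipod case"),
]
payload = json.dumps(build_click_updates(clicks))
```

The click field could then be listed in qf with its own weight. Note that "add" does not de-duplicate against values already in the index, so repeated batches would need their own handling.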





On 03/18/2016 04:14 PM, Robert Brown wrote:

That does sound rather useful!

We currently have it set to 0.1



On 03/18/2016 04:13 PM, Nick Vasilyev wrote:
Tie does quite a bit; without it, only the highest-scoring field that has
the term will be included in the relevance score. Tie lets you include the
other fields that match as well.
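
To make the effect concrete, here is a small sketch of how a disjunction-max query combines per-field scores: the best field score, plus tie times the sum of the remaining ones. The field scores below are made-up numbers for illustration, not output from a real index, and edismax applies this combination across the qf fields rather than once for the whole document.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores the way a disjunction-max query does."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    # Everything except the best-scoring field is scaled down by `tie`.
    others = sum(field_scores) - best
    return best + tie * others


# Hypothetical scores for one term matching in name, description and brand.
scores = [2.0, 1.0, 0.5]

winner_only = dismax_score(scores, tie=0.0)   # only the best field counts
mostly_best = dismax_score(scores, tie=0.1)   # other fields add a little
plain_sum = dismax_score(scores, tie=1.0)     # all fields simply add up
```

With tie=0.0 a product matching in three fields scores the same as one matching only in its best field; raising tie rewards matches spread across fields.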
On Mar 18, 2016 10:40 AM, "Robert Brown" <r...@intelcompute.com> wrote:
Thanks for the added input.
I'll certainly look into the machine learning aspect; it will be good to put
some basic knowledge I have into practice.

I'd been led to believe the tie parameter didn't actually do a lot. :-/



On 03/18/2016 12:07 PM, Nick Vasilyev wrote:
I work with a similar catalog, except our data is especially bad. We've
found that several things helped:

- Item level grouping (group same item sold by multiple vendors). Rank
items with more vendors a bit higher.
- Include a boost function for other attributes, such as an original image
of the product
- Rank items a bit higher if they have data from an external catalog like
IceCat
- For relevance and performance, we have several fields that we copy data
into. High value fields get copied into a high weighted field, while lower
value fields like description get copied into a lower weighted field. These
fields are the backbone of our qf parameter, with other fields adding
additional boost.
- Play around with the tie parameter for edismax; we found that it makes
quite a big difference.
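
A rough sketch of how the points above might translate into edismax request parameters. All field names (text_high, text_low, vendor_count, original_image_url, icecat_id) and all weights are assumptions made up for illustration; only the parameter names (defType, qf, tie, bf) and the function-query building blocks (log, sum, if, exists) are standard Solr.

```python
from urllib.parse import urlencode, parse_qs


def build_boosted_params(user_query, tie=0.1):
    """Build edismax parameters with two weighted copy fields and boosts."""
    return {
        "defType": "edismax",
        "q": user_query,
        # Hypothetical copy-field targets: high and low value text.
        "qf": "text_high^4 text_low^1",
        "tie": str(tie),
        # Additive boost functions; every field name here is an assumption.
        "bf": [
            "log(sum(vendor_count,1))",              # more vendors, a bit higher
            "if(exists(original_image_url),0.5,0)",  # has an original image
            "if(exists(icecat_id),0.5,0)",           # has external catalog data
        ],
    }


# doseq=True emits one bf parameter per list entry.
query_string = urlencode(build_boosted_params("apple ipod"), doseq=True)
```

The resulting query string can be appended to a select handler URL; the boost sizes would need tuning against real queries.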

Hope this helps.

On Fri, Mar 18, 2016 at 6:19 AM, Alessandro Benedetti <abenede...@apache.org>
wrote:
In a relevancy problem I would repeat what my colleagues have already
pointed out:
Data is key. We first need to understand our data before we can
understand what is relevant and what is not.
Once we have a ground floor which makes sense (your basic approach +
proper schema configuration as suggested + a properly configured request
handler seems a good start to me), if you are still not happy with the
relevancy (i.e. you are not happy with the different boosts you assigned),
my strongest suggestion is to move to machine learning.
You need a good amount of data to feed the learner (and make it your Super
Business Expert).
I have recently been working with the Bloomberg Learning to Rank plugin [1].
In my opinion it will be key for any business that has many features in
play that can help to evaluate a proper ranking.
For that you need to be able to collect and process signals, and you need
to carefully tune the features of interest.
But the results could be surprising.

[1] https://issues.apache.org/jira/browse/SOLR-8542
[2] Learning to Rank in Solr <https://www.youtube.com/watch?v=M7BKwJoh96s>

Cheers

On Thu, Mar 17, 2016 at 10:15 AM, Robert Brown <r...@intelcompute.com>
wrote:

Thanks Scott and John,
As luck would have it I've got a PhD graduate coming for an interview
today, who just happened to do her research thesis on information retrieval
with quantum theory and machine learning :)

John, it sounds like you're describing my system! Shopping products from
multiple sources. (De-duplication is going to be fun soon).
I already copy fields like merchant, brand, category, to string fields to
use them as facets/filters. I was contemplating removing the description
due to the spammy issue you mentioned. I didn't know about the
RemoveDuplicatesTokenFilterFactory, so I'm sure that's going to be a huge
help.

Thanks a lot,
Rob



On 03/17/2016 10:01 AM, John Smith wrote:

Hi,
For once I might be of some help: I've had a similar configuration
(large set of products from various sources). It's very difficult to
find the right balance between all parameters and requires a lot of
tweaking, most often in the dark unfortunately.
What I've found is that omitNorms=true is a real breakthrough: without
it results tend to favor small texts, which is not what's wanted for
product names. I also added a RemoveDuplicatesTokenFilterFactory for the
name as it's a common practice for spammers to repeat some keywords in
order to be better placed in results. Stemming and custom stopwords
(e.g. "cheap", "sale", ...) are other potential ideas.
I've also ended up removing the description field as it's often too
broad, and name is now the only field left: brand, category and merchant
(as well as other fields) are offered as additional filters using
facets. Note that you'd have to re-index them as plain strings.
It's more difficult to achieve but a popularity boost can also be useful:
you can measure it by sales or by number of clicks. I use a combination
of both, and store those values using partial updates.
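
As an illustration of the popularity idea, the sketch below combines sales and clicks into one number and wraps it in a Solr partial (atomic) update using the "set" modifier. The log-damped weighted sum, the weights and the popularity field name are assumptions for the example, not the formula actually used here.

```python
import math


def popularity(sales, clicks, sales_weight=3.0, click_weight=1.0):
    """Blend sales and clicks; log1p damps runaway best-sellers."""
    return sales_weight * math.log1p(sales) + click_weight * math.log1p(clicks)


def build_popularity_update(product_id, sales, clicks, field="popularity"):
    """Build an atomic update that overwrites only the popularity field."""
    score = round(popularity(sales, clicks), 4)
    # "set" replaces the field value and leaves the rest of the doc alone.
    return {"id": product_id, field: {"set": score}}


update = build_popularity_update("sku-1", sales=12, clicks=340)
```

The stored value could then be referenced from a boost function at query time, so popularity can be refreshed without re-indexing whole products.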

Hope it helps,
John


On 17/03/16 09:36, Robert Brown wrote:

Hi,
I currently have an index of ~50m docs representing shopping products:
name, description, brand, category, etc.

Our "qf" is currently set up as:

name^5
brand^2
category^3
merchant^2
description^1

mm: 100%
ps: 5
I'm getting complaints from the business concerning relevancy, and was
hoping to get some constructive ideas/thoughts on whether these boosts
look semi-sensible or not. I think they were put in place pretty much
at random.
I know it's going to be a case of rounds upon rounds of testing, but
maybe there's a good starting point that will save me some time?
My initial thoughts right now are to actually just search on the name
field, and maybe the brand (for things like "Apple Ipod").
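
For the rounds of testing, one way to keep the two setups side by side is to generate the request parameters from a dict of boosts. The sketch below does that for the current qf and for a name-plus-brand variant; the brand boost in the variant is just carried over from the current config as a placeholder, and the helper name is made up.

```python
def edismax_params(user_query, qf, mm="100%", ps=5):
    """Build edismax request parameters from a {field: boost} mapping."""
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": " ".join(f"{field}^{boost}" for field, boost in qf.items()),
        "mm": mm,
        "ps": str(ps),
    }


# The boosts as currently configured.
CURRENT_QF = {"name": 5, "brand": 2, "category": 3, "merchant": 2, "description": 1}
# Candidate variant: search only name and brand (boosts are placeholders).
NAME_BRAND_QF = {"name": 5, "brand": 2}

current = edismax_params("apple ipod", CURRENT_QF)
candidate = edismax_params("apple ipod", NAME_BRAND_QF)
```

Running the same set of test queries through both parameter sets and comparing the top results gives a repeatable baseline before any boost is changed.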

Has anyone got a similar setup that could share some direction?

Many Thanks,
Rob
--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
