My issue is more that the search term "doll" shows up both in documents about
CDs and in documents about toys. But I have 10 CD documents for every toy
document, so my searches for "doll" tend to show the CDs most prominently.
But that's not the way a user thinks. If they want the CD documents they'll
search for "doll face" or "doll face song", more specific queries (which
work fine), but if they want the toy they might just search for "doll".

If you run the searches "doll" and "doll song" on Google image search, you'll
clearly see that Google has solved this problem perfectly. "doll" returns
toy dolls, and "doll song" returns music and anime results.

I'm striving for this type of result.
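
As a rough sketch of the direction I described in my original message (demote
the music category unless the query contains a music-related term), something
like the following Python could build the request parameters. The field names
"title", "description" and "category", the function build_params, and the
boost factor are all made up for illustration. It assumes the edismax query
parser and its bq parameter, and uses the commonly suggested workaround of
boosting everything that is not music rather than trying to boost music down.

```python
# Hypothetical list of terms that signal the user wants music results;
# a real list would come from analyzing actual user queries.
MUSIC_HINTS = {"music", "cd", "song", "listen", "dvd"}


def build_params(user_query, boost=10):
    """Build Solr request parameters for an edismax query.

    If the query has no music-related term, add a boost query that lifts
    every document NOT in the (assumed) music category, which effectively
    demotes the CD documents.
    """
    params = {
        "defType": "edismax",
        "q": user_query,
        "qf": "title description",  # assumed searchable fields
    }
    terms = set(user_query.lower().split())
    if not (terms & MUSIC_HINTS):
        # "*:* -category:music" matches everything outside the music category.
        params["bq"] = "(*:* -category:music)^%d" % boost
    return params
```

With this, a search for "doll" is sent with the extra bq clause, while "doll
song" goes through unchanged and the CD documents compete normally.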



-----Original Message-----
From: Amit Jha [mailto:shanuu....@gmail.com] 
Sent: Wednesday, January 16, 2013 11:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Search strategy - improving search quality for short search
terms such as "doll"

It's all about the data set, by which I mean the index. If you have documents
containing "toy" and "doll", it will return them in the result set.

What I understood is that you are talking about the context of the query. For
example, if you search "books on MK Gandhi" and "books by MK Gandhi", the two
queries have different contexts.

Context-based search is, at some level, achieved by natural language
processing. That is something you can look into for better search.

The Solr wiki and mailing list would be great sources of learning.


Rgds
AJ

On 16-Jan-2013, at 15:10, "David Parks" <davidpark...@yahoo.com> wrote:

> I'm a beginner-intermediate Solr admin. I've set up the basics for our
> application and it runs well.
> 
> 
> 
> Now it's time for me to dig in and start tuning and improving queries.
> 
> 
> 
> My next target is searches on simple terms such as "doll" which, in
> Google, would return documents about, well, "toy dolls", because
> that's the most common usage of the simple term "doll". But in my
> index it predominantly returns documents about CDs with the songs "Doll
> Face" and "My baby doll" on them.
> 
> 
> 
> I'm not directly asking how to solve this as much as I'm asking what 
> direction I should be looking in to learn what I need to know to 
> tackle the general issue myself.
> 
> 
> 
> Left on my own I would start looking at categorizing the CDs into a
> facet called "music", reasonably doable in my dataset. Then I need to
> reduce the boost-value of the entire facet/category of music unless
> certain pre-defined query terms exist, such as [music, cd, song,
> listen, dvd, <analyze actual user queries to come up with a more
> exhaustive list>, etc.].
> 
> 
> 
> I don't yet know how to do all of this, but after a couple more good 
> books I should be "dangerous".
> 
> 
> 
> So the question to this list:
> 
> 
> 
> - Am I on the right track here? If not, can you point me in a
> direction to go?
> 
> 
> 
> 
> 