Re: Stemming numbers

Otis Gospodnetic Tue, 10 Jan 2012 20:43:02 -0800

Hi Tanner,

Here is another simple way: AutoComplete.
You know what your users are searching for: you can identify top queries, and
you can identify common queries that are not finding matches.  All of this lets
you figure out what to feed into AutoComplete.  Ideally, your AutoComplete
doesn't just perform a search with the selected suggestion (e.g. "200
movies"), but "translates" it into either a redirect to the specific associated
item or a "translated query".
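A minimal Python sketch of the two steps described above, under stated assumptions: the query log is a list of (query, hit count) pairs, and the suggestion-to-action table is curated by hand from the resulting lists. `autocomplete_candidates`, `resolve_suggestion`, `Redirect`, `RewrittenQuery`, and the `/items/12345` URL are all hypothetical names for illustration, not part of Solr or any AutoComplete library.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple, Union


def autocomplete_candidates(
    log: Iterable[Tuple[str, int]], top_n: int = 10
) -> Tuple[List[str], List[str]]:
    """Return (most frequent queries, most frequent zero-hit queries).

    Each log entry is assumed to be (query_text, number_of_hits).
    """
    all_counts: Counter = Counter()
    zero_hit_counts: Counter = Counter()
    for query, num_hits in log:
        q = query.strip().lower()  # normalize case/whitespace before counting
        all_counts[q] += 1
        if num_hits == 0:
            zero_hit_counts[q] += 1
    top = [q for q, _ in all_counts.most_common(top_n)]
    zero_hit = [q for q, _ in zero_hit_counts.most_common(top_n)]
    return top, zero_hit


@dataclass(frozen=True)
class Redirect:
    """Send the user straight to a known item instead of searching."""
    url: str


@dataclass(frozen=True)
class RewrittenQuery:
    """Run a search, but with a query known to match the intended documents."""
    q: str


Action = Union[Redirect, RewrittenQuery]

# Hypothetical translation table, curated from the candidate lists above.
SUGGESTION_ACTIONS: Dict[str, Action] = {
    "200 movies": RewrittenQuery(q='"two hundred movies"'),
    "two hundred movies": Redirect(url="/items/12345"),  # hypothetical item URL
}


def resolve_suggestion(selected: str) -> Action:
    """Translate a selected AutoComplete suggestion into a redirect or query.

    Falls back to searching for the suggestion text as-is when no
    translation has been curated for it.
    """
    key = selected.strip().lower()
    return SUGGESTION_ACTIONS.get(key, RewrittenQuery(q=selected))
```

For example, selecting "200 Movies" yields `RewrittenQuery(q='"two hundred movies"')`, while an uncurated suggestion such as "star wars" falls through to a plain search for the same text.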


Another related approach is handling this with DYM (Did You Mean) or Related
Searches functionality.  I haven't checked out Ted's link yet, but it sounds like
it may be related.  We've had some luck building this from query logs, where we
examined query patterns and figured out that when people query for e.g. "200
movies" they really wanted "two hundred movies".  Think of Google's query
spelling suggestions.
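One way to mine such patterns from query logs, sketched in Python. This is a simple session heuristic chosen for illustration, not necessarily what was built here: within a session, a zero-hit query followed directly by a query that does return hits is counted as a (failed, reformulated) pair, and the most frequent reformulation becomes the "Did You Mean" suggestion. The log layout (session ID, query, hit count, in chronological order) and the name `mine_reformulations` are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# One log record: (session_id, query, num_hits), in chronological order.
LogRecord = Tuple[str, str, int]


def mine_reformulations(log: List[LogRecord], min_count: int = 2) -> Dict[str, str]:
    """Map zero-hit queries to the follow-up query users most often tried next.

    Heuristic (an assumption, not a standard algorithm): within a session,
    if a query returned no hits and the very next query returned hits,
    count the pair (failed -> succeeded). Keep the most frequent follow-up
    for each failed query if it was seen at least min_count times.
    """
    pair_counts: Dict[str, Counter] = defaultdict(Counter)
    last_by_session: Dict[str, Tuple[str, int]] = {}
    for session, query, hits in log:
        q = query.strip().lower()
        prev = last_by_session.get(session)
        if prev is not None:
            prev_q, prev_hits = prev
            if prev_hits == 0 and hits > 0 and prev_q != q:
                pair_counts[prev_q][q] += 1
        last_by_session[session] = (q, hits)

    suggestions: Dict[str, str] = {}
    for failed, followups in pair_counts.items():
        best, count = followups.most_common(1)[0]
        if count >= min_count:
            suggestions[failed] = best
    return suggestions
```

Requiring a minimum count filters out one-off reformulations, such as a user who gave up and searched for something unrelated; tracking the last query per session keeps interleaved sessions in the log from being mixed together.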

Otis
----
Performance Monitoring SaaS for Solr - 
http://sematext.com/spm/solr-performance-monitoring/index.html



----- Original Message -----
> From: Tanner Postert <tanner.post...@gmail.com>
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Cc: 
> Sent: Tuesday, January 10, 2012 10:21 PM
> Subject: Re: Stemming numbers
> 
> You mentioned "that is one way to do it".  Is there another way I'm not
> seeing?
> 
> On Jan 10, 2012, at 4:34 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> 
>>  On Tue, Jan 10, 2012 at 5:32 PM, Tanner Postert <tanner.post...@gmail.com> wrote:
>> 
>>>  We've had some issues with people searching for a document with the
>>>  search term '200 movies'. The document is actually titled
>>>  'two hundred movies'.
>>> 
>>>  Do we need to add every number to our synonyms dictionary to
>>>  accomplish this?
>> 
>> 
>>  That is one way to deal with this.
>> 
>>  But it depends on a lot of hand engineering of special cases.  That is good
>>  to have for the low-hanging fruit, but it only takes you so far.  You can
>>  also automate the discovery of such cases to a certain degree by analyzing
>>  query logs.
>> 
>> 
>>>  Is it best done at index or search time?
>>> 
>> 
>>  I would say that opinion is divided on this, and in the end you probably
>>  have to do versions of this at both times.  This is especially true if you
>>  want to include secondary information like inferred query purpose
>>  (obviously only available at query time) and inferred document
>>  characteristics (best known at indexing time).  Partly, the choice about
>>  when to do this is driven by which trade-offs you are OK making.  For
>>  instance, some people are constrained by index size but not by query
>>  response time.  They would probably opt for pushing the load to query time.
>>  Others may be bound by response time or query throughput.  They may wish
>>  to minimize query complexity and size.
>
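Ted's first answer, adding numbers to the synonyms dictionary, is easy to automate for a bounded range. The Python sketch that follows is an illustration, not something proposed in the thread: it spells out integers in plain English (no "and", no hyphens, so 21 becomes "twenty one") and emits lines in the comma-separated equivalence format that Solr's synonyms.txt uses. The function names and the 0..999999 range are assumptions.

```python
from typing import List

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out 0 <= n < 1,000,000 in English, e.g. 200 -> 'two hundred'."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0..999999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else " " + ONES[rest])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        words = ONES[hundreds] + " hundred"
        return words if rest == 0 else words + " " + number_to_words(rest)
    thousands, rest = divmod(n, 1000)
    words = number_to_words(thousands) + " thousand"
    return words if rest == 0 else words + " " + number_to_words(rest)


def synonym_lines(limit: int) -> List[str]:
    """One equivalence line per number 0..limit, e.g. '200, two hundred'."""
    return [f"{n}, {number_to_words(n)}" for n in range(limit + 1)]
```

For example, `synonym_lines(200)[-1]` is `"200, two hundred"`. Multi-word synonyms such as "two hundred" have historically been awkward at query time, because Solr's standard query parser split input on whitespace before analysis, so applying such a list at index time is often the safer choice; check how your Solr version behaves.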
