Hi Tanner,

Here is another simple way: AutoComplete. You know what your users are searching for, so you can identify the top queries as well as the common queries that are not finding matches. All of that tells you what to feed into AutoComplete. Ideally, your AutoComplete doesn't just run a search with the selected suggestion (e.g. 200 movies), but "translates" it into either a redirect to the specific associated item or a "translated query".

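To make the "translate" idea a bit more concrete, here is a minimal Python sketch, not tied to Solr or any particular suggester. Everything in it is an assumption made for illustration: the Suggestion class, the suggest and resolve functions, the sample entries, and the /items/... URL are all hypothetical. Each suggestion carries either a redirect target or a rewritten query, and a plain search on the suggestion text is only the fallback.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Suggestion:
    text: str                              # what the user sees in the dropdown
    redirect_url: Optional[str] = None     # jump straight to a known item
    rewritten_query: Optional[str] = None  # or search with a better query


# Hypothetical, hand-curated entries (e.g. derived from query-log analysis).
SUGGESTIONS = [
    Suggestion("200 movies", rewritten_query="two hundred movies"),
    Suggestion("2001 a space odyssey", redirect_url="/items/2001-a-space-odyssey"),
]


def suggest(prefix: str, limit: int = 5) -> List[Suggestion]:
    """Return suggestions whose text starts with the typed prefix."""
    p = prefix.strip().lower()
    if not p:
        return []
    return [s for s in SUGGESTIONS if s.text.startswith(p)][:limit]


def resolve(selected: Suggestion) -> Tuple[str, str]:
    """Decide what to do once the user picks a suggestion."""
    if selected.redirect_url:
        return ("redirect", selected.redirect_url)
    if selected.rewritten_query:
        return ("search", selected.rewritten_query)
    # No translation known: fall back to searching the suggestion as typed.
    return ("search", selected.text)
```

With these made-up entries, picking "200 movies" resolves to a search for "two hundred movies" instead of a search for the literal text. A real setup would load the entries from wherever you keep your curated suggestions rather than from a hard-coded list.
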
Another related approach is to handle this with Did You Mean (DYM) or Related Searches type functionality. I haven't checked out Ted's link yet, but it sounds like it may be related. We've had some luck building this from query logs: we examined query patterns and figured out that when people query for e.g. "200 movies" they really wanted "two hundred movies". Think of Google's query spelling suggestions.

Otis
----
Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html

----- Original Message -----
> From: Tanner Postert <tanner.post...@gmail.com>
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Cc:
> Sent: Tuesday, January 10, 2012 10:21 PM
> Subject: Re: Stemming numbers
>
> You mention "that is one way to do it" is there another i'm not
> seeing?
>
> On Jan 10, 2012, at 4:34 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
>> On Tue, Jan 10, 2012 at 5:32 PM, Tanner Postert <tanner.post...@gmail.com> wrote:
>>
>>> We've had some issues with people searching for a document with the
>>> search term '200 movies'. The document is actually title 'two hundred
>>> movies'.
>>>
>>> Do we need to add every number to our synonyms dictionary to
>>> accomplish this?
>>
>>
>> That is one way to deal with this.
>>
>> But it depends on a lot of hand engineering of special cases. That is good
>> to have for the low hanging fruit, but it only takes you so far. You can
>> also automate the discovery of such cases to a certain degree by analyzing
>> query logs.
>>
>>
>>> Is it best done at index or search time?
>>>
>>
>> I would say that opinion is divided on this and in the end, you probably
>> have to do versions of this at both times. This is especially true if you
>> want to include secondary information like inferred query purpose
>> (obviously only available at query time) and inferred document
>> characteristics (best known at indexing time). Partly the choice about
>> when to do this is driven by which trade-offs you are OK making. For
>> instance, some people are driven by index size but not query response time.
>> They would probably opt for pushing load to the query. Others may be
>> bound by response time or query throughput. They may wish to minimize
>> query complexity and size.
>
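
As a rough sketch of the query-log idea mentioned in this thread: one simple heuristic is to look for a zero-hit query that is immediately followed, in the same session, by a query that did return hits, and to keep the pairs that recur. This is only an illustration of that heuristic, not a description of what anyone on this thread actually runs. The log format (session id, query, hit count), the names LogEntry and mine_rewrites, the min_count threshold, and the sample rows are all assumptions, and the log is assumed to be in chronological order.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    session_id: str
    query: str
    num_hits: int


def mine_rewrites(log: Iterable[LogEntry], min_count: int = 2) -> Dict[str, str]:
    """Map each failed query to its most common successful follow-up query."""
    # Group entries per session, preserving their (assumed chronological) order.
    by_session: Dict[str, List[LogEntry]] = defaultdict(list)
    for entry in log:
        by_session[entry.session_id].append(entry)

    # Count (failed query -> next successful query) pairs within each session.
    pair_counts: Dict[str, Counter] = defaultdict(Counter)
    for entries in by_session.values():
        for prev, nxt in zip(entries, entries[1:]):
            if prev.num_hits == 0 and nxt.num_hits > 0:
                failed = prev.query.strip().lower()
                fixed = nxt.query.strip().lower()
                if failed != fixed:
                    pair_counts[failed][fixed] += 1

    # Keep only the top follow-up per failed query, if it recurs often enough.
    rewrites: Dict[str, str] = {}
    for failed, counter in pair_counts.items():
        fixed, count = counter.most_common(1)[0]
        if count >= min_count:
            rewrites[failed] = fixed
    return rewrites


# Invented sample rows, purely to show the expected input shape.
SAMPLE_LOG = [
    LogEntry("s1", "200 movies", 0),
    LogEntry("s1", "two hundred movies", 3),
    LogEntry("s2", "200 Movies", 0),
    LogEntry("s2", "two hundred movies", 3),
    LogEntry("s3", "200 movies", 0),
    LogEntry("s3", "movies", 120),
]
```

The resulting mapping could feed a synonyms file, DYM suggestions, or AutoComplete entries. Real logs are noisier than this, so the threshold and the "next query in the session" rule would need tuning against your own data.
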