You mention "that is one way to do it." Is there another way that I'm not seeing?

On Jan 10, 2012, at 4:34 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> On Tue, Jan 10, 2012 at 5:32 PM, Tanner Postert
> <tanner.post...@gmail.com> wrote:
>
>> We've had some issues with people searching for a document with the
>> search term '200 movies'. The document is actually title 'two hundred
>> movies'.
>>
>> Do we need to add every number to our synonyms dictionary to
>> accomplish this?
>
>
> That is one way to deal with this.
>
> But it depends on a lot of hand engineering of special cases. That is good
> to have for the low hanging fruit, but it only takes you so far. You can
> also automate the discovery of such cases to a certain degree by analyzing
> query logs.
>
>
>> Is it best done at index or search time?
>>
>
> I would say that opinion is divided on this and in the end, you probably
> have to do versions of this at both times. This is especially true if you
> want to include secondary information like inferred query purpose
> (obviously only available at query time) and inferred document
> characteristics (best known at indexing time). Partly the choice about
> when to do this is driven by which trade-offs you are OK making. For
> instance, some people are driven by index size but not query response time.
> They would probably opt for pushing load to the query. Others may be
> bound by response time or query throughput. They may wish to minimize
> query complexity and size.
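
For reference, if the synonyms-dictionary route is taken, the entries would not have to be typed by hand. Below is a minimal sketch, not anything proposed in the thread, that spells out integers in English and emits one synonym pair per number. The function names `number_to_words` and `synonym_lines`, the comma-separated "digits, words" line format, and the choice to write compound numbers with spaces rather than hyphens are all assumptions made for illustration; they would need adjusting to whatever the analyzer and synonym file in use actually expect.

```python
# Hypothetical helper: generate "digits, words" synonym pairs for integers.
# The names and the output format are illustrative assumptions.

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = [
    "", "", "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety",
]


def number_to_words(n):
    """Spell out an integer in the range 0..999999 in English words."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0..999999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        # Space instead of hyphen ("twenty one"), assuming the tokenizer
        # splits on whitespace.
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        words = ONES[hundreds] + " hundred"
        return words + (" " + number_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    words = number_to_words(thousands) + " thousand"
    return words + (" " + number_to_words(rest) if rest else "")


def synonym_lines(numbers):
    """Return one 'digits, words' line per number (assumed file format)."""
    return [f"{n}, {number_to_words(n)}" for n in numbers]


if __name__ == "__main__":
    for line in synonym_lines([21, 200, 1234]):
        print(line)
```

This only covers the hand-built side of the problem. It does nothing for the query-log analysis mentioned above, and whether the generated entries are applied at index time or at query time remains the trade-off described in the quoted reply.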