Hi,

I would take a different approach. Track users' queries and their clicks. For each document, aggregate the queries that led to clicks on it and start thinking of those queries as tags/labels, then use the top N per document to tag your docs. Alternatively/additionally, extract significant terms and phrases from clicked-to docs and use those to tag your docs.
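A minimal sketch of the first idea, just to make it concrete. It assumes you can export your click log as (query, clicked doc id) pairs; the function name, the log format, and the lowercase/whitespace normalization are all hypothetical choices for illustration, not anything your search engine provides.

```python
from collections import Counter, defaultdict


def tag_docs_by_queries(click_log, top_n=3):
    """Return {doc_id: [top_n most frequent queries that led to a click]}.

    click_log is assumed to be an iterable of (query, clicked_doc_id)
    pairs; this format is an assumption made for the example.
    """
    counts = defaultdict(Counter)
    for query, doc_id in click_log:
        # Lowercase and collapse whitespace so trivial variants of the
        # same query are counted together.
        normalized = " ".join(query.lower().split())
        counts[doc_id][normalized] += 1
    # Keep only the top N queries per document as its tags.
    return {
        doc_id: [q for q, _ in counter.most_common(top_n)]
        for doc_id, counter in counts.items()
    }


if __name__ == "__main__":
    log = [
        ("SSD hard drive", "doc1"),
        ("ssd  hard drive", "doc1"),
        ("fast ssd", "doc1"),
        ("hip hop", "doc2"),
    ]
    print(tag_docs_by_queries(log, top_n=2))
```

The resulting tags could then be indexed in a separate field on each document and boosted at query time, much like the extra field you describe below.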
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Tue, May 14, 2013 at 7:04 AM, David Parks <davidpark...@yahoo.com> wrote:
> We have a number of queries that produce good results based on the textual
> data, but are contextually wrong (for example, an "SSD hard drive" search
> matches the music album "SSD hip hop drives us crazy".
>
> Textually a fair match, but SSD is a term that strongly relates to technical
> documents.
>
> We'd like to be able to direct this query more strictly in the direction of
> the technical documents based on the term "SSD". I am considering whether
> it would be worth trying to cluster all documents, thus tending to group the
> music with the music and tech items with the tech items. Then pulling out
> the term vectors that define each group; do a human review of that data; and
> plug it back into the documents of each cluster as a separate search field
> that gets boosted.
>
> In my head it seems like a plausible way to weigh terms like SSD to the
> cluster of items that it most closely associates.
>
> Should I spend the effort to find out?
>
> Yeh or neh?
>