Shawn,

Have you looked at http://www.sematext.com/products/dym-researcher/index.html as a solution to the ZeroHits problem?
If that doesn't work, then yes, offline word/phrase co-occurrence may work.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

>________________________________
>From: Shawn Heisey <s...@elyograg.org>
>To: solr-user@lucene.apache.org
>Sent: Wednesday, October 5, 2011 4:06 PM
>Subject: Offering search suggestions - a discussion of multi-term phrases
>
>I am trying to figure out how we can begin offering search suggestions to
>people, especially when a user types in something that results in few or zero
>results. For background, we have an archive of about 60 million objects, most
>of which are photographs. There are also a number of text articles, and most
>recently, videos. The metadata is kept in a database, and the database is
>used as the import source for Solr.
>
>The first thing we're going to try is spellcheck, using the terms component to
>generate a wordlist from our catchall field and then doing what we can with
>a program to remove undesirable words. I do not anticipate running into much
>trouble with this part.
>
>Another idea we have is search suggestions. One aspect is autocomplete, the
>other is similar to the spell-check, but more sophisticated. It would do
>things like offer "Nicole Kidman" if the user typed in "Tom Cruise" and didn't
>get many search results.
>
>The problem I can see with all of these things is that single terms will not
>really be enough, and single terms is all I can get out of the index. Our
>distributed index is already quite a bit larger than the available RAM on the
>machines that contain it, and it's growing steadily. Adding analysis
>complexity or copyFields to the index is not much of an option, because we
>have no budget available for new hardware, but I won't completely rule it out.
>
>Is there any way, even if it's offline analysis of either the index or the
>database, to come up with common short phrases specific to our data? If there
>is, perhaps I can then give it to Solr and let it make suggestions with it.
>
>Thanks,
>Shawn
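
For what it's worth, the offline phrase analysis discussed above could be prototyped against a database export before touching the index at all. The following is only a minimal sketch: the function name common_phrases, its parameters, and the sample captions are hypothetical, and it does plain n-gram counting rather than anything Solr-specific.

```python
import re
from collections import Counter

def common_phrases(texts, n=2, min_count=2, top=10):
    """Return up to `top` n-word phrases seen at least `min_count` times."""
    counts = Counter()
    for text in texts:
        # Lowercase and split into simple word tokens.
        words = re.findall(r"[a-z0-9']+", text.lower())
        # Slide a window of n words across the text and count each phrase.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    # Drop rare phrases first, then keep the most frequent ones.
    frequent = [(p, c) for p, c in counts.most_common() if c >= min_count]
    return frequent[:top]

# Hypothetical rows standing in for captions exported from the database.
captions = [
    "Tom Cruise and Nicole Kidman arrive at the premiere",
    "Nicole Kidman at the awards ceremony",
    "Tom Cruise signs autographs",
]
phrases = common_phrases(captions)
```

On real data a stopword filter would be needed, since filler pairs such as "at the" are counted alongside names. The surviving phrases could then be written to a file and fed to whatever suggestion mechanism is chosen.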