Jack, Lee, thanks so much for your suggestions.

On Sat, Jun 23, 2012 at 11:25 PM, Lee Carroll <lee.a.carr...@googlemail.com> wrote:
> If you go down the keep-word route you can return the "tags" to the
> front end app using a facet field query. This often fits with many
> use-cases for doc tags.
>
> lee c
>
> On 23 June 2012 22:37, Jack Krupansky <j...@basetechnology.com> wrote:
> > One important footnote: the "keep words/synonym analyzer" approach will
> > index the desired keywords for efficient search, but the stored value that
> > would be returned in response to a query request would be the full original
> > text. If you wish to return only the final list of matched synonyms, you
> > will need to go the custom update processor or preprocessor route.
> >
> > -- Jack Krupansky
> >
> > -----Original Message-----
> > From: Jack Krupansky
> > Sent: Saturday, June 23, 2012 4:29 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Store matching synonyms only
> >
> > There are a number of ways this can be accomplished, including as a
> > preprocessor or a custom update processor, but you may be able to get by
> > with a tokenized field without term vectors combined with a "keep words"
> > filter and an index-time synonym filter that uses "replace mode".
> >
> > So, in addition to storing the text in a normal text field, do a copyField
> > to a separate text field which has omitTermFreqAndPositions=true since this
> > field only needs to indicate the presence of a keyword and not its position
> > or frequency. It would have a custom field type which starts its index
> > analyzer with a "keep words" token filter (solr.KeepWordFilterFactory) with
> > a word list file which contains all words used in your synonyms. This
> > eliminates all words that do not match one of your synonym words.
> >
> > Then add a synonym filter that operates in replace mode (expand=false and
> > ignoreCase=true), with entries such as:
> >
> > feline,cat,lion,tiger
> >
> > See:
> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
> >
> > This would index "The cat sat on the tiger's mat" as simply "feline"
> >
> > -- Jack Krupansky
> >
> > -----Original Message-----
> > From: ben ausden
> > Sent: Saturday, June 23, 2012 1:21 PM
> > To: solr-user@lucene.apache.org
> > Subject: Store matching synonyms only
> >
> > Hi,
> >
> > Is it possible to store only the matching synonyms found in a piece of
> > text?
> >
> > A use case might be: automatically "tag" documents at index time based on
> > synonyms.txt, and then retrieve the stored tags at query time.
> >
> > For example, given the text field:
> >
> > "The cat sat on the mat"
> >
> > and a synonyms.txt file containing:
> >
> > feline,cat,lion,tiger
> >
> > the resulting tag for this document would be "feline". Multiple synonym
> > matches would result in multiple tags.
> >
> > Is this possible with Solr by default, or is the classification/tagging
> > best done outside Solr before I store the document?
> >
> > Thanks.
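
For reference, the keep-words plus synonym setup Jack describes might look roughly like this in schema.xml. This is a minimal sketch under stated assumptions, not a tested configuration: the field names "text" and "tags", the field type names "text_tags" and "text_general", and the file name keepwords.txt are illustrative and do not come from the thread; synonyms.txt is the file from the original question. The tokenizer and lowercase filter are also added here, since an analyzer needs a tokenizer ahead of any token filter.

```xml
<!-- Sketch only: field, type, and file names below are illustrative assumptions. -->
<fieldType name="text_tags" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- keepwords.txt (assumed name): one word per line, every word used in synonyms.txt -->
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
    <!-- expand="false": each entry on a synonym line is replaced by the first one (cat -> feline) -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The original text stays stored in its normal field (type name assumed) -->
<field name="text" type="text_general" indexed="true" stored="true"/>
<!-- Tag field: only the presence of a keyword matters, so frequencies and positions are omitted -->
<field name="tags" type="text_tags" indexed="true" stored="false"
       omitTermFreqAndPositions="true"/>
<copyField source="text" dest="tags"/>
```

With something like this in place, a request such as q=*:*&facet=true&facet.field=tags should return the indexed tags (for example "feline") as facet values, which is the facet route Lee mentions. The stored "text" field still returns the full original text, as Jack's footnote points out.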