One important footnote: the "keep words/synonym analyzer" approach will
index the desired keywords for efficient search, but the stored value
returned in response to a query will still be the full original text,
because analysis affects only the indexed terms. If you wish to return only
the final list of matched synonyms, you will need to go the custom update
processor or preprocessor route.
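
As a rough illustration of the preprocessor route, here is a minimal
sketch in Python that computes the tags on the client before the document
is sent to Solr. The field names "text" and "tags", the function names and
the in-memory synonym list are all assumptions made for the example, not
anything Solr prescribes; a custom update processor would do the same work
on the server side.

```python
import re

# Hypothetical synonym groups, in the same format as synonyms.txt.
SYNONYM_LINES = ["feline,cat,lion,tiger"]


def build_tag_lookup(lines):
    """Map each word in a comma-separated group to the group's first word."""
    lookup = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words = [w.strip().lower() for w in line.split(",")]
        for word in words:
            lookup[word] = words[0]
    return lookup


def add_tags(doc, lookup, text_field="text", tag_field="tags"):
    """Return a copy of doc with a list of matched tags added.

    The field names are assumptions; the tag field would need to be a
    stored, multi-valued field in the schema.
    """
    words = re.findall(r"[a-z]+", doc[text_field].lower())
    tags = []
    for word in words:
        tag = lookup.get(word)
        if tag is not None and tag not in tags:
            tags.append(tag)  # keep first-seen order, no duplicates
    tagged = dict(doc)
    tagged[tag_field] = tags
    return tagged


if __name__ == "__main__":
    lookup = build_tag_lookup(SYNONYM_LINES)
    print(add_tags({"id": "1", "text": "The cat sat on the mat"}, lookup))
```

The original text is left untouched in its own field, and the tag field
holds only the matched synonyms, so it can be returned directly in query
results.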
-- Jack Krupansky
-----Original Message-----
From: Jack Krupansky
Sent: Saturday, June 23, 2012 4:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Store matching synonyms only
There are a number of ways this can be accomplished, including as a
preprocessor or a custom update processor, but you may be able to get by
with a tokenized field without term vectors combined with a "keep words"
filter and an index-time synonym filter that uses "replace mode".
So, in addition to storing the text in a normal text field, do a copyField
to a separate text field which has omitTermFreqAndPositions=true since this
field only needs to indicate the presence of a keyword and not its position
or frequency. It would have a custom field type which starts its index
analyzer with a "keep words" token filter (solr.KeepWordFilterFactory) with
a word list file which contains all words used in your synonyms. This
eliminates all words that do not match one of your synonym words.
Then add a synonym filter that operates in replace mode (expand=false and
ignoreCase=true, so that every word in a group is mapped to the first word
in that group), with entries such as:
feline,cat,lion,tiger
See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
This would index "The cat sat on the tiger's mat" as simply "feline".
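
To make the effect of the two filters concrete, here is a small Python
simulation of the analysis chain. It is only a sketch of the logic, not
Solr code: the tokenizer is a crude stand-in that lowercases and drops a
trailing possessive 's (whether a real Solr tokenizer does that depends on
the field type), and all function names are made up for the example.

```python
import re


def parse_synonyms(lines):
    """Replace-mode mapping: every word in a group maps to the first word."""
    mapping = {}
    for line in lines:
        words = [w.strip().lower() for w in line.split(",") if w.strip()]
        for word in words:
            mapping[word] = words[0]
    return mapping


def tokenize(text):
    # Crude stand-in for a Solr tokenizer: lowercase, split on anything
    # that is not a letter or apostrophe, then drop a trailing 's.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [re.sub(r"'s$", "", token) for token in tokens]


def analyze(text, synonym_lines):
    mapping = parse_synonyms(synonym_lines)
    keep_words = set(mapping)  # word list: all words used in the synonyms
    # Stage 1: "keep words" filter drops every token not in the word list.
    kept = [t for t in tokenize(text) if t in keep_words]
    # Stage 2: synonym filter in replace mode maps each token to the
    # first word of its group.
    replaced = [mapping[t] for t in kept]
    # Without frequencies or positions, only presence of a term matters.
    return sorted(set(replaced))


if __name__ == "__main__":
    print(analyze("The cat sat on the tiger's mat", ["feline,cat,lion,tiger"]))
```

Here "cat" and "tiger" both survive the keep-words stage and both are
replaced by "feline", so the field ends up with that single term.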
-- Jack Krupansky
-----Original Message-----
From: ben ausden
Sent: Saturday, June 23, 2012 1:21 PM
To: solr-user@lucene.apache.org
Subject: Store matching synonyms only
Hi,
Is it possible to store only the matching synonyms found in a piece of
text?
A use case might be: automatically "tag" documents at index time based on
synonyms.txt, and then retrieve the stored tags at query time.
For example, given the text field:
"The cat sat on the mat"
and a synonyms.txt file containing:
feline,cat,lion,tiger
the resulting tag for this document would be "feline". Multiple synonym
matches would result in multiple tags.
Is this possible with Solr by default, or is the classification/tagging
best done outside Solr before I store the document?
Thanks.