I'm pretty new to Solr; my apologies if this is a naive question, and for the verbosity.

I'd like to take the keywords in my documents and expand them as synonyms. For example, if a document is annotated with the keyword 'sf', I'd like that to expand to 'San Francisco'. (San Francisco,San Fran,SF is a line in my synonyms.txt file.)
But I also want to be able to display facets with counts for these keywords; I'd like them to be suitable for display. So, if I define the keywords field as 'text', I use the following pipeline (from my schema.xml):

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Faceting on this field, I get return values (when I query specifically for the single document in question):

<lst name="Keywords">
  <int name="fran">1</int>
  <int name="francisco">1</int>
  <int name="san">1</int>
  <int name="sf">1</int>
</lst>

I've also done a copyfield to a 'KeywordsString' field, which is defined as "string". i.e.

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

Faceting on *that* field (when querying for just this one document, which has a keyword of 'sf') results in:

<lst name="KeywordsString">
  <int name="sf">1</int>
</lst>

I guess what I'd like is to be able to stamp documents with keywords like 'sf', 'san fran', 'san francisco', and 'mlb' (with a synonyms.txt entry of mlb => Major League Baseball), and have the documents tagged with those synonym variants come back as:

<lst name="KeywordsString">
  <int name="San Francisco">1</int>
  <int name="Major League Baseball">1</int>
</lst>

But I don't know how to define a processing pipeline that expands synonyms without tokenizing them, that is, without breaking 'San Francisco' into 'san' and 'francisco' and presenting those as separate facets.

Thanks for any help,
Don