Hey all,

Very basic question. I want to index fields of comma-separated values:
Example documents:

id: 1
title: Football Teams
keywords: philadelphia eagles, cleveland browns, new york jets

id: 2
title: Baseball Teams
keywords: "philadelphia phillies", "new york yankees", "cleveland indians"

A query of 'new york' should return both documents, but a quoted phrase query of "yankees cleveland" should return nothing, meaning that a comma always breaks a phrase.

I've created a textCSV type in the schema.xml file and used the PatternTokenizerFactory to split on commas. From there, analysis can proceed as normal via StopFilterFactory, LowerCaseFilter, and RemoveDuplicatesTokenFilter:

    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,\s*" group="-1"/>

Has anyone done this before? Can I somehow use an existing Analyzer (or a combination of them)? It seems as though I need to create a PhraseDelimiterFilter from the WordDelimiterFilter, though I am sure there is a way to make an existing analyzer break things up the way I want.

Thanks - Neal Richter
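
For reference, the textCSV type described above might look roughly like the sketch below in schema.xml. The type name, tokenizer line, and filter order come from the description. The positionIncrementGap value, the stopwords.txt file name, and the field attributes are assumptions added only to make the fragment complete.

```xml
<!-- Sketch of the textCSV type described above.
     Assumptions: positionIncrementGap value, stopwords.txt file name,
     and the indexed/stored attributes on the field. -->
<fieldType name="textCSV" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- group="-1" makes the pattern act as a delimiter, so the text
         between commas is emitted as one token per value -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,\s*" group="-1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="keywords" type="textCSV" indexed="true" stored="true"/>
```

With this chain the tokenizer emits each whole value (for example "new york jets") as a single token, so the filters after it operate on whole values rather than on individual words.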