The easiest solution would be to create the documents you send to Solr with multiple keywords fields, one per comma-separated value... the values will be separated by a positionIncrement gap, so a phrase query won't see "yankees" adjacent to "cleveland".
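
A rough sketch of what that might look like, reusing the textCSV type and keywords field names from the question below. The positionIncrementGap value of 100 and the whitespace/lowercase analyzer chain are only assumptions for illustration, and the two fragments belong in different places (schema.xml and the update message):

```xml
<!-- schema.xml (sketch): the gap value and analyzer chain are assumed, not prescribed -->
<fieldType name="textCSV" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="keywords" type="textCSV" indexed="true" stored="true" multiValued="true"/>

<!-- update message (sketch): one keywords field per comma-separated value -->
<add>
  <doc>
    <field name="id">2</field>
    <field name="title">Baseball Teams</field>
    <field name="keywords">philadelphia phillies</field>
    <field name="keywords">new york yankees</field>
    <field name="keywords">cleveland indians</field>
  </doc>
</add>
```

With a setup along these lines, "new york" still matches inside a single value, while an exact phrase query for "yankees cleveland" should not match, because the two tokens sit in different values with the gap between them. A sloppy phrase query whose slop is as large as the gap could still match across values.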

If you can't do that, then perhaps patch PatternTokenizer to put a larger positionIncrement between groups. Then you would need to follow it with a filter that tokenizes on whitespace or some other regex (which we currently don't have).

-Yonik

On Tue, Nov 25, 2008 at 2:10 AM, Neal Richter <[EMAIL PROTECTED]> wrote:
> Hey all,
>
> Very basic question.. I want to index fields of comma separated values:
>
> Example document:
> id: 1
> title: Football Teams
> keywords: philadelphia eagles, cleveland browns, new york jets
>
> id: 2
> title: Baseball Teams
> keywords:"philadelphia phillies", "new york yankees", "cleveland indians"
>
> A query of 'new york' should return the obvious documents, but a quoted
> phrase query of "yankees cleveland" should return nothing... meaning that
> comma breaks phrases without fail.
>
> I've created a textCSV type in the schema.xml file and used the
> PatternTokenizerFactory to split on commas, and from there analysis can
> proceed as normal via StopFilterFactory, LowerCaseFilter,
> RemoveDuplicatesTokenFilter
>
> <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,\s*"
> group="-1"/>
>
> Has anyone done this before? Can I somehow use an existing (or combination
> of) Analyzer? It seems as though I need to create a PhraseDelimiterFilter
> from the WordDelimiterFilter.. though I am sure there is a way to make an
> existing analyzer to break things up the way I want.
>
> Thanks - Neal Richter