Analyzing CSV phrase fields

Neal Richter Mon, 24 Nov 2008 23:12:22 -0800

Hey all,

Very basic question: I want to index fields that contain comma-separated
values.

Example documents:
id: 1
title: Football Teams
keywords: philadelphia eagles, cleveland browns, new york jets

id: 2
title: Baseball Teams
keywords: "philadelphia phillies", "new york yankees", "cleveland indians"

A query of 'new york' should return both documents, but a quoted phrase
query of "yankees cleveland" should return nothing. In other words, a comma
should always break a phrase.

I've created a textCSV type in the schema.xml file and used the
PatternTokenizerFactory to split on commas. From there, analysis can
proceed as normal via StopFilterFactory, LowerCaseFilterFactory, and
RemoveDuplicatesTokenFilterFactory:

<tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,\s*" group="-1"/>
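
For context, here is a minimal sketch of what the complete textCSV type
might look like with the filters mentioned above. Only the tokenizer line is
taken from my actual schema; the stop filter's attributes (ignoreCase and the
stopwords.txt file name) are assumptions for illustration.

```xml
<!-- Sketch only: attribute values other than the tokenizer's are assumptions -->
<fieldType name="textCSV" class="solr.TextField">
  <analyzer>
    <!-- group="-1" makes the pattern act as a delimiter (split mode) -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,\s*" group="-1"/>
    <!-- stopwords.txt is an assumed file name; ignoreCase="true" lets this run before lowercasing -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

With group="-1" the pattern acts as a delimiter, so each comma-separated
value (for example "new york jets") comes out of the tokenizer as a single
token rather than as separate words.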

Has anyone done this before? Can I use an existing Analyzer (or a
combination of them)? It seems as though I need to create a
PhraseDelimiterFilter modeled on the WordDelimiterFilter, though I am sure
there is a way to get an existing analyzer to break things up the way I want.

Thanks - Neal Richter
