Trouble Configuring WordDelimiterFilterFactory

Rahul R Tue, 24 Nov 2009 09:29:37 -0800

Hello,
In our application we have a catch-all field (the 'text' field) which is
configured as the default search field. This field holds a combination of
numbers, letters, special characters, etc. I have a requirement that the
WordDelimiterFilterFactory must not act on numbers, especially those with
decimal points. Accuracy of results for numerical data is quite important,
so if the text field of a document has data like "Bridge-Diode 3.55 Volts",
I want to make sure that a search for "355" or "35.5" does not retrieve this
document. I found that the following setting for the
WordDelimiterFilterFactory works for me (for the most part):
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="0" catenateWords="1" catenateNumbers="0"
catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
preserveOriginal="1"/>


I am using the same setting for both index and query.

The only remaining problem is data like ".355". With the above setting, the
analysis page (analysis.jsp) shows that WordDelimiterFilterFactory emits two
terms for it: ".355" and "355". So a search for ".355" retrieves documents
containing either ".355" or "355", and a search for "355" has the same
effect. When I removed the WordDelimiterFilterFactory entry completely (from
both the index and query analyzers), the problem went away. But this seems
too harsh a measure.
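
To illustrate the overlap, here is a toy model in Python. It is only a sketch and not Solr code: the term sets are copied by hand from what the analysis page shows, and the helper names (EMITTED_TERMS, matches, hits) are made up for this example. It assumes that a document matches whenever the query-side terms and the index-side terms share at least one term.

```python
# Toy model, not Solr code. The term sets are copied from the behaviour
# described above; they are not computed by a real analyzer.
EMITTED_TERMS = {
    ".355": {".355", "355"},  # original preserved, plus the bare number
    "355": {"355"},
}


def matches(query, indexed_value):
    """Assumed rule: a match needs at least one term in common."""
    return bool(EMITTED_TERMS[query] & EMITTED_TERMS[indexed_value])


def hits(query, docs):
    """Return the indexed values that the query would retrieve."""
    return [doc for doc in docs if matches(query, doc)]


docs = [".355", "355"]
print(hits(".355", docs))  # both documents come back
print(hits("355", docs))   # both documents come back
```

In this model the shared term "355" is what ties the two values together, so the values stay distinct only if ".355" is never reduced to "355" in the first place.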

Is there a way to prevent the WordDelimiterFilterFactory from acting on
numerical data at all?

Regards
Rahul
