Clarification on WordDelimiterFilter.

Modassar Ather Thu, 06 Aug 2015 01:37:38 -0700

I am using WordDelimiterFilter at both index time and query time with the
following attributes. The query parser is edismax and the Solr version is 5.2.1.


<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>

During search, some unwanted results are returned. An example follows.

Search query: "3d image"
Results containing "3-d image", "3 d image", and "1d image" are also
returned. According to the analysis page, this happens because of the
position increment applied to the tokens, as shown below.

The analysis page shows the following four tokens for 3d, with their
positions:
token    position
3d       1
3        1
3d       1
d        2

image    3

Another example is the query "1d obj*", which returns results containing
"d-object". This can bring back completely unrelated items.

Here the token "d" is at position 2, which is what causes the above matches.
Please help me understand why this position increment is applied.
The position increment will also cause the "3d image" search to fail on a
document containing "3d image", since "d" comes at position 2.
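
To make the token table above easier to experiment with, here is a small
Python sketch. It is only a toy model written to reproduce the analysis-page
output quoted in this post; it is not Lucene's implementation. The function
names toy_word_delimiter and shared_pairs are hypothetical, and
splitOnCaseChange is not modeled.

```python
import re


def toy_word_delimiter(text):
    """Toy model of the token/position table in this post (not Lucene code)."""
    tokens = []
    position = 1
    for word in text.split():
        # Split on non-alphanumerics and on letter/digit transitions.
        parts = re.findall(r"[A-Za-z]+|[0-9]+", word)
        if not parts:
            continue
        if len(parts) == 1 and parts[0] == word:
            # Nothing to split: one token, one position.
            tokens.append((word, position))
            position += 1
            continue
        tokens.append((word, position))            # original token
        tokens.append((parts[0], position))        # first sub-part, same position
        tokens.append(("".join(parts), position))  # catenated form, same position
        for offset, part in enumerate(parts[1:], start=1):
            # Each later sub-part advances the position by one.
            tokens.append((part, position + offset))
        position += len(parts)
    return tokens


def shared_pairs(a, b):
    """Return the (token, position) pairs that two inputs have in common."""
    return set(toy_word_delimiter(a)) & set(toy_word_delimiter(b))


if __name__ == "__main__":
    for text in ("3d image", "3-d image", "1d image"):
        print(text, "->", toy_word_delimiter(text))
    # Pairs shared by the query "3d image" and the document "1d image".
    print(sorted(shared_pairs("3d image", "1d image")))
```

Under this toy model, "3d image" and "1d image" share the pairs ("d", 2) and
("image", 3), which is the overlap described above. How edismax actually
builds the query from these tokens is not modeled here.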

Kindly help me understand the best practices for using WordDelimiterFilter,
or suggest how we can resolve this issue.

Thanks,
Modassar
