Re: Clarification on WordDelimiterFilter.

Modassar Ather Thu, 13 Aug 2015 05:39:03 -0700

Thanks for your response Cario.

On Wed, Aug 12, 2015 at 10:20 PM, Cario, Elaine <
elaine.ca...@wolterskluwer.com> wrote:


> Modassar,
>
> There are additional settings in WDFF that you can experiment with (google
> around for the javadocs for the filter).  Specific to your question, there
> is a splitOnNumerics param, which defaults to true ("1") and causes
> terms like "3d" to get tokenized as "3" and "d".  If you set it to 0 it may
> correct the behavior you're seeing. (You'll need to re-index your content
> to see the effect.)
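>
> A minimal sketch of that change, assuming the filter definition quoted
> further down in this thread. Only splitOnNumerics is added; whether it
> removes every unwanted match has not been verified here, and a term such
> as "3-d" would still be split at the hyphen.
>
> ```xml
> <!-- splitOnNumerics="0": do not split at letter/digit transitions,
>      so "3d" is kept as a single token -->
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>         generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>         catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0"
>         preserveOriginal="1"/>
> ```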
>
> Also, the standard practice that I've seen is that settings which create
> additional tokens are usually only applied at index time, and not applied
> during query time analysis (on the theory that you've indexed all the
> different ways the user can search for a term, so there's no need to
> actually modify the query to get a match).
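>
> One way this split might look in schema.xml is sketched below. The field
> type name "text_wdf" and the whitespace tokenizer are assumptions made for
> illustration, not taken from the original configuration; the index-time
> attributes are the ones quoted further down in this thread.
>
> ```xml
> <!-- "text_wdf" and WhitespaceTokenizerFactory are illustrative assumptions -->
> <fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- index time: emit parts, catenations and the original token -->
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- query time: split only, no catenated or preserved-original tokens -->
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>             catenateAll="0" splitOnCaseChange="1" preserveOriginal="0"/>
>   </analyzer>
> </fieldType>
> ```
>
> The Analysis page can then be used to compare the index-side and
> query-side token positions for a term like "3d image".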
>
> -----Original Message-----
> From: Modassar Ather [mailto:modather1...@gmail.com]
> Sent: Friday, August 07, 2015 12:21 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Clarification on WordDelimiterFilter.
>
> Hi,
>
> Any suggestion will be really helpful. Kindly provide your inputs.
>
> Thanks,
> Modassar
>
> On Thu, Aug 6, 2015 at 2:06 PM, Modassar Ather <modather1...@gmail.com>
> wrote:
>
> > I am using WordDelimiterFilter at both index and query time with
> > the following attributes. The parser used is edismax. Solr version is 5.2.1.
> >
> > *<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>*
> >
> > During search, some of the results returned are not wanted. An example
> > follows.
> >
> > Search query: "3d image"
> > Results containing "3-d image", "3 d image", and "1d image" are also
> > returned. As per the analysis page, this happens because of the position
> > increment in the tokens, as explained below.
> >
> > The analysis page shows the following four tokens for 3d and their
> > positions.
> > token    position
> > 3d       1
> > 3        1
> > 3d       1
> > d        2
> >
> > image    3
> >
> > Another example is "1d obj*" returning results that contain "d-object".
> > This can bring up a completely unrelated item.
> >
> > Here the token d is at position 2, which causes the above matches.
> > Please help me understand why this position increment is done.
> > The position increment will also cause the "3d image" search to fail on a
> > document containing "3d image", as the "d" comes at position 2.
> >
> > Kindly help me understand the best practices for using
> > WordDelimiterFilter, or provide your inputs on how we can resolve the issue.
> >
> > Thanks,
> > Modassar
> >
>
