On 5/16/2014 9:24 AM, aiguofer wrote:
> Jack Krupansky-2 wrote
>> Typically the white space tokenizer is the best choice when the word
>> delimiter filter will be used.
>>
>> -- Jack Krupansky
>
> If we wanted to keep the StandardTokenizer (because we make use of the token
> types) but wanted to use the WDFF to get combinations of words that are
> split with certain characters (mainly - and /, but possibly others as well),
> what is the suggested way of accomplishing this? Would we just have to
> extend the JFlex file for the tokenizer and re-compile it?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/WordDelimiterFilterFactory-and-StandardTokenizer-tp4131628p4136146.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Typically the white space tokenizer is the best choice when the word
delimiter filter will be used.
-- Jack Krupansky
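
For illustration, the suggestion above might look roughly like the following schema fragment. This is only a sketch under assumptions: the field type name "text_wdff" is made up, and the filter options shown are one plausible combination, not the configuration anyone in this thread actually used.

```xml
<!-- Hypothetical field type; the name "text_wdff" is an assumption for illustration -->
<fieldType name="text_wdff" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace only, so a term like "wi-fi" reaches the
         word delimiter filter as a single token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- generateWordParts should emit the parts ("wi", "fi");
         catenateWords should also emit the joined form ("wifi") -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The trade-off is that the whitespace tokenizer does not assign the StandardTokenizer token types, so this approach only fits if those types are not needed on the field.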
-----Original Message-----
From: Shawn Heisey
Sent: Wednesday, April 16, 2014 11:03 PM
To: solr-user@lucene.apache.org
Subject: Re: WordDelimiterFilterFactory and
On 4/16/2014 8:37 PM, Bob Laferriere wrote:
>> I am seeing odd behavior from WordDelimiterFilterFactory (WDFF) when
>> used in conjunction with StandardTokenizerFactory (STF).
>> I see the following results for the document of “wi-fi”:
>>
>> Index: “wi”, “fi”
>> Query: “wi”, “fi”, “wifi”
>>
>
I am seeing odd behavior from WordDelimiterFilterFactory (WDFF) when used in conjunction with StandardTokenizerFactory (STF). If I use the following configuration: